Marc MERLIN posted on Fri, 22 Aug 2014 20:10:55 -0700 as excerpted:

> On Sat, Aug 23, 2014 at 02:52:16AM +0000, Duncan wrote:
>> > For mysql, I got:
>> > InnoDB: Page directory corruption:
>> > infimum not pointed to 140708 11:53:58 InnoDB: Page dump in ascii and
>> > hex (16384 bytes):
>> > len 16384; hex 00000000(16KB of 0's).
>>
>> Is that on ssd or spinning rust, and if ssd, do you run with
>> trim/discard and/or have you filled the device yet if not (since
>> mkfs.btrfs trims the device as part of the process)? I'm wondering if
>> that's 4 4 KiB btrfs data blocks of trimmed and unwritten SSD?
>
> It's on SSD, I do have trim/discard, I never filled the device.
>
> But I could totally remove trim and see what happens. I'll do that.
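To see whether the zeroed page in the dump quoted above is a one-off, a rough Python sketch like the following could scan a database file for pages that are entirely zero. The 16384-byte page size is taken from the dump; the function name `zero_pages` and the path in the usage note are made up for illustration. An all-zero page is only a hint, not proof of corruption, since space that was allocated but never written may legitimately read back as zeros.

```python
PAGE_SIZE = 16384  # page size taken from the quoted dump ("16384 bytes")


def zero_pages(path, page_size=PAGE_SIZE):
    """Return the indices of the pages in `path` that contain only zero bytes.

    `zero_pages` is a hypothetical helper for illustration, not part of
    mysql or btrfs.
    """
    zero = bytes(page_size)  # a reference page of page_size zero bytes
    hits = []
    with open(path, "rb") as f:
        index = 0
        while True:
            page = f.read(page_size)
            if not page:
                break
            # A short final read is compared against an equally short zero run.
            if page == zero[:len(page)]:
                hits.append(index)
            index += 1
    return hits
```

Usage would be something like `zero_pages("/var/lib/mysql/ibdata1")`, where the path is an assumption about where the tablespace lives; the result is a list of page numbers to look at more closely.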
You'd probably have to mostly fill up the device with garbage and then delete it (with discard off) before it'd return anything but already trimmed/zeroed blocks, since if I'm correct it's pre-allocating a bunch but then not actually writing it before the crash. It'd be pre-allocating from what it thought was free space, so as long as most of that free space hasn't been written at all since it was trimmed, you'd still likely get zeros even after turning trim off. Only after you have written something to that space and then deleted it would the chance of coming up "dirty" increase dramatically. That is of course assuming the pre-allocation doesn't pre-zero as well, which it might.

It just struck me that with trim on, a bunch of zero-blocks is what you'd expect from free space, which is what a COW filesystem would be allocating from when there's a write into a database file like that (assuming it's not set NOCOW). On spinning rust or without trim/discard set, unzeroed garbage would accumulate in the free space over time, and a full 16 KiB of zeros would be far more interesting, as that would mean something's actually zeroing it but that mysql isn't getting data written back to it after the zeroing, before the crash.

Of course that raises the question of whether it was a normal COW file or if you had it NOCOW. Setting it NOCOW (of course doing the correct set-the-directory-NOCOW, copy-the-file-into-it dance, so it's NOCOW from the beginning) could be interesting too, and may in fact actually eliminate the problem, depending on how mysql handles such things.
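For reference, the "set the directory NOCOW, copy the file into it" dance could be sketched in Python roughly as below. This is only an illustration under assumptions: it relies on the `chattr` tool with its `+C` (No_COW) attribute and on the target directory living on btrfs, and the helper names `set_nocow` and `copy_into_nocow_dir` are invented here. The point of the ordering is that the flag goes on the directory first, so the file created inside it is NOCOW from its first write.

```python
import shutil
import subprocess
from pathlib import Path


def set_nocow(directory):
    """Set the No_COW attribute on a directory with `chattr +C`.

    Assumes `chattr` is installed and the directory is on btrfs; on other
    filesystems the command is expected to fail.
    """
    subprocess.run(["chattr", "+C", str(directory)], check=True)


def copy_into_nocow_dir(src, nocow_dir, mark_nocow=set_nocow):
    """Create a fresh copy of `src` inside `nocow_dir` and return its path.

    Hypothetical helper: the directory is flagged before any file exists
    in it, so the new file inherits NOCOW at creation time.
    """
    src = Path(src)
    nocow_dir = Path(nocow_dir)
    nocow_dir.mkdir(parents=True, exist_ok=True)
    mark_nocow(nocow_dir)

    dst = nocow_dir / src.name
    # Write the bytes into a newly created file ("x" refuses to overwrite).
    # A rename within the same filesystem would just move the existing
    # file, flags and all, instead of creating a new one.
    with open(src, "rb") as fin, open(dst, "xb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

The database would need to be stopped while its files are copied and moved back into place. The `mark_nocow` parameter exists only so the copy logic can be tried out on a filesystem where `chattr +C` isn't supported.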

Presumably mysql has some sort of database resiliency scheme of its own: most filesystems don't do the checksumming that btrfs does, so it can't rely on that. My argument has always been that in some cases it might actually be better to let the database handle it how it normally does on ordinary filesystems and not try to get in the way. That is basically what NOCOW does: it tells btrfs to let the application handle that file and not to interfere.

It'd be interesting to see how well my hypothesis holds up, anyway.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman