On 2019/3/30 6:26 PM, Tobiasz Karoń wrote:
> Hi!
>
> I have a strange case of a Btrfs failure.
>
> I'm running Manjaro, btrfs-progs 4.20
Arch based, so the kernel version shouldn't be that old.
Good job on that.

>
> # uname -a
> Linux unfa-desktop 4.19.30-1-MANJARO #1 SMP PREEMPT Tue Mar 19
> 17:49:12 UTC 2019 x86_64 GNU/Linux
>
> I've been writing data with rsync to a 3 TB Western Digital MyBook via
> USB 3.0, when I accidentally cut power to the disk's power brick.
>
> The result - I can't do anything with the filesystem any more.
> Whatever I try I get this:

Please verify that you haven't run "btrfs check --repair" or "btrfs
check --init-*" on that disk.
And if you have run them before, please provide the kernel messages
from the very first failed mount (see the example commands at the end
of this mail).

If they show something like a transid mismatch with the *exact* same
"wanted 1530 found 1532", then we can be sure it wasn't a "btrfs check"
write operation that screwed up the situation.

In fact, as just exposed this week, a btrfs-progs write operation can
lead to the exact same problem.
Either a crashed "btrfs check" write, or a gracefully aborted one, can
cause it.

If that's the case, I'm afraid you can only try to salvage the data
using btrfs restore (an example also follows at the end of this mail),
and remember: don't run "btrfs check" with --repair or any --init-*
options until btrfs-progs is fixed and a developer asks for such a run.

If that's not the case, then things get really tricky.
The offending tree blocks are from the future, which means that btrfs,
the kernel block layer, the driver layer, or the hardware controller
doesn't implement flush/FUA correctly.

From my personal investigation, btrfs follows its flush/FUA sequence
pretty well.
Unless you're using the nobarrier mount option, btrfs shouldn't cause
the problem (a quick way to check is shown at the end of this mail).

For the block layer, I have seen some error reports about blk-mq on
recent kernels, but that shouldn't be the case here; you are only 2
versions behind upstream 4.19.32.

For the device driver/hardware controller, I'm not 100% sure.
Btrfs issues fewer flush/FUA requests than other filesystems, so if the
HDD controller is doing some black magic to "optimize" performance for
certain filesystems, it would have a much more visible impact on btrfs
than on journaling/logging filesystems like XFS, ext* or ZFS (the
hdparm/sdparm example at the end of this mail shows how to inspect the
disk's write cache).

Thanks,
Qu

>
> # btrfs check --repair --init-csum-tree /dev/sdc
> enabling repair mode
> Creating a new CRC tree
> Opening filesystem to check...
> parent transid verify failed on 1634923266048 wanted 1530 found 1532
> parent transid verify failed on 1634923266048 wanted 1530 found 1532
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> ERROR: cannot open file system
>
> This is a backup drive, and I don't need to recover it, but I want to
> learn from this as much as possible. Power failures happen, and I want
> to know what to do when the same thing happens to my main production
> filesystem.
>
> You can read more detail here:
> https://unix.stackexchange.com/questions/509565/unrecoverable-btrfs-filesystem-after-a-power-failure
>
> And much more detail here:
> https://forum.manjaro.org/t/unfixable-btrfs-filesystem-after-a-power-failure/80994
>
> I am writing here as I'm completely out of ideas at this point, and it
> seems like this is a very rare case.
>
> I was very happy to switch to Btrfs from ZFS, but now I am not so
> comfortable using it, if a single power failure can completely trash a
> filesystem beyond all repair.
>
> Or can it?
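To capture the first-mount failure messages mentioned above, something
like this should do (a minimal sketch; it assumes the failed mount
happened during the current boot, otherwise try an earlier boot with
"journalctl -k -b -1"):

  # dmesg | grep -iE 'btrfs|transid'
  # journalctl -k | grep -iE 'btrfs|transid'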
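For the btrfs restore route, a minimal sketch (the target directory
/mnt/rescue is just a placeholder, and it must live on a different,
healthy filesystem):

  # mkdir -p /mnt/rescue
  # btrfs restore -D /dev/sdc /mnt/rescue
  (dry run: only lists the files that would be recovered)
  # btrfs restore -v -i /dev/sdc /mnt/rescue
  (real run: verbose, ignore errors where possible)

If restore can't find a usable tree root, "btrfs-find-root /dev/sdc"
lists older root generations, and "btrfs restore -t <bytenr> ..." can
try reading from one of them.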
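Checking for nobarrier is quick (just a sketch using the usual
locations; /proc/mounts only helps while the filesystem is still
mounted):

  # grep -i nobarrier /etc/fstab
  # grep -i nobarrier /proc/mounts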
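And to inspect the disk's write cache behind the USB bridge (again just
a sketch; many USB-SATA bridges don't pass these commands through, so
they may simply fail on this enclosure):

  # hdparm -W /dev/sdc
  (queries the ATA write-cache state; "hdparm -W 0 /dev/sdc" would
  disable it for testing)
  # sdparm --get=WCE /dev/sdc
  (the same via the SCSI caching mode page)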