On 2019/3/30 6:26 PM, Tobiasz Karoń wrote:
> Hi!
>
> I have a strange case of a Btrfs failure.
>
> I'm running Manjaro, btrfs-progs 4.20
Arch based, so the kernel version shouldn't be that old.
Good job on that.

>
> # uname -a
> Linux unfa-desktop 4.19.30-1-MANJARO #1 SMP PREEMPT Tue Mar 19
> 17:49:12 UTC 2019 x86_64 GNU/Linux
>
> I've been writing data with rsync to a 3 TB Western Digital MyBook via
> USB 3.0, when I accidentally cut power to the disk's power brick.
>
> The result - I can't do anything with the filesystem any more.
> Whatever I try I get this:

Please verify that you haven't run "btrfs check --repair" or "btrfs
check --init-*" on that disk.
And if you have run them before, please provide the kernel messages
from the very first failed mount (see the example commands at the end
of this mail).

If they show something like a transid mismatch with the *exact* same
"wanted 1530 found 1532", then we can be sure it wasn't a "btrfs check"
write operation that screwed up the situation.

In fact, as just exposed this week, a btrfs-progs write operation can
lead to the exact same problem.
Either a crashed "btrfs check" write, or a gracefully aborted one, can
cause it.

If that's the case, I'm afraid you can only try to salvage the data
using btrfs restore (an example also follows at the end of this mail),
and remember: don't run "btrfs check" with --repair or any --init-*
options until btrfs-progs is fixed and a developer asks for such a run.

If that's not the case, then things get really tricky.
The offending tree blocks are from the future, which means that btrfs,
the kernel block layer, the driver layer, or the hardware controller
doesn't implement flush/FUA correctly.

From my personal investigation, btrfs follows its flush/FUA sequence
pretty well.
Unless you're using the nobarrier mount option, btrfs shouldn't cause
the problem (a quick way to check is shown at the end of this mail).

For the block layer, I have seen some error reports about blk-mq on
recent kernels, but that shouldn't be the case here; you are only 2
versions behind upstream 4.19.32.

For the device driver/hardware controller, I'm not 100% sure.
Btrfs issues fewer flush/FUA requests than other filesystems, so if the
HDD controller is doing some black magic to "optimize" performance for
certain filesystems, it would have a much more visible impact on btrfs
than on journaling/logging filesystems like XFS, ext* or ZFS (the
hdparm/sdparm example at the end of this mail shows how to inspect the
disk's write cache).

Thanks,
Qu

>
> # btrfs check --repair --init-csum-tree /dev/sdc
> enabling repair mode
> Creating a new CRC tree
> Opening filesystem to check...
> parent transid verify failed on 1634923266048 wanted 1530 found 1532
> parent transid verify failed on 1634923266048 wanted 1530 found 1532
> Ignoring transid failure
> Couldn't setup extent tree
> Couldn't setup device tree
> ERROR: cannot open file system
>
> This is a backup drive, and I don't need to recover it, but I want to
> learn from this as much as possible. Power failures happen, and I want
> to know what to do when the same thing happens to my main production
> filesystem.
>
> You can read more detail here:
> https://unix.stackexchange.com/questions/509565/unrecoverable-btrfs-filesystem-after-a-power-failure
>
> And much more detail here:
> https://forum.manjaro.org/t/unfixable-btrfs-filesystem-after-a-power-failure/80994
>
> I am writing here as I'm completely out of ideas at this point, and it
> seems like this is a very rare case.
>
> I was very happy to switch to Btrfs from ZFS, but now I am not so
> comfortable using it, if a single power failure can completely trash a
> filesystem beyond all repair.
>
> Or can it?
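To capture the first-mount failure messages mentioned above, something
like this should do (a minimal sketch; it assumes the failed mount
happened during the current boot, otherwise try an earlier boot with
"journalctl -k -b -1"):

  # dmesg | grep -iE 'btrfs|transid'
  # journalctl -k | grep -iE 'btrfs|transid'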
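For the btrfs restore route, a minimal sketch (the target directory
/mnt/rescue is just a placeholder, and it must live on a different,
healthy filesystem):

  # mkdir -p /mnt/rescue
  # btrfs restore -D /dev/sdc /mnt/rescue
  (dry run: only lists the files that would be recovered)
  # btrfs restore -v -i /dev/sdc /mnt/rescue
  (real run: verbose, ignore errors where possible)

If restore can't find a usable tree root, "btrfs-find-root /dev/sdc"
lists older root generations, and "btrfs restore -t <bytenr> ..." can
try reading from one of them.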
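Checking for nobarrier is quick (just a sketch using the usual
locations; /proc/mounts only helps while the filesystem is still
mounted):

  # grep -i nobarrier /etc/fstab
  # grep -i nobarrier /proc/mounts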
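And to inspect the disk's write cache behind the USB bridge (again just
a sketch; many USB-SATA bridges don't pass these commands through, so
they may simply fail on this enclosure):

  # hdparm -W /dev/sdc
  (queries the ATA write-cache state; "hdparm -W 0 /dev/sdc" would
  disable it for testing)
  # sdparm --get=WCE /dev/sdc
  (the same via the SCSI caching mode page)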