On 2019/3/30 9:12 PM, Tobiasz Karoń wrote:
> I am afraid I probably ran btrfs check on that filesystem after I was
> unable to mount it.

Then the question is whether that run failed with the same output shown below.

If that's the case, it means btrfs check wasn't even able to open the fs,
let alone write anything, and all my previous statements still stand.

> 
> I would be happy to know that this issue was caused by user error, as
> it's easier for me to deal with than a software error.
> 
> This filesystem was a backup, and I have a copy of all that.
> 
> I can simply recreate that filesystem, that's not an issue, but I wanted
> to make sure I can avoid data loss in case that happens again, and if
> not - maybe this can help fix a software problem in Btrfs.

I believe that in the next btrfs-progs release, possibly 5.0, it will be
more-or-less safe to run "btrfs check --repair"; at least it shouldn't
trash your fs as badly as the current btrfs check can.

> 
> It sounds pretty nasty that running btrfs check --repair can trash a
> filesystem, to be honest!

That's why it's still marked as dangerous.

Although in the past we just thought, nah, it's just that btrfs check
isn't clever enough.
But now I think we really know why btrfs check --repair is so dangerous.

Thanks,
Qu

> 
> Let me know if I can provide any more information. If not - I'm going to
> wipe that filesystem.
> 
> On Sat, Mar 30, 2019, 13:22 Qu Wenruo <quwenruo.bt...@gmx.com> wrote:
> 
> 
> 
>     On 2019/3/30 6:26 PM, Tobiasz Karoń wrote:
>     > Hi!
>     >
>     > I have a strange case of a Btrfs failure.
>     >
>     > I'm running Manjaro, btrfs-progs 4.20
> 
>     Arch-based, so the kernel version shouldn't be that old.
>     Good job on that.
> 
>     >
>     > # uname -a
>     > Linux unfa-desktop 4.19.30-1-MANJARO #1 SMP PREEMPT Tue Mar 19
>     > 17:49:12 UTC 2019 x86_64 GNU/Linux
>     >
>     > I've been writing data with rsync to a 3 TB Western Digital MyBook via
>     > USB 3.0 when I accidentally cut power to the disk's power brick.
>     >
>     > The result - I can't do anything with the filesystem any more.
>     > Whatever I try I get this:
> 
>     Please verify that you haven't run "btrfs check --repair" or "btrfs
>     check --init-*" on that disk.
> 
>     And if you have run them before, please provide the kernel messages
>     from the very first mount failure.
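> 
>     As a sketch of what I'm after, assuming the failure is still in the
>     logs, something like:
> 
>       dmesg | grep -i btrfs
>       journalctl -k -b -1 | grep -i btrfs    # if the box was rebooted since
> 
>     should show the errors from that first failed mount.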
> 
>     If they show something like a transid mismatch with the *exact* same
>     "wanted 1530 found 1532", then we can be sure it wasn't a "btrfs check"
>     write operation that screwed up the situation.
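> 
>     As a rough cross-check (read-only, so it's safe), the superblock
>     generation can be read with:
> 
>       btrfs inspect-internal dump-super /dev/sdc | grep generation
> 
>     and compared against the "wanted"/"found" numbers in your output.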
> 
>     In fact, as just exposed this week, btrfs-progs write operations can
>     lead to the exact same problem. Either a crashed btrfs check write or a
>     gracefully aborted btrfs check write can cause it.
> 
>     If that's the case, I'm afraid you can only try to salvage the data
>     using btrfs restore, and remember not to run btrfs check with --repair
>     or any --init-* options until btrfs-progs is fixed and a developer asks
>     for that run.
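> 
>     A minimal sketch of such a salvage run (the target directory is just an
>     example; -D is a dry run):
> 
>       mkdir -p /mnt/salvage
>       btrfs restore -D /dev/sdc /mnt/salvage   # list what would be restored
>       btrfs restore -v /dev/sdc /mnt/salvage   # actually copy the data out
> 
>     btrfs restore never writes to /dev/sdc, so it can't make things worse.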
> 
> 
>     If that's not the case, then things get really tricky.
> 
>     The offending tree blocks are from the future, which means that btrfs,
>     the kernel block layer, the driver layer, or the hardware controller
>     doesn't implement flush/FUA correctly.
> 
>     From my personal investigation, btrfs follows its flush/FUA sequence
>     pretty well.
>     Unless you're mounting with nobarrier, btrfs itself shouldn't be the cause.
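> 
>     If you're not sure, nobarrier has to be passed explicitly (it's not a
>     default), so something like the following, assuming the fs is listed
>     in fstab at all, would show it:
> 
>       grep -i nobarrier /etc/fstab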
> 
>     For the block layer part, I have seen some error reports about blk-mq
>     in recent kernels, but that shouldn't be the case here; you are only 2
>     versions behind upstream 4.19.32.
> 
>     For the device driver/hardware controller, I'm not 100% sure. Btrfs
>     issues fewer flush/FUA requests than other filesystems, so if the HDD
>     controller is doing some black magic to "optimize" performance for
>     certain filesystems, it would have a more obvious impact on btrfs and
>     less on journal-based filesystems like ZFS/XFS/Ext*.
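> 
>     If you want to check how the disk's write cache is set up, something
>     like this may work (no guarantee through the MyBook's USB-SATA bridge,
>     and the smartctl variant is just an alternative to try):
> 
>       hdparm -W /dev/sdc        # query the write-caching state
>       smartctl -g wcache /dev/sdc
> 
>     A volatile write cache behind a bridge that drops or ignores flushes
>     would fit this kind of "tree blocks from the future" corruption.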
> 
>     Thanks,
>     Qu
> 
>     >
>     > # btrfs check --repair --init-csum-tree /dev/sdc
>     > enabling repair mode
>     > Creating a new CRC tree
>     > Opening filesystem to check...
>     > parent transid verify failed on 1634923266048 wanted 1530 found 1532
>     > parent transid verify failed on 1634923266048 wanted 1530 found 1532
>     > Ignoring transid failure
>     > Couldn't setup extent tree
>     > Couldn't setup device tree
>     > ERROR: cannot open file system
>     >
>     > This is a backup drive, and I don't need it to recover, but I want to
>     > learn from this as much as possible. Power failures happen, and I want
>     > to know what to do when the same thing happens to my main production
>     > filesystem.
>     >
>     > You can read more detail here:
>     >
>     > https://unix.stackexchange.com/questions/509565/unrecoverable-btrfs-filesystem-after-a-power-failure
>     >
>     > And much more detail here:
>     >
>     > https://forum.manjaro.org/t/unfixable-btrfs-filesystem-after-a-power-failure/80994
>     >
>     > I am writing here as I'm completely out of ideas at this point, and it
>     > seems like this is a very rare case.
>     >
>     > I was very happy to switch to Btrfs from ZFS, but now I am not so
>     > comfortable using it if a single power failure can completely trash a
>     > filesystem beyond all repair.
>     >
>     > Or can it?
>     >
> 
