On Mon, Mar 22, 2021 at 3:49 PM Chris Murphy <li...@colorremedies.com> wrote:
>
> On Mon, Mar 22, 2021 at 12:32 AM Dave T <davestechs...@gmail.com> wrote:
> >
> > On Sun, Mar 21, 2021 at 2:03 PM Chris Murphy <li...@colorremedies.com> 
> > wrote:
> > >
> > > On Sat, Mar 20, 2021 at 11:54 PM Dave T <davestechs...@gmail.com> wrote:
> > > >
> > > > # btrfs check -r 2853787942912 /dev/mapper/xyz
> > > > Opening filesystem to check...
> > > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > > parent transid verify failed on 2853787942912 wanted 29436 found 29433
> > > > Ignoring transid failure
> > > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > > parent transid verify failed on 2853827723264 wanted 29433 found 29435
> > > > Ignoring transid failure
> > > > leaf parent key incorrect 2853827723264
> > > > ERROR: could not setup extent tree
> > > > ERROR: cannot open file system
> > >
> > > btrfs insp dump-t -t 2853827723264 /dev/
> >
> > # btrfs insp dump-t -t 2853827723264 /dev/mapper/xzy
> > btrfs-progs v5.11
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > WARNING: could not setup extent tree, skipping it
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > Couldn't setup device tree
> > ERROR: unable to open /dev/mapper/xzy
> >
> > # btrfs insp dump-t -t 2853787942912 /dev/mapper/xzy
> > btrfs-progs v5.11
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > WARNING: could not setup extent tree, skipping it
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > Couldn't setup device tree
> > ERROR: unable to open /dev/mapper/xzy
> >
> > # btrfs insp dump-t -t 2853827608576 /dev/mapper/xzy
> > btrfs-progs v5.11
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > WARNING: could not setup extent tree, skipping it
> > parent transid verify failed on 2853827608576 wanted 29436 found 29433
> > Ignoring transid failure
> > leaf parent key incorrect 2853827608576
> > Couldn't setup device tree
> > ERROR: unable to open /dev/mapper/xzy
>
> That does not look promising. I don't know whether a read-write mount
> with usebackuproot will recover, or end up with problems.
>
> Options:
>
> a. btrfs check --repair
> This probably fails on the same problem: it can't set up the extent tree.
>
> b. btrfs check --init-extent-tree
> This is a heavy hammer; it might succeed, but it takes a long time. On
> 5T it might take double-digit hours or even single-digit days. It's
> generally faster to just wipe the drive and restore from backups than
> use init-extent-tree (I understand this *is* your backup).
>
> c. Set up an overlay file on device-mapper, to redirect the writes from
> a read-write mount with usebackuproot. I think it's sufficient to
> just mount, optionally write some files (empty or not), and umount.
> Then do a btrfs check to see if the current tree is healthy.
> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
>
> That guide is a bit complex because it deals with many drives in an
> mdadm RAID, so you can simplify it for just one drive. The gist is that
> no writes go to the drive itself; it's treated as read-only by
> device-mapper (in fact you can optionally add a pre-step with the
> blockdev command and --setro to make sure the entire drive is
> read-only; just make sure to set it back to read-write once you're done
> testing). All the writes with this overlay go into a loop-mounted file
> that you intentionally throw away after testing.
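
For reference, the single-drive version of that overlay boils down to
something like the following sketch (device name, overlay size, and
chunk size are placeholders to adjust):

    # optional safety step: mark the real device read-only
    blockdev --setro /dev/mapper/xzy

    # create a sparse file to absorb all writes; attach a loop device
    truncate -s 10G /tmp/overlay.img
    loop=$(losetup -f --show /tmp/overlay.img)

    # dm snapshot: reads come from the origin, writes land in the overlay
    size=$(blockdev --getsz /dev/mapper/xzy)
    dmsetup create xzy-overlay \
        --table "0 $size snapshot /dev/mapper/xzy $loop P 8"

    # ... mount and test /dev/mapper/xzy-overlay here ...

    # tear down and throw the overlay away when done
    dmsetup remove xzy-overlay
    losetup -d "$loop"
    rm /tmp/overlay.img
    blockdev --setrw /dev/mapper/xzy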
>
> d. Just skip the testing and try usebackuproot with a read-write
> mount. It might make things worse, but at least it's fast to test. If
> it messes things up, you'll have to recreate this backup from scratch.

I took this approach. My command was simply:

    mount -o usebackuproot /dev/mapper/xzy /backup

It appears to have succeeded: the filesystem mounted without errors,
and a new incremental backup (with btrbk) also completed without
errors. My backup history appears to be preserved, which is what I
was hoping for.

I will run some checks on those backup subvolumes tomorrow. Are there
specific checks you would recommend?
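
In the meantime, my rough plan is something like this (assuming the
filesystem is unmounted for the check and mounted for the scrub):

    # read-only structural check; should not change anything on disk
    btrfs check --readonly /dev/mapper/xzy

    # verify every data and metadata checksum on the mounted filesystem
    btrfs scrub start -B /backup
    btrfs scrub status /backup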

>
> As for how to prevent this? I'm not sure. About the best we can do is
> disable the drive write cache with a udev rule,

That sounds like a suitable solution for me.
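
For the record, I'd expect a rule along these lines to do it (the
serial match below is a placeholder for the actual drive, and it
assumes hdparm is installed):

    # /etc/udev/rules.d/99-backup-no-wcache.rules
    # disable the volatile write cache whenever this drive appears
    ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd?", \
      ENV{ID_SERIAL}=="My_Backup_Drive_SERIAL", \
      RUN+="/usr/sbin/hdparm -W 0 /dev/%k"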

Thank you for this information. BTW, I have been using BTRFS for many
years. This is the first serious issue I have had, and as you said
there is a large element of user error and bad luck involved in this
case.

> and/or raid1 with
> another make/model drive, and let Btrfs detect occasional corruption
> and self-heal from the good copy. Another obvious way to avoid the
> problem is to stop having power failures, crashes, and accidental USB
> cable disconnections :)
>
> It's not any one thing that's the problem. It's a sequence of events
> happening in just the right (or wrong) order. Bugs + mistakes + bad
> luck = problem.
>
> --
> Chris Murphy
