On Wed, Feb 24, 2021 at 10:44 AM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On 2/24/21 9:23 AM, Neal Gompa wrote:
> > On Tue, Feb 23, 2021 at 10:05 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>
> >> On 2/22/21 11:03 PM, Neal Gompa wrote:
> >>> On Mon, Feb 22, 2021 at 2:34 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>
> >>>> On 2/21/21 1:27 PM, Neal Gompa wrote:
> >>>>> On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <jo...@toxicpanda.com>
> >>>>> wrote:
> >>>>>>
> >>>>>> On 2/17/21 11:29 AM, Neal Gompa wrote:
> >>>>>>> On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <jo...@toxicpanda.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On 2/17/21 9:50 AM, Neal Gompa wrote:
> >>>>>>>>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <jo...@toxicpanda.com>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
> >>>>>>>>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik
> >>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
> >>>>>>>>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik
> >>>>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
> >>>>>>>>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik
> >>>>>>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
> >>>>>>>>>>>>>>>>> Hey all,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> So one of my main computers recently had a disk controller
> >>>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs
> >>>>>>>>>>>>>>>>> refuses to
> >>>>>>>>>>>>>>>>> mount. I tried to do a mount and the following errors show
> >>>>>>>>>>>>>>>>> up in the
> >>>>>>>>>>>>>>>>> journal:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device
> >>>>>>>>>>>>>>>>>> sda3): disk space caching is enabled
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device
> >>>>>>>>>>>>>>>>>> sda3): has skinny extents
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical
> >>>>>>>>>>>>>>>>>> (device sda3): corrupt leaf: root=401 block=796082176
> >>>>>>>>>>>>>>>>>> slot=15 ino=203657, invalid inode transid: has 888896
> >>>>>>>>>>>>>>>>>> expect [0, 888895]
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device
> >>>>>>>>>>>>>>>>>> sda3): block=796082176 read time tree block corruption
> >>>>>>>>>>>>>>>>>> detected
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical
> >>>>>>>>>>>>>>>>>> (device sda3): corrupt leaf: root=401 block=796082176
> >>>>>>>>>>>>>>>>>> slot=15 ino=203657, invalid inode transid: has 888896
> >>>>>>>>>>>>>>>>>> expect [0, 888895]
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device
> >>>>>>>>>>>>>>>>>> sda3): block=796082176 read time tree block corruption
> >>>>>>>>>>>>>>>>>> detected
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning
> >>>>>>>>>>>>>>>>>> (device sda3): couldn't read tree root
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device
> >>>>>>>>>>>>>>>>>> sda3): open_ctree failed
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same
> >>>>>>>>>>>>>>>>> issue. I can't
> >>>>>>>>>>>>>>>>> seem to find any reasonably good information on how to do
> >>>>>>>>>>>>>>>>> recovery in
> >>>>>>>>>>>>>>>>> this scenario, even to just recover enough to copy data off.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version
> >>>>>>>>>>>>>>>>> 5.9.16 and
> >>>>>>>>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version
> >>>>>>>>>>>>>>>>> 5.10.14. I'm
> >>>>>>>>>>>>>>>>> using btrfs-progs v5.10.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Can anyone help?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you try
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> That should fix the inode generation thing so it's sane, and
> >>>>>>>>>>>>>>>> then the tree
> >>>>>>>>>>>>>>>> checker will allow the fs to be read, hopefully. If not we
> >>>>>>>>>>>>>>>> can work out some
> >>>>>>>>>>>>>>>> other magic. Thanks,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Josef
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I got the same error as I did with btrfs-check --readonly...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> No dice...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # btrfs check --readonly --backup /dev/sda3
> >>>>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hey look the block we're looking for, I wrote you some magic,
> >>>>>>>>>>>> just pull
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
> >>>>>>>>>>>>
> >>>>>>>>>>>> build, and then run
> >>>>>>>>>>>>
> >>>>>>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
> >>>>>>>>>>>>
> >>>>>>>>>>>> This will force us to point at the old root with (hopefully) the
> >>>>>>>>>>>> right bytenr
> >>>>>>>>>>>> and gen, and then hopefully you'll be able to recover from
> >>>>>>>>>>>> there. This is kind
> >>>>>>>>>>>> of saucy, so yolo, but I can undo it if it makes things worse.
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> # btrfs check --readonly /dev/sda3
> >>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>> ERROR: could not setup extent tree
> >>>>>>>>>>>> ERROR: cannot open file system
> >>>>>>>>>>> # btrfs check --clear-space-cache v1 /dev/sda3
> >>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>> ERROR: could not setup extent tree
> >>>>>>>>>>>> ERROR: cannot open file system
> >>>>>>>>>>>
> >>>>>>>>>>> It's better, but still no dice... :(
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hmm it's not telling us what's wrong with the extent tree, which
> >>>>>>>>>> is annoying.
> >>>>>>>>>> Does mount -o rescue=all,ro work now that the root tree is normal?
> >>>>>>>>>> Thanks,
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Nope, I see this in the journal:
> >>>>>>>>>
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> enabling all of the rescue options
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> ignoring data csums
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> ignoring bad roots
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> disabling log replay at mount time
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> disk space caching is enabled
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3):
> >>>>>>>>>> has skinny extents
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3):
> >>>>>>>>>> tree level mismatch detected, bytenr=791281664 level expected=1
> >>>>>>>>>> has=2
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3):
> >>>>>>>>>> tree level mismatch detected, bytenr=791281664 level expected=1
> >>>>>>>>>> has=2
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device
> >>>>>>>>>> sda3): couldn't read tree root
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3):
> >>>>>>>>>> open_ctree failed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok git pull for-neal, rebuild, then run
> >>>>>>>>
> >>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895 2
> >>>>>>>>
> >>>>>>>> I thought of this yesterday but in my head was like "naaahhhh, whats
> >>>>>>>> the chances
> >>>>>>>> that the level doesn't match??". Thanks,
> >>>>>>>>
> >>>>>>>
> >>>>>>> Tried rescue mount again after running that and got a stack trace in
> >>>>>>> the kernel, detailed in the following attached log.
> >>>>>>
> >>>>>> Huh I wonder how I didn't hit this when testing, I must have only
> >>>>>> tested with
> >>>>>> zero'ing the extent root and the csum root. You're going to have to
> >>>>>> build a
> >>>>>> kernel with a fix for this
> >>>>>>
> >>>>>> https://paste.centos.org/view/7b48aaea
> >>>>>>
> >>>>>> and see if that gets you further. Thanks,
> >>>>>>
> >>>>>
> >>>>> I built a kernel build as an RPM with your patch[1] and tried it.
> >>>>>
> >>>>> [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
> >>>>> Killed
> >>>>>
> >>>>> The log from the journal is attached.
> >>>>
> >>>>
> >>>> Ahh crud my bad, this should do it
> >>>>
> >>>> https://paste.centos.org/view/ac2e61ef
> >>>>
> >>>
> >>> Patch doesn't apply (note it is patch 667 below):
> >>
> >> Ah sorry, should have just sent you an iterative patch. You can take the
> >> above
> >> patch and just delete the hunk from volumes.c as you already have that
> >> applied
> >> and then it'll work. Thanks,
> >>
> >
> > Failed with a weird error...?
> >
> > [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sda3 /mnt
> > mount: /mnt: mount(2) system call failed: No such file or directory.
> >
> > Journal log with traceback attached.
>
> Last one maybe?
>
> https://paste.centos.org/view/80edd6fd
>
Similar weird failure:

[root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
mount: /mnt: mount(2) system call failed: No such file or directory.

No crash in the journal this time, though:

> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): enabling all of the
> rescue options
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring data csums
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring bad roots
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disabling log replay
> at mount time
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disk space caching
> is enabled
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): has skinny extents
> Feb 24 22:43:19 fedora kernel: BTRFS warning (device sdb3): failed to read fs
> tree: -2
> Feb 24 22:43:19 fedora kernel: BTRFS error (device sdb3): open_ctree failed

--
真実はいつも一つ!/ Always, there's only one truth!