On Wed, Feb 24, 2021 at 10:44 AM Josef Bacik <jo...@toxicpanda.com> wrote:
>
> On 2/24/21 9:23 AM, Neal Gompa wrote:
> > On Tue, Feb 23, 2021 at 10:05 AM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>
> >> On 2/22/21 11:03 PM, Neal Gompa wrote:
> >>> On Mon, Feb 22, 2021 at 2:34 PM Josef Bacik <jo...@toxicpanda.com> wrote:
> >>>>
> >>>> On 2/21/21 1:27 PM, Neal Gompa wrote:
> >>>>> On Wed, Feb 17, 2021 at 11:44 AM Josef Bacik <jo...@toxicpanda.com> 
> >>>>> wrote:
> >>>>>>
> >>>>>> On 2/17/21 11:29 AM, Neal Gompa wrote:
> >>>>>>> On Wed, Feb 17, 2021 at 9:59 AM Josef Bacik <jo...@toxicpanda.com> 
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> On 2/17/21 9:50 AM, Neal Gompa wrote:
> >>>>>>>>> On Wed, Feb 17, 2021 at 9:36 AM Josef Bacik <jo...@toxicpanda.com> 
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> On 2/16/21 9:05 PM, Neal Gompa wrote:
> >>>>>>>>>>> On Tue, Feb 16, 2021 at 4:24 PM Josef Bacik 
> >>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2/16/21 3:29 PM, Neal Gompa wrote:
> >>>>>>>>>>>>> On Tue, Feb 16, 2021 at 1:11 PM Josef Bacik 
> >>>>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/16/21 11:27 AM, Neal Gompa wrote:
> >>>>>>>>>>>>>>> On Tue, Feb 16, 2021 at 10:19 AM Josef Bacik 
> >>>>>>>>>>>>>>> <jo...@toxicpanda.com> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 2/14/21 3:25 PM, Neal Gompa wrote:
> >>>>>>>>>>>>>>>>> Hey all,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> So one of my main computers recently had a disk controller 
> >>>>>>>>>>>>>>>>> failure
> >>>>>>>>>>>>>>>>> that caused my machine to freeze. After rebooting, Btrfs 
> >>>>>>>>>>>>>>>>> refuses to
> >>>>>>>>>>>>>>>>> mount. I tried to do a mount and the following errors show 
> >>>>>>>>>>>>>>>>> up in the
> >>>>>>>>>>>>>>>>> journal:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device 
> >>>>>>>>>>>>>>>>>> sda3): disk space caching is enabled
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS info (device 
> >>>>>>>>>>>>>>>>>> sda3): has skinny extents
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical 
> >>>>>>>>>>>>>>>>>> (device sda3): corrupt leaf: root=401 block=796082176 
> >>>>>>>>>>>>>>>>>> slot=15 ino=203657, invalid inode transid: has 888896 
> >>>>>>>>>>>>>>>>>> expect [0, 888895]
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device 
> >>>>>>>>>>>>>>>>>> sda3): block=796082176 read time tree block corruption 
> >>>>>>>>>>>>>>>>>> detected
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS critical 
> >>>>>>>>>>>>>>>>>> (device sda3): corrupt leaf: root=401 block=796082176 
> >>>>>>>>>>>>>>>>>> slot=15 ino=203657, invalid inode transid: has 888896 
> >>>>>>>>>>>>>>>>>> expect [0, 888895]
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device 
> >>>>>>>>>>>>>>>>>> sda3): block=796082176 read time tree block corruption 
> >>>>>>>>>>>>>>>>>> detected
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS warning 
> >>>>>>>>>>>>>>>>>> (device sda3): couldn't read tree root
> >>>>>>>>>>>>>>>>>> Feb 14 15:20:49 localhost-live kernel: BTRFS error (device 
> >>>>>>>>>>>>>>>>>> sda3): open_ctree failed
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I've tried to do -o recovery,ro mount and get the same 
> >>>>>>>>>>>>>>>>> issue. I can't
> >>>>>>>>>>>>>>>>> seem to find any reasonably good information on how to do 
> >>>>>>>>>>>>>>>>> recovery in
> >>>>>>>>>>>>>>>>> this scenario, even to just recover enough to copy data off.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'm on Fedora 33, the system was on Linux kernel version 
> >>>>>>>>>>>>>>>>> 5.9.16 and
> >>>>>>>>>>>>>>>>> the Fedora 33 live ISO I'm using has Linux kernel version 
> >>>>>>>>>>>>>>>>> 5.10.14. I'm
> >>>>>>>>>>>>>>>>> using btrfs-progs v5.10.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Can anyone help?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Can you try
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> btrfs check --clear-space-cache v1 /dev/whatever
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> That should fix the inode generation thing so it's sane, and 
> >>>>>>>>>>>>>>>> then the tree
> >>>>>>>>>>>>>>>> checker will allow the fs to be read, hopefully.  If not we 
> >>>>>>>>>>>>>>>> can work out some
> >>>>>>>>>>>>>>>> other magic.  Thanks,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Josef
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I got the same error as I did with btrfs-check --readonly...
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Oh lovely, what does btrfs check --readonly --backup do?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> No dice...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> # btrfs check --readonly --backup /dev/sda3
> >>>>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>>> parent transid verify failed on 791281664 wanted 888893 found 
> >>>>>>>>>>>>>> 888895
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hey look the block we're looking for, I wrote you some magic, 
> >>>>>>>>>>>> just pull
> >>>>>>>>>>>>
> >>>>>>>>>>>> https://github.com/josefbacik/btrfs-progs/tree/for-neal
> >>>>>>>>>>>>
> >>>>>>>>>>>> build, and then run
> >>>>>>>>>>>>
> >>>>>>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895
> >>>>>>>>>>>>
> >>>>>>>>>>>> This will force us to point at the old root with (hopefully) the 
> >>>>>>>>>>>> right bytenr
> >>>>>>>>>>>> and gen, and then hopefully you'll be able to recover from 
> >>>>>>>>>>>> there.  This is kind
> >>>>>>>>>>>> of saucy, so yolo, but I can undo it if it makes things worse.  
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> # btrfs check --readonly /dev/sda3
> >>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>> ERROR: could not setup extent tree
> >>>>>>>>>>>> ERROR: cannot open file system
> >>>>>>>>>>> # btrfs check --clear-space-cache v1 /dev/sda3
> >>>>>>>>>>>> Opening filesystem to check...
> >>>>>>>>>>>> ERROR: could not setup extent tree
> >>>>>>>>>>>> ERROR: cannot open file system
> >>>>>>>>>>>
> >>>>>>>>>>> It's better, but still no dice... :(
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hmm it's not telling us what's wrong with the extent tree, which 
> >>>>>>>>>> is annoying.
> >>>>>>>>>> Does mount -o rescue=all,ro work now that the root tree is normal? 
> >>>>>>>>>>  Thanks,
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Nope, I see this in the journal:
> >>>>>>>>>
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> enabling all of the rescue options
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> ignoring data csums
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> ignoring bad roots
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> disabling log replay at mount time
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> disk space caching is enabled
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS info (device sda3): 
> >>>>>>>>>> has skinny extents
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): 
> >>>>>>>>>> tree level mismatch detected, bytenr=791281664 level expected=1 
> >>>>>>>>>> has=2
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): 
> >>>>>>>>>> tree level mismatch detected, bytenr=791281664 level expected=1 
> >>>>>>>>>> has=2
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS warning (device 
> >>>>>>>>>> sda3): couldn't read tree root
> >>>>>>>>>> Feb 17 09:49:40 localhost-live kernel: BTRFS error (device sda3): 
> >>>>>>>>>> open_ctree failed
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Ok git pull for-neal, rebuild, then run
> >>>>>>>>
> >>>>>>>> btrfs-neal-magic /dev/sda3 791281664 888895 2
> >>>>>>>>
> >>>>>>>> I thought of this yesterday but in my head was like "naaahhhh, whats 
> >>>>>>>> the chances
> >>>>>>>> that the level doesn't match??".  Thanks,
> >>>>>>>>
> >>>>>>>
> >>>>>>> Tried rescue mount again after running that and got a stack trace in
> >>>>>>> the kernel, detailed in the following attached log.
> >>>>>>
> >>>>>> Huh I wonder how I didn't hit this when testing, I must have only 
> >>>>>> tested with
> >>>>>> zero'ing the extent root and the csum root.  You're going to have to 
> >>>>>> build a
> >>>>>> kernel with a fix for this
> >>>>>>
> >>>>>> https://paste.centos.org/view/7b48aaea
> >>>>>>
> >>>>>> and see if that gets you further.  Thanks,
> >>>>>>
> >>>>>
> >>>>> I built a kernel as an RPM with your patch[1] and tried it.
> >>>>>
> >>>>> [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
> >>>>> Killed
> >>>>>
> >>>>> The log from the journal is attached.
> >>>>
> >>>>
> >>>> Ahh crud my bad, this should do it
> >>>>
> >>>> https://paste.centos.org/view/ac2e61ef
> >>>>
> >>>
> >>> Patch doesn't apply (note it is patch 667 below):
> >>
> >> Ah sorry, should have just sent you an iterative patch.  You can take the 
> >> above
> >> patch and just delete the hunk from volumes.c as you already have that 
> >> applied
> >> and then it'll work.  Thanks,
> >>
> >
> > Failed with a weird error...?
> >
> > [root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sda3 /mnt
> > mount: /mnt: mount(2) system call failed: No such file or directory.
> >
> > Journal log with traceback attached.
>
> Last one maybe?
>
> https://paste.centos.org/view/80edd6fd
>

Similar weird failure:

[root@fedora ~]# mount -t btrfs -o rescue=all,ro /dev/sdb3 /mnt
mount: /mnt: mount(2) system call failed: No such file or directory.

No crash in the journal this time, though:

> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): enabling all of the rescue options
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring data csums
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): ignoring bad roots
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disabling log replay at mount time
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): disk space caching is enabled
> Feb 24 22:43:19 fedora kernel: BTRFS info (device sdb3): has skinny extents
> Feb 24 22:43:19 fedora kernel: BTRFS warning (device sdb3): failed to read fs tree: -2
> Feb 24 22:43:19 fedora kernel: BTRFS error (device sdb3): open_ctree failed
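
If I'm reading it right, the -2 in the "failed to read fs tree" line is a
negated errno, and errno 2 is ENOENT, which would be why mount(2) reports
"No such file or directory". A quick sketch to decode that kind of journal
line (the helper name explain_btrfs_errno is made up for illustration, and it
assumes the line ends in ": -N"):

```python
import errno
import os
import re


def explain_btrfs_errno(line):
    """Return (errno name, message) for a log line ending in ': -N', else None."""
    match = re.search(r":\s*-(\d+)\s*$", line)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode maps numeric codes to symbolic names such as 'ENOENT'.
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


line = "BTRFS warning (device sdb3): failed to read fs tree: -2"
print(explain_btrfs_errno(line))
```

That only says the kernel could not find the fs tree root it went looking
for; it doesn't say why.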




-- 
真実はいつも一つ!/ Always, there's only one truth!
