I have some additional info.

I found the reason the FS got corrupted. It was a single failing drive,
which caused the entire cabinet (containing 7 drives) to reset. So the
FS suddenly lost 7 drives.

I have removed the failed drive, so the RAID is now degraded. I hope
the data is still recoverable... ☹

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Sun, 2018-12-02 at 10:03 +0100, Patrick Dijkgraaf wrote:
> Hi Qu,
> 
> Thanks for helping me!
> 
> Please see the responses in-line.
> Any suggestions based on this?
> 
> Thanks!
> 
> 
> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> > On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > > Hi all,
> > > 
> > > I have been a happy BTRFS user for quite some time. But now I'm
> > > facing a potential ~45TB data loss... :-(
> > > I hope someone can help!
> > > 
> > > I have Server A and Server B. Both have a 20-device BTRFS RAID6
> > > filesystem. Because of known RAID5/6 risks, Server B was a backup
> > > of Server A.
> > > After applying updates to Server B and rebooting, the FS would not
> > > mount anymore. Because it was "just" a backup, I decided to
> > > recreate the FS and perform a new backup. Later, I discovered that
> > > the FS was not broken, but that I had hit this issue:
> > > https://patchwork.kernel.org/patch/10694997/
> > > 
> > > 
> > 
> > Sorry for the inconvenience.
> > 
> > I didn't realize the max_chunk_size limit wasn't reliable at that
> > time.
> 
> No problem, I should not have jumped to the conclusion to recreate
> the
> backup volume.
> 
> > > Anyway, the FS was already recreated, so I needed to do a new
> > > backup.
> > > During the backup (using rsync -vah), Server A (the source)
> > > encountered
> > > an I/O error and my rsync failed. In an attempt to "quick fix"
> > > the
> > > issue, I rebooted Server A after which the FS would not mount
> > > anymore.
> > 
> > Did you have any dmesg about that IO error?
> 
> Yes, there was, but I neglected to capture it... The system has
> since been rebooted and I can't retrieve it anymore. :-(
> 
> > And how is the reboot scheduled? Forced power off or normal reboot
> > command?
> 
> The system was rebooted using a normal reboot command.
> 
> > > I documented what I have tried, below. I have not yet tried
> > > anything
> > > except what is shown, because I am afraid of causing more harm to
> > > the FS.
> > 
> > Pretty clever; not running btrfs check --repair was a good move.
> > 
> > > I hope somebody here can give me advice on how to (hopefully)
> > > retrieve my data...
> > > 
> > > Thanks in advance!
> > > 
> > > ==========================================
> > > 
> > > [root@cornelis ~]# btrfs fi show
> > > Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> > >   Total devices 1 FS bytes used 463.92GiB
> > >   devid    1 size 800.00GiB used 493.02GiB path /dev/mapper/cornelis-cornelis--btrfs
> > > 
> > > Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> > >   Total devices 20 FS bytes used 44.85TiB
> > >   devid    1 size 3.64TiB used 3.64TiB path /dev/sdn2
> > >   devid    2 size 3.64TiB used 3.64TiB path /dev/sdp2
> > >   devid    3 size 3.64TiB used 3.64TiB path /dev/sdu2
> > >   devid    4 size 3.64TiB used 3.64TiB path /dev/sdx2
> > >   devid    5 size 3.64TiB used 3.64TiB path /dev/sdh2
> > >   devid    6 size 3.64TiB used 3.64TiB path /dev/sdg2
> > >   devid    7 size 3.64TiB used 3.64TiB path /dev/sdm2
> > >   devid    8 size 3.64TiB used 3.64TiB path /dev/sdw2
> > >   devid    9 size 3.64TiB used 3.64TiB path /dev/sdj2
> > >   devid   10 size 3.64TiB used 3.64TiB path /dev/sdt2
> > >   devid   11 size 3.64TiB used 3.64TiB path /dev/sdk2
> > >   devid   12 size 3.64TiB used 3.64TiB path /dev/sdq2
> > >   devid   13 size 3.64TiB used 3.64TiB path /dev/sds2
> > >   devid   14 size 3.64TiB used 3.64TiB path /dev/sdf2
> > >   devid   15 size 7.28TiB used 588.80GiB path /dev/sdr2
> > >   devid   16 size 7.28TiB used 588.80GiB path /dev/sdo2
> > >   devid   17 size 7.28TiB used 588.80GiB path /dev/sdv2
> > >   devid   18 size 7.28TiB used 588.80GiB path /dev/sdi2
> > >   devid   19 size 7.28TiB used 588.80GiB path /dev/sdl2
> > >   devid   20 size 7.28TiB used 588.80GiB path /dev/sde2
> > > 
> > > [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> > > mount: /mnt/data: wrong fs type, bad option, bad superblock on
> > > /dev/sdn2, missing codepage or helper program, or other error.
> > 
> > What is the dmesg of the mount failure?
> 
> [Sun Dec  2 09:41:08 2018] BTRFS info (device sdn2): disk space caching is enabled
> [Sun Dec  2 09:41:08 2018] BTRFS info (device sdn2): has skinny extents
> [Sun Dec  2 09:41:08 2018] BTRFS error (device sdn2): parent transid verify failed on 46451963543552 wanted 114401 found 114173
> [Sun Dec  2 09:41:08 2018] BTRFS critical (device sdn2): corrupt leaf: root=2 block=46451963543552 slot=0, unexpected item end, have 1387359977 expect 16283
> [Sun Dec  2 09:41:08 2018] BTRFS warning (device sdn2): failed to read tree root
> [Sun Dec  2 09:41:08 2018] BTRFS error (device sdn2): open_ctree failed
> 
> > And have you tried -o ro,degraded ?
> 
> Tried it just now, gives the exact same error.
> 
> > > [root@cornelis ~]# btrfs check /dev/sdn2
> > > Opening filesystem to check...
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > > Couldn't read tree root
> > 
> > Would you please also paste the output of "btrfs ins dump-super
> > /dev/sdn2" ?
> 
> [root@cornelis ~]# btrfs ins dump-super /dev/sdn2
> superblock: bytenr=65536, device=/dev/sdn2
> ---------------------------------------------------------
> csum_type             0 (crc32c)
> csum_size             4
> csum                  0x51725c39 [match]
> bytenr                        65536
> flags                 0x1
>                       ( WRITTEN )
> magic                 _BHRfS_M [match]
> fsid                  4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> label                 data
> generation            114401
> root                  46451963543552
> sys_array_size                513
> chunk_root_generation 112769
> root_level            1
> chunk_root            22085632
> chunk_root_level      1
> log_root              46451935461376
> log_root_transid      0
> log_root_level                0
> total_bytes           104020314161152
> bytes_used            49308554543104
> sectorsize            4096
> nodesize              16384
> leafsize (deprecated)         16384
> stripesize            4096
> root_dir              6
> num_devices           20
> compat_flags          0x0
> compat_ro_flags               0x0
> incompat_flags                0x1e1
>                       ( MIXED_BACKREF |
>                         BIG_METADATA |
>                         EXTENDED_IREF |
>                         RAID56 |
>                         SKINNY_METADATA )
> cache_generation      114401
> uuid_tree_generation  114401
> dev_item.uuid         c6b44903-e849-4403-98c4-f3ba4d0b3fc3
> dev_item.fsid         4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5 [match]
> dev_item.type         0
> dev_item.total_bytes  4000783007744
> dev_item.bytes_used   4000781959168
> dev_item.io_align     4096
> dev_item.io_width     4096
> dev_item.sector_size  4096
> dev_item.devid                1
> dev_item.dev_group    0
> dev_item.seek_speed   0
> dev_item.bandwidth    0
> dev_item.generation   0
> 
> > It looks like your tree root (or at least some tree root
> > nodes/leaves) got corrupted.
> > 
> > > ERROR: cannot open file system
> > 
> > And since it's your tree root that is corrupted, you could also
> > try "btrfs-find-root <device>" to try to get a good old copy of
> > your tree root.
> 
> The output is rather long. I pasted it here: 
> https://pastebin.com/FkyBLgj9
> 
> I'm unsure what to look for in this output?
> 
> > But I suspect the corruption happened before you noticed, thus the
> > old tree root may not help much.
> > 
> > Also, the output of "btrfs ins dump-tree -t root <device>" will
> > help.
> 
> Here it is:
> 
> [root@cornelis ~]# btrfs ins dump-tree -t root /dev/sdn2
> btrfs-progs v4.19 
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> Couldn't read tree root
> ERROR: unable to open /dev/sdn2
> 
> > Thanks,
> > Qu
> 
> No, thank YOU! :-)
> 
> > > [root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > > Couldn't read tree root
> > > Could not open root, trying backup super
> > > warning, device 14 is missing
> > > warning, device 13 is missing
> > > warning, device 12 is missing
> > > warning, device 11 is missing
> > > warning, device 10 is missing
> > > warning, device 9 is missing
> > > warning, device 8 is missing
> > > warning, device 7 is missing
> > > warning, device 6 is missing
> > > warning, device 5 is missing
> > > warning, device 4 is missing
> > > warning, device 3 is missing
> > > warning, device 2 is missing
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > > ERROR: cannot read chunk root
> > > Could not open root, trying backup super
> > > warning, device 14 is missing
> > > warning, device 13 is missing
> > > warning, device 12 is missing
> > > warning, device 11 is missing
> > > warning, device 10 is missing
> > > warning, device 9 is missing
> > > warning, device 8 is missing
> > > warning, device 7 is missing
> > > warning, device 6 is missing
> > > warning, device 5 is missing
> > > warning, device 4 is missing
> > > warning, device 3 is missing
> > > warning, device 2 is missing
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > > ERROR: cannot read chunk root
> > > Could not open root, trying backup super
> > > 
> > > [root@cornelis ~]# uname -r
> > > 4.18.16-arch1-1-ARCH
> > > 
> > > [root@cornelis ~]# btrfs --version
> > > btrfs-progs v4.19
> > > 

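Side note on the "parent transid verify failed" lines quoted in this thread: below is a minimal Python sketch, assuming the messages keep the "on <bytenr> wanted <N> found <M>" shape shown here, that works out how many generations the on-disk block lags behind the generation the parent expects. The name transid_gaps is made up for illustration and is not part of btrfs-progs; the script only reads pasted log text and never touches a device.

```python
import re

# Matches messages such as:
#   parent transid verify failed on 46451963543552 wanted 114401 found 114173
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

def transid_gaps(log_text):
    """Return {bytenr: (wanted, found, wanted - found)} per distinct block."""
    # Drop mail quote markers ("> > > ") at the start of each line.
    unquoted = re.sub(r"^(?:>[ \t]?)+", "", log_text, flags=re.MULTILINE)
    # Mail clients wrap long lines, so collapse all whitespace runs.
    flat = " ".join(unquoted.split())
    gaps = {}
    for bytenr, wanted, found in TRANSID_RE.findall(flat):
        w, f = int(wanted), int(found)
        gaps[int(bytenr)] = (w, f, w - f)
    return gaps

# Sample copied from the btrfs check output in this thread, wrapping included.
sample = """> > > parent transid verify failed on 46451963543552 wanted 114401
> > > found
> > > 114173"""

for bytenr, (w, f, gap) in transid_gaps(sample).items():
    print(f"block {bytenr}: wanted {w}, found {f}, {gap} generations behind")
```

For the numbers in this thread it reports block 46451963543552 as 228 generations behind (114401 wanted, 114173 found).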