I have some additional info. I found the reason the FS got corrupted. It was a single failing drive, which caused the entire cabinet (containing 7 drives) to reset. So the FS suddenly lost 7 drives.
I have removed the failed drive, so the RAID is now degraded. I hope the data is still recoverable... ☹

-- 
Groet / Cheers,
Patrick Dijkgraaf

On Sun, 2018-12-02 at 10:03 +0100, Patrick Dijkgraaf wrote:
> Hi Qu,
>
> Thanks for helping me!
>
> Please see the responses in-line.
> Any suggestions based on this?
>
> Thanks!
>
>
> On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> > On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > > Hi all,
> > >
> > > I have been a happy BTRFS user for quite some time. But now I'm
> > > facing a potential ~45TB data loss... :-(
> > > I hope someone can help!
> > >
> > > I have Server A and Server B. Both have a 20-device BTRFS RAID6
> > > filesystem. Because of known RAID5/6 risks, Server B was a backup
> > > of Server A.
> > > After applying updates to Server B and rebooting, the FS would not
> > > mount anymore. Because it was "just" a backup, I decided to
> > > recreate the FS and perform a new backup. Later, I discovered that
> > > the FS was not broken, but I faced this issue:
> > > https://patchwork.kernel.org/patch/10694997/
> > >
> >
> > Sorry for the inconvenience.
> >
> > I didn't realize the max_chunk_size limit isn't reliable at that
> > time.
>
> No problem, I should not have jumped to the conclusion that I had to
> recreate the backup volume.
>
> > > Anyway, the FS was already recreated, so I needed to do a new
> > > backup. During the backup (using rsync -vah), Server A (the source)
> > > encountered an I/O error and my rsync failed. In an attempt to
> > > "quick fix" the issue, I rebooted Server A, after which the FS
> > > would not mount anymore.
> >
> > Did you have any dmesg about that IO error?
>
> Yes, there was, but I neglected to capture it... The system is now
> rebooted and I can't retrieve it anymore. :-(
>
> > And how is the reboot scheduled? Forced power off or normal reboot
> > command?
>
> The system was rebooted using a normal reboot command.
>
> > > I documented what I have tried, below. I have not yet tried
> > > anything except what is shown, because I am afraid of causing more
> > > harm to the FS.
> >
> > Pretty clever, no btrfs check --repair is a pretty good move.
> >
> > > I hope somebody here can give me advice on how to (hopefully)
> > > retrieve my data...
> > >
> > > Thanks in advance!
> > >
> > > ==========================================
> > >
> > > [root@cornelis ~]# btrfs fi show
> > > Label: 'cornelis-btrfs'  uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> > >     Total devices 1 FS bytes used 463.92GiB
> > >     devid    1 size 800.00GiB used 493.02GiB path /dev/mapper/cornelis-cornelis--btrfs
> > >
> > > Label: 'data'  uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> > >     Total devices 20 FS bytes used 44.85TiB
> > >     devid    1 size 3.64TiB used 3.64TiB path /dev/sdn2
> > >     devid    2 size 3.64TiB used 3.64TiB path /dev/sdp2
> > >     devid    3 size 3.64TiB used 3.64TiB path /dev/sdu2
> > >     devid    4 size 3.64TiB used 3.64TiB path /dev/sdx2
> > >     devid    5 size 3.64TiB used 3.64TiB path /dev/sdh2
> > >     devid    6 size 3.64TiB used 3.64TiB path /dev/sdg2
> > >     devid    7 size 3.64TiB used 3.64TiB path /dev/sdm2
> > >     devid    8 size 3.64TiB used 3.64TiB path /dev/sdw2
> > >     devid    9 size 3.64TiB used 3.64TiB path /dev/sdj2
> > >     devid   10 size 3.64TiB used 3.64TiB path /dev/sdt2
> > >     devid   11 size 3.64TiB used 3.64TiB path /dev/sdk2
> > >     devid   12 size 3.64TiB used 3.64TiB path /dev/sdq2
> > >     devid   13 size 3.64TiB used 3.64TiB path /dev/sds2
> > >     devid   14 size 3.64TiB used 3.64TiB path /dev/sdf2
> > >     devid   15 size 7.28TiB used 588.80GiB path /dev/sdr2
> > >     devid   16 size 7.28TiB used 588.80GiB path /dev/sdo2
> > >     devid   17 size 7.28TiB used 588.80GiB path /dev/sdv2
> > >     devid   18 size 7.28TiB used 588.80GiB path /dev/sdi2
> > >     devid   19 size 7.28TiB used 588.80GiB path /dev/sdl2
> > >     devid   20 size 7.28TiB used 588.80GiB path /dev/sde2
> > >
> > > [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> > > mount: /mnt/data: wrong fs type, bad option, bad superblock on
> > > /dev/sdn2, missing codepage or helper program, or other error.
> >
> > What is the dmesg of the mount failure?
>
> [Sun Dec 2 09:41:08 2018] BTRFS info (device sdn2): disk space caching is enabled
> [Sun Dec 2 09:41:08 2018] BTRFS info (device sdn2): has skinny extents
> [Sun Dec 2 09:41:08 2018] BTRFS error (device sdn2): parent transid verify failed on 46451963543552 wanted 114401 found 114173
> [Sun Dec 2 09:41:08 2018] BTRFS critical (device sdn2): corrupt leaf: root=2 block=46451963543552 slot=0, unexpected item end, have 1387359977 expect 16283
> [Sun Dec 2 09:41:08 2018] BTRFS warning (device sdn2): failed to read tree root
> [Sun Dec 2 09:41:08 2018] BTRFS error (device sdn2): open_ctree failed
>
> > And have you tried -o ro,degraded ?
>
> Tried it just now, gives the exact same error.
>
> > > [root@cornelis ~]# btrfs check /dev/sdn2
> > > Opening filesystem to check...
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > > Couldn't read tree root
> >
> > Would you please also paste the output of "btrfs ins dump-super
> > /dev/sdn2" ?
>
> [root@cornelis ~]# btrfs ins dump-super /dev/sdn2
> superblock: bytenr=65536, device=/dev/sdn2
> ---------------------------------------------------------
> csum_type               0 (crc32c)
> csum_size               4
> csum                    0x51725c39 [match]
> bytenr                  65536
> flags                   0x1
>                         ( WRITTEN )
> magic                   _BHRfS_M [match]
> fsid                    4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> label                   data
> generation              114401
> root                    46451963543552
> sys_array_size          513
> chunk_root_generation   112769
> root_level              1
> chunk_root              22085632
> chunk_root_level        1
> log_root                46451935461376
> log_root_transid        0
> log_root_level          0
> total_bytes             104020314161152
> bytes_used              49308554543104
> sectorsize              4096
> nodesize                16384
> leafsize (deprecated)   16384
> stripesize              4096
> root_dir                6
> num_devices             20
> compat_flags            0x0
> compat_ro_flags         0x0
> incompat_flags          0x1e1
>                         ( MIXED_BACKREF |
>                           BIG_METADATA |
>                           EXTENDED_IREF |
>                           RAID56 |
>                           SKINNY_METADATA )
> cache_generation        114401
> uuid_tree_generation    114401
> dev_item.uuid           c6b44903-e849-4403-98c4-f3ba4d0b3fc3
> dev_item.fsid           4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5 [match]
> dev_item.type           0
> dev_item.total_bytes    4000783007744
> dev_item.bytes_used     4000781959168
> dev_item.io_align       4096
> dev_item.io_width       4096
> dev_item.sector_size    4096
> dev_item.devid          1
> dev_item.dev_group      0
> dev_item.seek_speed     0
> dev_item.bandwidth      0
> dev_item.generation     0
>
> > It looks like your tree root (or at least some tree root
> > nodes/leaves) got corrupted.
> >
> > > ERROR: cannot open file system
> >
> > And since it's your tree root that is corrupted, you could also try
> > "btrfs-find-root <device>" to try to get a good old copy of your
> > tree root.
>
> The output is rather long. I pasted it here:
> https://pastebin.com/FkyBLgj9
>
> I'm unsure what to look for in this output?
>
> > But I suspect the corruption happened before you noticed, thus the
> > old tree root may not help much.
> >
> > Also, the output of "btrfs ins dump-tree -t root <device>" will
> > help.
>
> Here it is:
>
> [root@cornelis ~]# btrfs ins dump-tree -t root /dev/sdn2
> btrfs-progs v4.19
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> parent transid verify failed on 46451963543552 wanted 114401 found 114173
> checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> Couldn't read tree root
> ERROR: unable to open /dev/sdn2
>
> > Thanks,
> > Qu
>
> No, thank YOU! :-)
>
> > > [root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > > Couldn't read tree root
> > > Could not open root, trying backup super
> > > warning, device 14 is missing
> > > warning, device 13 is missing
> > > warning, device 12 is missing
> > > warning, device 11 is missing
> > > warning, device 10 is missing
> > > warning, device 9 is missing
> > > warning, device 8 is missing
> > > warning, device 7 is missing
> > > warning, device 6 is missing
> > > warning, device 5 is missing
> > > warning, device 4 is missing
> > > warning, device 3 is missing
> > > warning, device 2 is missing
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > > ERROR: cannot read chunk root
> > > Could not open root, trying backup super
> > > warning, device 14 is missing
> > > warning, device 13 is missing
> > > warning, device 12 is missing
> > > warning, device 11 is missing
> > > warning, device 10 is missing
> > > warning, device 9 is missing
> > > warning, device 8 is missing
> > > warning, device 7 is missing
> > > warning, device 6 is missing
> > > warning, device 5 is missing
> > > warning, device 4 is missing
> > > warning, device 3 is missing
> > > warning, device 2 is missing
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > > ERROR: cannot read chunk root
> > > Could not open root, trying backup super
> > >
> > > [root@cornelis ~]# uname -r
> > > 4.18.16-arch1-1-ARCH
> > >
> > > [root@cornelis ~]# btrfs --version
> > > btrfs-progs v4.19
> > >
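
Side note for anyone reading the quoted output: the "parent transid verify failed" messages say which generation the parent expected for a block ("wanted") and which generation the block on disk actually carries ("found"). A minimal Python sketch for pulling those numbers out of a pasted log and computing how far behind the on-disk block is could look like the following. The helper name transid_gaps and the regular expression are my own assumptions, based only on the message format visible in the output above; this is not part of btrfs-progs and it does not touch the filesystem.

```python
import re

# Assumed message format, copied from the output quoted in this thread:
#   parent transid verify failed on <bytenr> wanted <gen> found <gen>
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_gaps(log_text):
    """Return {bytenr: (wanted, found, wanted - found)} per distinct block.

    Hypothetical helper for reading pasted logs; it only parses text.
    """
    # Mail clients wrap long lines and add '>' quote markers, so collapse
    # runs of whitespace and '>' into single spaces before matching.
    flat = re.sub(r"[>\s]+", " ", log_text)
    gaps = {}
    for bytenr, wanted, found in TRANSID_RE.findall(flat):
        w, f = int(wanted), int(found)
        gaps[int(bytenr)] = (w, f, w - f)
    return gaps


# A wrapped, quoted excerpt in the same shape as the messages in this thread.
sample = """
> > > parent transid verify failed on 46451963543552 wanted 114401
> > > found
> > > 114173
"""

for bytenr, (wanted, found, gap) in transid_gaps(sample).items():
    print(f"block {bytenr}: wanted gen {wanted}, found gen {found}, "
          f"{gap} generations behind")
```

For the block reported here this gives a gap of 114401 - 114173 = 228 generations. The number alone does not say whether an older tree root is usable.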