On 2018-03-13 16:53, Dirk Gouders wrote:
> Hello all,
>
> a somewhat aged RAID array (16 Disks) got into trouble after it has
> been powered off because of facility management maintenance tasks.
>
> It then went through some rebuilds losing three disks on the way and
> the whole procedure ended with corrupted volumes. Volumes with
> ext{2,4} filesystems could be fsck'ed and corresponding VMs then
> started but a volume with a (probably) BTRFS partition I am not able
> to get very far with. I got no information what filesystems were used
> on the corresponding VM but I knew it was an openSUSE system and
> file(1) told me:
>
> # file -s /dev/loop0p1
> /dev/loop0p1: BTRFS Filesystem sectorsize 4096, nodesize 16384, leafsize
> 16384, UUID=a6459a90-ebe3-4c75-97f4-5496eadcc96f, 9141452800/10741612544
> bytes used, 1 devices
>
> so I am somewhat sure that it was a BTRFS.
>
> I tried to use some tools on copies of the Volume data and see messages
> concerning invalid checksums as well as ones of bad tree block starts
> and I'd like to understand what the main issue of that FS might be.
>
> I'll try to present some information and because I worked only on copies
> of the corrupted data, I can provide more information or tests on
> request. The kernel on the machine I use for diagnosis is
> 4.16.0-rc5-00004-gfc6eabbbf8ef.
>
> Mounting:
>
> # mount /dev/loop0p1 /mnt/
> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0p1,
> missing codepage or helper program, or other error.
>
> dmesg(1) says:
>
> [ 176.479080] BTRFS: device fsid a6459a90-ebe3-4c75-97f4-5496eadcc96f devid
> 1 transid 9858294 /dev/loop0p1
> [ 186.909100] BTRFS info (device loop0p1): disk space caching is enabled
> [ 186.990090] BTRFS error (device loop0p1): bad tree block start
> 2163788338953595011 212353024
> [ 186.996331] BTRFS error (device loop0p1): bad tree block start
> 8619112249313723677 212353024
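For reference, "bad tree block start FOUND EXPECTED" means the bytenr recorded in the tree block's own header (here a garbage value) does not match the logical address it was read from (212353024). Below is a minimal Python sketch of the three header checks behind that message and the "parent transid verify failed" and "checksum verify failed" messages quoted further down. It assumes the on-disk struct btrfs_header layout and the default crc32c checksum; check_tree_block and make_block are hypothetical helpers, and their message strings only paraphrase the real kernel and btrfs-progs output.

```python
import struct

# On-disk struct btrfs_header (packed, little-endian):
# csum[32] fsid[16] bytenr:u64 flags:u64 chunk_tree_uuid[16]
# generation:u64 owner:u64 nritems:u32 level:u8
HEADER_FMT = "<32s16sQQ16sQQIB"


def _crc32c_table():
    # Lookup table for reflected CRC-32C (Castagnoli polynomial).
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def check_tree_block(block: bytes, logical: int, parent_transid: int) -> list:
    """Return a list of problems for one tree block; an empty list means OK."""
    (csum, _fsid, bytenr, _flags, _uuid,
     generation, _owner, _nritems, _level) = struct.unpack_from(HEADER_FMT, block)
    problems = []
    # The checksum covers everything after the 32-byte csum field; crc32c
    # keeps its result as a little-endian u32 in the first 4 bytes.
    wanted = struct.unpack_from("<I", csum)[0]
    found = crc32c(block[32:])
    if found != wanted:
        problems.append(f"checksum mismatch on {logical}: found {found:08X} wanted {wanted:08X}")
    # "bad tree block start": the header's own bytenr is not where we read it.
    if bytenr != logical:
        problems.append(f"bad tree block start {bytenr} {logical}")
    # "parent transid verify failed": the parent pointer expects another generation.
    if generation != parent_transid:
        problems.append(f"transid mismatch on {logical}: wanted {parent_transid} found {generation}")
    return problems


def make_block(logical: int, generation: int, nodesize: int = 16384) -> bytes:
    """Hypothetical helper: build a synthetic, correctly checksummed empty leaf."""
    block = bytearray(nodesize)
    struct.pack_into(HEADER_FMT, block, 0, bytes(32), bytes(16), logical, 0,
                     bytes(16), generation, 5, 0, 0)
    struct.pack_into("<I", block, 0, crc32c(bytes(block[32:])))
    return bytes(block)


if __name__ == "__main__":
    blk = make_block(212353024, 9858294)
    print(check_tree_block(blk, 212353024, 9858294))  # -> []
    print(check_tree_block(blk, 212353024, 9858300))
```

The checks are independent on purpose: a block can carry a valid checksum and still be stale (wrong generation), which is what the find-root and restore output below suggests for the older root candidates.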
Logical tree block 212353024 is corrupted: no copy has the correct bytenr.

> [ 187.044482] BTRFS error (device loop0p1): open_ctree failed

Some corruption happened without any corresponding kernel message.

>
> find-root:
>
> # btrfs-find-root /dev/loop0p1
> Superblock thinks the generation is 9858294
> Superblock thinks the level is 1
> Found tree root at 848773120 gen 9858294 level 1

The tree root is found, so find-root won't help much here. And if it were
really tree root corruption, we should see a kernel message about it.

> Well block 832045056(gen: 9858272 level: 1) seems good, but generation/level
> doesn't match, want gen: 9858294 level: 1

Especially when the next tree block is 22 generations older.

Would you please run "btrfs inspect dump-tree <device>" and paste the
output, including *stderr*?
At least that would tell us which tree block is corrupted.

Thanks,
Qu

> Well block 831799296(gen: 9858271 level: 1) seems good, but generation/level
> doesn't match, want gen: 9858294 level: 1
> Well block 831520768(gen: 9858270 level: 1) seems good, but generation/level
> doesn't match, want gen: 9858294 level: 1
>
> ...several similar lines that differ only in the block and gen, the
> last two lines differ a bit more:
>
> Well block 72089600(gen: 9728190 level: 0) seems good, but generation/level
> doesn't match, want gen: 9858294 level: 1
> Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't
> match, want gen: 9858294 level: 1
> Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't
> match, want gen: 9858294 level: 1
>
> When I then try a restore with the first block # of the previous command:
>
> # btrfs restore -t 832045056 -D /dev/loop0p1 /mnt/btrfs/
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> Ignoring transid failure
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found DC09290B wanted C630FD61
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> bytenr mismatch, want=363069440, have=17552567724568668829
> Could not open root, trying backup super
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> Ignoring transid failure
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found DC09290B wanted C630FD61
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> bytenr mismatch, want=363069440, have=17552567724568668829
> Could not open root, trying backup super
> ERROR: superblock bytenr 274877906944 is larger than device size 10741612544
> Could not open root, trying backup super
>
> Dirk
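The final ERROR line in the restore output appears to be expected rather than a new problem. btrfs keeps up to three superblock copies at fixed byte offsets (64 KiB, 64 MiB and 256 GiB), and 256 GiB = 274877906944 bytes lies beyond a roughly 10 GiB device, so restore seems to have simply run out of backup supers after the first two. A small Python sketch of that offset arithmetic follows; the constant names are borrowed from the kernel's btrfs headers, while sb_offset and usable_mirrors are illustrative helpers, not a real API.

```python
# Superblock copies live at fixed byte offsets; the constants mirror the
# kernel's BTRFS_SUPER_INFO_OFFSET, BTRFS_SUPER_MIRROR_SHIFT, etc.
SUPER_INFO_OFFSET = 64 * 1024   # primary copy at 64 KiB
SUPER_INFO_SIZE = 4096
SUPER_MIRROR_SHIFT = 12
SUPER_MIRROR_MAX = 3


def sb_offset(mirror: int) -> int:
    """Byte offset of superblock copy `mirror` (0 = primary)."""
    if mirror == 0:
        return SUPER_INFO_OFFSET
    # Mirrors 1 and 2: 16 KiB shifted left by 12 and 24 bits.
    return (16 * 1024) << (SUPER_MIRROR_SHIFT * mirror)


def usable_mirrors(device_size: int) -> list:
    """Copies whose 4 KiB block ends before the end of the device."""
    return [m for m in range(SUPER_MIRROR_MAX)
            if sb_offset(m) + SUPER_INFO_SIZE < device_size]


if __name__ == "__main__":
    size = 10741612544  # device size from the restore error above
    for m in range(SUPER_MIRROR_MAX):
        print(m, sb_offset(m), m in usable_mirrors(size))
```

With the quoted device size only copies 0 (64 KiB) and 1 (64 MiB) exist, which matches the two full "trying backup super" rounds before the size error.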