On 2018-03-13 16:53, Dirk Gouders wrote:
> Hello all,
> 
> a somewhat aged RAID array (16 disks) got into trouble after it had
> been powered off because of facility management maintenance tasks.
> 
> It then went through some rebuilds, losing three disks on the way, and
> the whole procedure ended with corrupted volumes.  Volumes with
> ext{2,4} filesystems could be fsck'ed and the corresponding VMs then
> started, but I cannot get very far with a volume holding a (probably)
> BTRFS partition.  I got no information about which filesystems were
> used on the corresponding VM, but I knew it was an openSUSE system and
> file(1) told me:
> 
> # file -s /dev/loop0p1
> /dev/loop0p1: BTRFS Filesystem sectorsize 4096, nodesize 16384, leafsize 16384, UUID=a6459a90-ebe3-4c75-97f4-5496eadcc96f, 9141452800/10741612544 bytes used, 1 devices
> 
> so I am somewhat sure that it was a BTRFS.
> 
> I tried to use some tools on copies of the Volume data and see messages
> concerning invalid checksums as well as ones of bad tree block starts
> and I'd like to understand what the main issue of that FS might be.
> 
> I'll try to present some information and because I worked only on copies
> of the corrupted data, I can provide more information or tests on
> request. The kernel on the machine I use for diagnosis is
> 4.16.0-rc5-00004-gfc6eabbbf8ef.
> 
> Mounting:
> 
> # mount /dev/loop0p1 /mnt/
> mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0p1, missing codepage or helper program, or other error.
> 
> dmesg(1) says:
> 
> [  176.479080] BTRFS: device fsid a6459a90-ebe3-4c75-97f4-5496eadcc96f devid 1 transid 9858294 /dev/loop0p1
> [  186.909100] BTRFS info (device loop0p1): disk space caching is enabled
> [  186.990090] BTRFS error (device loop0p1): bad tree block start 2163788338953595011 212353024
> [  186.996331] BTRFS error (device loop0p1): bad tree block start 8619112249313723677 212353024

The tree block at logical address 212353024 is corrupted.
No copy has the correct bytenr.
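
To illustrate the check behind "bad tree block start": each tree block
begins with a header that records the logical address (bytenr) the block
belongs at, and the reader compares that field with the address it asked
for. A minimal Python sketch, assuming the header starts with a 32-byte
checksum and a 16-byte fsid followed by a little-endian 64-bit bytenr.
The function names are hypothetical, and the blocks are built by hand
rather than read from the damaged image:

```python
import struct

# Assumed layout of the start of a btrfs tree block header:
# csum[32], fsid[16], then bytenr as a little-endian u64.
BYTENR_OFFSET = 32 + 16


def header_bytenr(block: bytes) -> int:
    """Return the bytenr recorded in a tree block header."""
    (bytenr,) = struct.unpack_from("<Q", block, BYTENR_OFFSET)
    return bytenr


def check_tree_block_start(block: bytes, expected: int) -> bool:
    """True if the header claims the logical address that was requested."""
    return header_bytenr(block) == expected


def make_block(bytenr: int, size: int = 16384) -> bytes:
    """Build a fake, otherwise empty tree block (hypothetical demo helper)."""
    buf = bytearray(size)
    struct.pack_into("<Q", buf, BYTENR_OFFSET, bytenr)
    return bytes(buf)


good = make_block(212353024)
bad = make_block(2163788338953595011)  # value from the first dmesg error line

print(check_tree_block_start(good, 212353024))
print(check_tree_block_start(bad, 212353024))
```

In the dmesg lines above, the first number is what the header contained
and the second is the address that was requested.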

> [  187.044482] BTRFS error (device loop0p1): open_ctree failed

Some corruption happened without a corresponding kernel message.

> 
> find-root:
> 
> # btrfs-find-root /dev/loop0p1
> Superblock thinks the generation is 9858294
> Superblock thinks the level is 1
> Found tree root at 848773120 gen 9858294 level 1

The tree root was found, so find-root won't help much here.
And if it were really tree root corruption, we should have some kernel
message for it.

> Well block 832045056(gen: 9858272 level: 1) seems good, but generation/level doesn't match, want gen: 9858294 level: 1

Especially when the next tree block is 22 generations older.
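
The gap can be read off the find-root output mechanically. A small
Python sketch, assuming the candidate lines keep the "Well block
N(gen: G level: L)" shape seen in the quoted output; the function name
is hypothetical and the sample holds only the first two candidates:

```python
import re

# Matches candidate lines as they appear in the quoted btrfs-find-root output.
CANDIDATE = re.compile(r"Well block (\d+)\(gen: (\d+) level: (\d+)\)")


def generation_gaps(output: str, wanted_gen: int):
    """Return (bytenr, gen, wanted_gen - gen) for each candidate block."""
    return [
        (int(m.group(1)), int(m.group(2)), wanted_gen - int(m.group(2)))
        for m in CANDIDATE.finditer(output)
    ]


sample = """\
Well block 832045056(gen: 9858272 level: 1) seems good, but generation/level
Well block 831799296(gen: 9858271 level: 1) seems good, but generation/level
"""

# 9858294 is the generation the superblock expects.
for bytenr, gen, gap in generation_gaps(sample, 9858294):
    print(bytenr, gen, gap)
```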

Would you please try to call "btrfs inspect dump-tree <device>" and
paste the result with *stderr*?

Then we would at least know which tree block is corrupted.
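
One way to capture stdout and stderr together is sketched below in
Python. The helper name is hypothetical, and the demo runs a stand-in
command so it works anywhere; for the real case the command list would
be something like ["btrfs", "inspect-internal", "dump-tree",
"/dev/loop0p1"], with the device path taken from the report above:

```python
import subprocess
import sys


def run_with_stderr(cmd):
    """Run cmd and return (exit code, stdout and stderr as one string)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the same stream
        text=True,
    )
    return proc.returncode, proc.stdout


# Stand-in command that writes one line to each stream.
demo = [
    sys.executable,
    "-c",
    "import sys; print('out'); print('err', file=sys.stderr)",
]
code, text = run_with_stderr(demo)
print(code)
print(text)
```

The relative order of the two streams in the combined text depends on
buffering in the child process, so it should not be relied on.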

Thanks,
Qu

> Well block 831799296(gen: 9858271 level: 1) seems good, but generation/level doesn't match, want gen: 9858294 level: 1
> Well block 831520768(gen: 9858270 level: 1) seems good, but generation/level doesn't match, want gen: 9858294 level: 1
> 
> ...several similar lines that differ only in the block and gen, the
> last two lines differ a bit more:
> 
> Well block 72089600(gen: 9728190 level: 0) seems good, but generation/level doesn't match, want gen: 9858294 level: 1
> Well block 4243456(gen: 3 level: 0) seems good, but generation/level doesn't match, want gen: 9858294 level: 1
> Well block 4194304(gen: 2 level: 0) seems good, but generation/level doesn't match, want gen: 9858294 level: 1
> 
> When I then try a restore with the first block # of the previous command:
> 
> # btrfs restore -t 832045056 -D /dev/loop0p1 /mnt/btrfs/
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> Ignoring transid failure
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found DC09290B wanted C630FD61
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> bytenr mismatch, want=363069440, have=17552567724568668829
> Could not open root, trying backup super
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> parent transid verify failed on 832045056 wanted 9858294 found 9858272
> Ignoring transid failure
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> checksum verify failed on 363069440 found DC09290B wanted C630FD61
> checksum verify failed on 363069440 found 296FB15A wanted F0AFE59D
> bytenr mismatch, want=363069440, have=17552567724568668829
> Could not open root, trying backup super
> ERROR: superblock bytenr 274877906944 is larger than device size 10741612544
> Could not open root, trying backup super
> 
> Dirk
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
