On 2019/4/11 5:03 AM, Nik. wrote:
> 
> 
> 2019-04-07 20:45, Chris Murphy:
>> On Sun, Apr 7, 2019 at 1:42 AM Nik. <bt...@avgustinov.eu> wrote:
>>> 2019-04-07 01:18, Qu Wenruo:
>>
>>>> You have 2 bits flipped just in one tree block!
>>>>
>>> If the data-tree structures alone have so many bits flipped, how many
>>> flipped bits are to be expected in the data itself? What should a normal
>>> btrfs user do in order to prevent such disasters?
>>
>> I think the corruption in your case is inferred by Btrfs only from bad
>> key ordering, not a csum failure on the leaf? I can't tell for sure
>> from the error, but I don't see a csum complaint.
> 
> I do not quite understand where the "bad key ordering" came from, but my
> question is why (in my case) it keeps happening only to the btrfs file
> systems?

Because btrfs uses a more generic tree structure to keep everything in
order.

Unlike other filesystems (xfs/ext*), which have their own dedicated
structures for inodes, regular files and directories, btrfs uses one
single but more complex structure to record everything.

This also means there is somewhat more redundancy in the tree
structure, so it is easier to get corrupted.
E.g. if xfs only needs 3 blocks to record its data structures, btrfs may
need 7 blocks. Thus if one bit gets flipped in memory (either by the
hardware or by the fs itself), it's more likely to hit btrfs than xfs.
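
To make that concrete, here is a minimal sketch of the idea (a
simplified illustration with my own names like "example_key", not the
actual kernel definition): one generic key shape indexes every kind of
metadata item in the same kind of tree block.

#include <stdint.h>

/*
 * Simplified illustration of the generic btrfs key.  Inode items,
 * directory items, file extent items, csum items and so on are all
 * indexed by a key of this shape and packed into the same kind of
 * B-tree leaf, instead of each having its own dedicated on-disk
 * structure as in ext4 or xfs.
 */
struct example_key {
	uint64_t objectid;  /* e.g. the inode number the item belongs to */
	uint8_t  type;      /* which kind of item this is */
	uint64_t offset;    /* meaning depends on the item type */
};

The trade-off is that much more of the filesystem's state lives in
these generic tree blocks, so a random bit flip simply has more
metadata to land on.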


> Is it relevant that all four failed systems were initially formatted as
> ext4 and were converted to btrfs (with the btrfs-progs in use 5-6
> years ago)?

Converting to btrfs has some problems, especially with the convert tool
from 5~6 years ago.
That old convert used (almost abused) a certain feature of btrfs,
creating a very strange chunk layout. It's valid but very tricky.
I'm not sure if it's related, but it's possible.

> 
> Another question: I am sure that many btrfs users are in some cases
> ready to trade performance for reliability; wouldn't it be interesting
> to introduce a kind of switch/option like the "verify on" used many
> years ago on MS-DOS systems to ensure that write operations (especially
> on floppy disks) were successful? Just an idea...

My personal take is that reliability comes before everything else,
especially for a fs that is already somewhat unstable or easy to corrupt.

So starting with recent kernel releases we have more and more mandatory
verification.
At least we're trying to make btrfs more and more robust.
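
To illustrate the kind of check I mean (again a simplified sketch under
my own names, not the actual tree-checker code): a leaf can be rejected
whenever its keys are not strictly ordered, which is how a flipped bit
like yours gets reported as "bad key ordering" instead of being trusted.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same simplified key shape as in the sketch above. */
struct example_key {
	uint64_t objectid;
	uint8_t  type;
	uint64_t offset;
};

/* Compare two keys in (objectid, type, offset) order. */
static int example_key_cmp(const struct example_key *a,
			   const struct example_key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/*
 * Refuse a leaf whose keys are not strictly increasing.  A single bit
 * flipped in one key is usually enough to violate the ordering, so the
 * block gets rejected before anything is built on top of it.
 */
static bool example_leaf_keys_ordered(const struct example_key *keys,
				      size_t nr_items)
{
	for (size_t i = 1; i < nr_items; i++)
		if (example_key_cmp(&keys[i - 1], &keys[i]) >= 0)
			return false;  /* bad key ordering */
	return true;
}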

Thanks,
Qu

> 
> My btrfs-restore is still running (since Monday evening, about 50%
> restored so far), and I am on a business trip. As soon as it finishes
> and I am back home I will compare with the backup and give more info,
> but it seems this will need another day or two.
> 
> Kind regards,
> 
> Nik.
> -- 
> 
>> I'd expect RAM-caused corruption could affect a metadata leaf's data,
>> followed by csum computation. Therefore no csum failure on subsequent
>> read. Whereas if the corruption is storage-stack related, we'd see a
>> csum error on subsequent read.
>>
>> Once there's corruption in a block address, the corruption can
>> propagate into anything else that depends on that block address even
>> if there isn't another corruption event. So one event, multiple
>> corruptions.
>>
>>
>>> And another thing: if I am getting it right, it would have been more
>>> reliable/appropriate to let btrfs manage the five disks behind the md0
>>> with a raid1 profile instead of binding them in a RAID5 and "giving"
>>> just a single device to btrfs.
>>
>> Not necessarily. If corruption happens early enough, it gets baked
>> into all copies of the metadata.
>>
>>
