2019-04-07 20:45, Chris Murphy:
On Sun, Apr 7, 2019 at 1:42 AM Nik. <bt...@avgustinov.eu> wrote:
2019-04-07 01:18, Qu Wenruo:

You have 2 bits flipped just in one tree block!

If the data-tree structures alone have so many bits flipped, how many
flipped bits are to be expected in the data itself? What should a normal
btrfs user do in order to prevent such disasters?

I think the corruption in your case was inferred by Btrfs only from bad
key ordering, not a csum failure on the leaf? I can't tell for sure
from the error, but I don't see a csum complaint.
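
For context, the "bad key ordering" complaint comes from a structural sanity check: the keys inside a btrfs leaf have to be in strictly ascending (objectid, type, offset) order. A rough Python sketch of the idea (not btrfs code; the leaf contents and the flipped bit are made up purely for illustration):

def keys_ordered(keys):
    # Every key must be strictly greater than the one before it.
    return all(keys[i] < keys[i + 1] for i in range(len(keys) - 1))

leaf = [(256, 1, 0), (256, 12, 256), (257, 1, 0)]   # (objectid, type, offset)
print(keys_ordered(leaf))                            # True  -> well-formed leaf

corrupted = list(leaf)
corrupted[1] = (256 | (1 << 40), 12, 256)            # single bit flip in the objectid
print(keys_ordered(corrupted))                       # False -> "bad key order"

So one flipped bit in a key can trip this check even when the block's checksum still matches the (already corrupted) contents.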

I do not quite understand where the "bad key ordering" came from, but my question is why (in my case) it keeps happening only to the btrfs file systems? Is it relevant that all four failed systems initially had ext4 format and were converted to btrfs (with the btrfs-progs in use 5-6 years ago)?

Another question: I am sure that many btrfs users are ready in some cases to trade performance for reliability; wouldn't it be interesting to introduce a kind of switch/option like the "VERIFY ON" used many years ago on MS-DOS systems to ensure that write operations (especially on floppy disks) were successful? Just an idea...

My btrfs-restore is still running (since Monday evening; about 50% restored so far), and I am on a business trip. As soon as it finishes and I am back home, I will compare the result with the backup and give more info, but it seems this will need another day or two.

Kind regards,

Nik.
--

I'd expect RAM-caused corruption to hit the metadata leaf data before
the csum is computed over it, therefore no csum failure on a subsequent
read. Whereas if the corruption is storage-stack related, we'd see a
csum error on a subsequent read.
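
To make the timing point concrete, a rough Python sketch (not btrfs code; zlib.crc32 stands in for the crc32c btrfs uses on metadata, and the buffer contents are made up):

import zlib

# Case 1: bit flips in RAM *before* the checksum is computed and written.
block = bytearray(b"metadata leaf contents" * 10)
block[5] ^= 0x01                            # corruption happens first
stored_csum = zlib.crc32(block)             # csum now covers the bad data
print(zlib.crc32(block) == stored_csum)     # True  -> reads back "clean", silently

# Case 2: checksum first, then the corruption happens in the storage stack.
block2 = bytearray(b"metadata leaf contents" * 10)
stored_csum2 = zlib.crc32(block2)
block2[5] ^= 0x01                           # flipped on the way to/from disk
print(zlib.crc32(block2) == stored_csum2)   # False -> csum error on read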

Once there's corruption in a block address, the corruption can
propagate into anything else that depends on that block address even
if there isn't another corruption event. So one event, multiple
corruptions.
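
A toy illustration of that fan-out (no btrfs internals, just a made-up parent node that stores child block addresses):

blocks = {
    0x1000: {"kind": "leaf", "items": ["inode 256", "inode 257"]},
    0x2000: {"kind": "leaf", "items": ["inode 258", "inode 259"]},
}
parent = {"kind": "node", "children": [0x1000, 0x2000]}

parent["children"][0] ^= 1 << 16            # one bit flip in a stored address

for addr in parent["children"]:
    leaf = blocks.get(addr)
    print(hex(addr), "->", leaf["items"] if leaf else "read error / garbage")

One flipped bit in the pointer, and every lookup that goes through it now lands on the wrong (or a nonexistent) block.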


And another thing: if I am getting it right, it would have been more
reliable/appropriate to let btrfs manage the five disks behind md0
with a raid1 profile, instead of binding them in a RAID5 and "giving" just a
single device to btrfs.

Not necessarily. If corruption happens early enough, it gets baked
into all copies of the metadata.
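
Sketching that in the same illustrative Python (again, not actual btrfs behaviour): if the buffer is already corrupt in RAM when the csum is computed, raid1 just writes the same bad bytes, with a csum that matches them, to both mirrors, so neither copy can repair the other:

import zlib

buf = bytearray(b"metadata leaf" * 8)
buf[3] ^= 0x04                              # flip happens before the write path
csum = zlib.crc32(buf)                      # csum matches the corrupted buffer

mirror_a = (bytes(buf), csum)               # copy written to device 1
mirror_b = (bytes(buf), csum)               # copy written to device 2

for name, (data, c) in (("A", mirror_a), ("B", mirror_b)):
    print(name, "csum ok:", zlib.crc32(data) == c)   # True for both, both corrupt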

