Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

Christoph Anton Mitterer Wed, 21 Sep 2016 20:07:33 -0700

On Thu, 2016-09-22 at 10:08 +0800, Qu Wenruo wrote:
> And I don't see the necessary to csum the parity.
> Why csum a csum again?


I'd say simply for the following reason:
Imagine the smallest RAID5: 2x data D1 D2, 1x parity P
If D2 is lost it could be recalculated via D1 and P.

What if only (all) the checksum information for D2 is lost (e.g.
because of further silent data corruption on the blocks of these
csums)?
Then we'd only know that D1 is valid (it still has working csums). But we
wouldn't know whether D2 is valid (its csums are gone), nor whether P is
(it is not csummed at all).
Now imagine silent data corruption in either D2 or P => we cannot
tell which of them is valid, so no repair is possible... or am I missing
something?
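
To make the ambiguity concrete, here is a minimal Python sketch of the
scenario above. It is not btrfs code: the function names are made up for
illustration, zlib's CRC-32 stands in for the real checksum (btrfs uses
crc32c by default), and the per-parity checksum csum_p is exactly the
hypothetical feature under discussion.

```python
import zlib
from typing import Optional


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """RAID5 parity for two equally sized blocks: byte-wise XOR."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def csum(block: bytes) -> int:
    # Stand-in checksum for illustration only (btrfs defaults to crc32c).
    return zlib.crc32(block)


def locate_corruption(d1: bytes, d2: bytes, p: bytes, csum_d1: int,
                      csum_d2: Optional[int], csum_p: Optional[int]) -> str:
    """Name the bad block, or 'none' / 'ambiguous'.

    Assumes at most one of D2 and P is corrupted. A csum of None means
    that checksum is lost (D2) or was never stored (P).
    """
    if csum(d1) != csum_d1:
        return "D1"
    if xor_blocks(d1, d2) == p:
        return "none"          # the stripe is self-consistent
    if csum_d2 is not None:
        return "D2" if csum(d2) != csum_d2 else "P"
    if csum_p is not None:
        return "P" if csum(p) != csum_p else "D2"
    return "ambiguous"         # mismatch seen, but nothing to arbitrate


d1, d2 = b"AAAA", b"BBBB"
p = xor_blocks(d1, d2)
bad_d2 = b"BBBX"               # silent corruption in D2

# D2's csum is lost and P is not csummed: the mismatch cannot be attributed.
print(locate_corruption(d1, bad_d2, p, csum(d1), None, None))     # ambiguous
# With a (hypothetical) parity csum, P is proven good, so D2 must be bad.
print(locate_corruption(d1, bad_d2, p, csum(d1), None, csum(p)))  # D2
```

In the second case D2 could then be rebuilt as xor_blocks(d1, p); in the
first case both "rebuild D2 from P" and "recompute P from D2" look equally
plausible.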


> Just as you expected, it doesn't check parity.
> Even for RAID1/DUP, it won't check the backup if it succeeded reading
> the first stripe.
That would IMO be a really severe bug... making scrubbing close
to completely useless for multi-device filesystems.
I mean the whole reason for doing it is to find [silently] corrupted
blocks... in order to be able to do something about it.
If one would only notice once the read actually fails because the one
working block is also gone... then why have a RAID in the first place?
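
A small Python sketch of the difference I mean, assuming a two-copy
RAID1/DUP layout. Both functions are hypothetical illustrations, not the
actual btrfs read or scrub paths, and zlib's CRC-32 again stands in for
the real checksum.

```python
import zlib
from typing import List


def csum(block: bytes) -> int:
    # Stand-in checksum for illustration only.
    return zlib.crc32(block)


def read_first_good(copies: List[bytes], expected: int) -> bytes:
    """Normal read: stop at the first copy whose checksum matches."""
    for copy in copies:
        if csum(copy) == expected:
            return copy
    raise IOError("no copy matches the checksum")


def scrub_all_copies(copies: List[bytes], expected: int) -> List[int]:
    """Scrub as I would expect it: verify every copy, report the bad ones."""
    return [i for i, copy in enumerate(copies) if csum(copy) != expected]


good = b"good"
copies = [good, b"g0od"]       # the second copy is silently corrupted

# A read succeeds from the first copy and never looks at the second one.
print(read_first_good(copies, csum(good)))   # b'good'
# A scrub that checks every copy finds the damage while it is repairable.
print(scrub_all_copies(copies, csum(good)))  # [1]
```

If the scrub behaved like read_first_good, the bad copy would only surface
after the good one had failed as well.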

> Current implement doesn't really care if it's the data or the copy 
> corrupted, any data can be read out, then there is no problem.
Except it makes RAID practically useless... => problem


Cheers,
Chris.

