Re: Expected behavior of bad sectors on one drive in a RAID1

Duncan Tue, 20 Oct 2015 12:21:08 -0700

Austin S Hemmelgarn posted on Tue, 20 Oct 2015 09:59:17 -0400 as
excerpted:



>>> It is worth clarifying also that:
>>> a. While BTRFS will not return bad data in this case, it also won't
>>> automatically repair the corruption.
>>
>> Really?  If so I think that's a bug in BTRFS.  When mounted rw I think
>> that every time corruption is discovered it should be automatically
>> fixed.
> That's debatable.  While it is safer to try and do this with BTRFS than
> say with MD-RAID, it's still not something many seasoned system
> administrators would want happening behind their back.  It's worth
> noting that ZFS does not automatically fix errors, it just reports them
> and works around them, and many distributed storage options (like Ceph
> for example) behave like this also.  All that the checksum mismatch
> really tells you is that at some point, the data got corrupted, it could
> be that the copy on the disk is bad, but it could also be caused by bad
> RAM, a bad storage controller, a loose cable, or even a bad power
> supply.

There's a significant difference, however, between btrfs in
dup/raid1/raid10 modes and some of the others you mentioned.  Btrfs in
these modes actually has a second copy of the data itself available.
That's a
world of difference compared to parity, for instance.  With parity you're
reconstructing the data and thus have dangers such as the write hole, and
the possibility of bad RAM corrupting the data before it was ever saved
(this last one being the reason zfs has such strong recommendations/
warnings regarding the use of non-ECC RAM, based on what a number of
posters with zfs experience have said here).  With btrfs, there's an
actual second copy, with both copies covered by checksum.  If one of the 
copies verifies against its checksum and the other doesn't, the odds of 
the one that verifies being any worse than the one that doesn't are... 
pretty slim, to say the least.  (So slim I'd intuitively compare them to 
the odds of getting hit by lightning, tho I've no idea what the 
mathematically rigorous comparison might be.)
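
To make the argument concrete, here's a minimal sketch of the idea in
Python.  It is not how the kernel does it: read_with_repair is a
hypothetical helper, the "devices" are just a list of byte strings, and
zlib.crc32 stands in for the crc32c that btrfs uses by default.  It only
illustrates the logic of trusting whichever copy verifies and rewriting
the one that doesn't.

```python
import zlib


def read_with_repair(copies, stored_csum):
    """Return (data, repaired) for a block mirrored across `copies`.

    `copies` is a mutable list of bytes objects, one per mirror
    (hypothetical stand-in for the two raid1 devices).  `stored_csum`
    is the checksum recorded when the block was written.
    """
    # Find the first copy whose contents still match the stored checksum.
    good = None
    for data in copies:
        if zlib.crc32(data) == stored_csum:
            good = data
            break
    if good is None:
        # No copy verifies: nothing trustworthy to return or repair from.
        raise OSError("no copy matches its checksum")

    # Overwrite every copy that fails verification with the good one.
    repaired = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) != stored_csum:
            copies[i] = good
            repaired.append(i)
    return good, repaired


if __name__ == "__main__":
    block = b"important data" * 8
    csum = zlib.crc32(block)

    damaged = bytearray(block)
    damaged[3] ^= 0x01              # flip one bit on the first mirror
    mirrors = [bytes(damaged), block]

    data, repaired = read_with_repair(mirrors, csum)
    print(data == block, repaired)
```

The point of the sketch is the asymmetry: the copy used for the repair
is one that has already been checked against the checksum, while the
copy being overwritten is one already known to be bad.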

Yes, there's some small but not infinitesimal chance the checksum may be 
wrong, but if there's two copies of the data and the checksum on one is 
wrong while the checksum on the other verifies... yes, there's still that 
small chance that the one that verifies is wrong too, but that it's any 
worse than the one that does not verify?  /That's/ getting close to 
infinitesimal, or at least close enough for the purposes of a mailing-
list claim without links to supporting evidence by someone who has 
already characterized it as not mathematically rigorous... and for me, 
personally.  I'm not spending any serious time thinking about getting hit
by lightning, either, tho by the same token I don't go out flying kites
or waving long metal rods around in lightning storms.
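
For a rough sense of scale only, and still not rigorous: assuming a
32-bit checksum (btrfs's default crc32c is 32 bits wide) and assuming
corruption that behaves like uniformly random data, which real failures
need not, the chance that a corrupted block happens to match its stored
checksum is about 1 in 2^32.  The names below are just for illustration.

```python
# Back-of-the-envelope estimate, under the assumptions stated above.
CHECKSUM_BITS = 32

# Probability that random garbage matches one specific 32-bit checksum.
P_FALSE_MATCH = 2.0 ** -CHECKSUM_BITS

if __name__ == "__main__":
    print(f"about 1 in {2 ** CHECKSUM_BITS:,} ({P_FALSE_MATCH:.2e})")
```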

Meanwhile, it's worth noting that btrfs itself isn't yet entirely stable 
or mature, and that the chances of just plain old bugs killing the 
filesystem are far *FAR* higher than of a verified-checksum copy being 
any worse than a failed-checksum copy.  If you're worried about that at 
this point, why are you even on the btrfs list in the first place?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
