On 12/04/2012 11:20, Stan Hoeppner wrote:
On 4/11/2012 9:23 PM, Emmanuel Noobadmin wrote:
On 4/12/12, Stan Hoeppner <s...@hardwarefreak.com> wrote:
On 4/11/2012 11:50 AM, Ed W wrote:
One of the snags of md RAID1 vs RAID6 is the lack of checksumming in the
event of bad blocks.  (I'm not sure what actually happens when md
scrubbing finds a bad sector with RAID1..?)  For low-performance
requirements I have become paranoid and been using RAID6 rather than
RAID10; filesystems with sector checksums seem attractive...
Except we're using hardware RAID1 here and mdraid linear.  Thus the
controller takes care of sector integrity.  RAID6 yields nothing over
RAID10, except lower performance, and more usable space if more than 4
drives are used.
How would the controller ensure sector integrity unless it is writing
additional checksum information to disk? I thought only a few
filesystems like ZFS do sector checksums to detect if any data
corruption occurred. I suppose the controller could throw an error if
the two drives returned data that didn't agree with each other, but it
wouldn't know which is the accurate copy, so that wouldn't protect the
integrity of the data, at least not directly without additional human
intervention, I would think.
When a drive starts throwing uncorrectable read errors, the controller
faults the drive and tells you to replace it.  Good hardware RAID
controllers are notorious for their penchant to kick drives that would
continue to work just fine in mdraid or as a single drive for many more
years.  The mindset here is that anyone would rather spend $150-$2500
on a replacement drive than take a chance with his/her valuable data.


I'm asking a subtly different question.

The claim by the ZFS/BTRFS authors and others is that data silently "bit rots" on its own. The claim is therefore that you can have a RAID1 pair where neither drive reports a hardware failure, but each gives you different data? I can't personally claim to have observed this, so it remains someone else's theory... (for background, my experience is simply: RAID10 for high-performance arrays and RAID6 for all my personal data - I intend to investigate your linear raid idea in the future though)

I do agree that if one drive reports a read error, then it's quite easy to guess which member of the pair is wrong...
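For what it's worth, the checksum argument can be sketched in a few lines of Python. This is only an illustration of the principle, not how ZFS/BTRFS actually store or verify their checksums; the block size, the choice of hash, and the function names are all made up for the example.

    import hashlib

    BLOCK_SIZE = 4096  # hypothetical block size, just for this sketch

    def checksum(block: bytes) -> str:
        # ZFS/Btrfs use checksums such as fletcher4 or crc32c; sha256 here
        # is only for illustration.
        return hashlib.sha256(block).hexdigest()

    def pick_good_copy(copy_a: bytes, copy_b: bytes, stored_sum: str) -> bytes:
        """Given the two mirror copies of a block and the checksum recorded
        when the block was written, return a copy that still matches it."""
        if checksum(copy_a) == stored_sum:
            return copy_a              # copy A is intact
        if checksum(copy_b) == stored_sum:
            return copy_b              # copy A rotted silently; B is still good
        raise IOError("both copies fail verification - data loss")

    # Without stored_sum (plain RAID1), copy_a != copy_b only tells you the
    # mirrors disagree, not which one to trust.

The point is the last comment: a bare mirror can detect a disagreement during a scrub, but without an independently stored checksum there is nothing to arbitrate between the two copies.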

Just as an aside, I don't have a lot of failure experience. However, in the few events I have had (perhaps 6-8 now) there has been a massive correlation in failure time with RAID1, e.g. one pair I had lasted perhaps 2 years and then both drives failed within 6 hours of each other. I also had a bad experience with RAID5 that wasn't being scrubbed regularly: when one drive started reporting errors (i.e. the lack of monitoring meant it had been bad for a while), the rest of the array turned out to be a patchwork of read errors. Linux RAID then turns out to be quite fragile in the presence of a small number of read failures, and it's extremely difficult to salvage the 99% of the array which is OK because the disks get kicked out... (of course regular scrubs would have prevented getting so deep into that situation - it was a small cheap NAS box without such features)
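Incidentally, on boxes where you do control the scrubbing, md exposes it through sysfs (distributions generally wrap this in a periodic cron job, e.g. Debian's checkarray script). A rough sketch, assuming the array is /dev/md0 and it is run as root:

    import time

    MD = "/sys/block/md0/md"   # assumes the array is md0; adjust to suit

    def scrub(md=MD):
        """Start a 'check' pass (read everything and compare mirrors/parity)
        and wait for it to finish. Needs root."""
        with open(f"{md}/sync_action", "w") as f:
            f.write("check")
        while True:
            with open(f"{md}/sync_action") as f:
                if f.read().strip() == "idle":
                    break
            time.sleep(60)
        with open(f"{md}/mismatch_cnt") as f:
            mismatches = int(f.read().strip())
        # A non-zero count on RAID1/RAID10 means the copies disagreed somewhere;
        # md only reports the disagreement, it doesn't know which copy was right.
        return mismatches

    if __name__ == "__main__":
        print("mismatch_cnt after scrub:", scrub())

My understanding is that a "repair" pass on RAID1 simply overwrites one copy with the other, which ties back to the earlier point: without a checksum, md has no way to know which side was actually correct.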

Ed W
