How many drives would you need to be able to set up a RAID, or hot-swappable
RAUD (Redundant Array of Unreliable Drives), that could give decent
reliability with such drives?
How many to avoid data loss if a second one dies before the first casualty
is replaced?
How many to be able to avoid data loss if a third one dies before the first two
are replaced?
On Wed, 28 Mar 2018, Paul Koning wrote:
These are straightforward questions of probability math, but it takes
some time to get the details right. For one thing, you need believable
numbers for the underlying error probabilities. And you have to analyze
the cases carefully.
THANK YOU for the detailed explanation!
The basic assumption is that failures are "fail stop", i.e., a drive
refuses to deliver data. (In particular, it doesn't lie -- deliver
wrong data. You can build systems that deal with lying drives but RAID
is not such a system.) The failure may be the whole drive ("it's a
door-stop") or individual blocks (hard read errors).
So, in addition to the "RAID" configuration, you would need extra
redundancy to compare multiple reads for error detection.
At the simplest level, if the reads don't match, then there is an error.
If a retry produces different data, then that drive has an error.
If two drives agree against a third, then there is a high probability that
the variant drive is in error.
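A minimal sketch of that voting scheme, in Python; read_block() and the
drive objects are hypothetical names for illustration, not any real
controller API:

def voted_read(drives, block_no):
    """Read one block from three drives and majority-vote the result.

    Returns (data, suspect_drives); if all three copies disagree, there
    is no majority and the read is reported as failed.
    """
    reads = [d.read_block(block_no) for d in drives]   # three candidate copies
    for candidate in reads:
        if reads.count(candidate) >= 2:                # two-of-three agree
            suspects = [d for d, r in zip(drives, reads) if r != candidate]
            return candidate, suspects
    return None, list(drives)                          # three-way disagreement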
In either case, RAID-1 and RAID-5 handle single faults. RAID-6 isn't a
single well-defined thing but as normally defined it is a system that
handles double faults. So a RAID-1 system with a double fault may fail
to give you your data. (It may also be ok -- it depends on where the
faults are.) RAID-5 ditto.
The tricky part is what happens when a drive breaks. Consider RAID-5
with a single dead drive, and the others are 100% ok. Your data is
still good. When the broken drive is replaced, RAID rebuilds the bits
that belong on that drive. Once that rebuild finishes, you're once
again fault tolerant. But a second failure prior to rebuild completion
means loss of data.
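A minimal sketch of the XOR arithmetic behind that rebuild (Python; the
two-byte blocks are toy data): any single missing member is the XOR of all
the survivors, which is why the rebuild must read every surviving drive end
to end.

from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID-5 parity math)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]   # three data members
parity = xor_blocks(data)                        # stored on the parity member

# Member 1 dies; rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]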
With very unreliable drives, that isn't acceptable.
If each "drive" within the RAID were itself a RAID, . . .
Getting to be a complicated controller, or cascading controllers, . . .
So one way to look at it: given the MTBF, calculate the probability of
two drives failing within N hours (where N is the time required to
replace the failed drive and then rebuild the data onto the new drive).
But that is not the whole story.
'course not. Besides MTBF for calculating the probability of a second
drive failing within N hours, one must also consider other factors, such as
external influences causing more than one drive to go, and the essentially
non-linear shape of the failure-rate curve (the "bathtub curve").
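As a back-of-the-envelope illustration of that calculation (Python; it
assumes independent failures at a constant exponential rate, which, as just
noted, understates both ends of the bathtub curve, and the MTBF and window
figures are made up):

import math

def p_second_failure(survivors, mtbf_hours, window_hours):
    """P(at least one survivor fails within the replace-and-rebuild window)."""
    combined_rate = survivors / mtbf_hours        # constant-rate assumption
    return 1.0 - math.exp(-combined_rate * window_hours)

# Example: 5-drive RAID-5, one drive already dead, 24 h to replace and rebuild.
print(p_second_failure(4, mtbf_hours=100_000, window_hours=24))   # ~0.001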
The other part of the story is that drives have a non-zero probability
of a hard read error. So during rebuild, you may encounter a sector on
one of the remaining drives that can't be read. If so, that sector is
lost.
If we consider that to be a "drive failure", then we are back to designing
around multiple failures.
The probability of hard read error varies with drive technology. And of
course, the larger the drive, the greater the probability (all else
being equal) of having SOME sector be unreadable. For drives small
enough to have PATA interfaces, the probability of hard read error is
probably low enough that you can *usually* read the whole drive without
error. That translates to: RAID-1 and RAID-5 are generally adequate for
PATA disks.
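Rough numbers behind that "usually" (Python; assumes independent errors at
an unrecoverable-read-error rate of about 1e-14 per bit, a commonly quoted
consumer-drive figure, with illustrative capacities, not any specific
datasheet):

import math

def p_any_hard_read_error(capacity_bytes, per_bit_rate=1e-14):
    """P(at least one unreadable sector over a full-drive read)."""
    bits = capacity_bytes * 8
    return 1.0 - math.exp(-bits * per_bit_rate)

print(p_any_hard_read_error(120e9))   # ~120 GB PATA-era drive: about 1%
print(p_any_hard_read_error(4e12))    # ~4 TB modern drive:     about 27%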
"generally".
The original thought behind this silly suggestion was whether it would be
possible to make use of MANY very unreliable drives.
On the very large drives currently available, it's a different story,
and the published drive specs make this quite clear. This is why RAID-6
is much more popular now than it was earlier. It isn't the probability
of two nearly simultaneous drive failures, but rather the probability of
a hard sector read error while a drive has failed, that argues for the
use of RAID-6 in modern storage systems.
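Putting the two failure modes together over the rebuild window makes the
point numerically (Python; same illustrative assumptions as the sketches
above, with a made-up array size):

import math

mtbf_h, window_h, per_bit_rate = 100_000, 24.0, 1e-14
survivors, capacity_bytes = 7, 4e12              # 8-drive array, 4 TB members

# RAID-5 with one dead drive is lost if EITHER a second drive dies
# OR any survivor hits a hard read error during the rebuild.
p_second_drive = 1.0 - math.exp(-survivors * window_h / mtbf_h)
p_read_error = 1.0 - math.exp(-survivors * capacity_bytes * 8 * per_bit_rate)

print(f"second whole-drive failure: {p_second_drive:.4f}")   # ~0.0017
print(f"hard read error in rebuild: {p_read_error:.4f}")     # ~0.89
# The read-error term dominates -- the argument for RAID-6 above.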