David Lang wrote:
> RAID 6 or RAID 5?
> 
> I would not expect a single error in that transfer to kill the
> entire RAID, just to kill a second disk (and only if you have a
> third disk would the array die)

I apologize for the length of my response.

TL;DR: I meant RAID 6. "Expect" is an interesting word. As many as
       1 in 40 RAID 6 array rebuilds may fail under the assumptions
       below. Use smallish drives (<= 1TB) of enterprise quality and
       your data will be safe.

This is a pessimistic walk-through with admitted weaknesses.  No
shop I'm familiar with transfers data at the maximum rate 24
hours a day, 7 days a week. But the solace we take in RAID 6
double redundancy is undermined by a key characteristic: to
rebuild a failed drive, every remaining active drive has to
perform a whole-disk transfer. This exposes the array to a
secondary hazard, and then a tertiary hazard, both of which seem
like remote possibilities but aren't as remote as you might
hope. You'll probably never lose a RAID level 6 array, but it
depends on what you mean by 'probably'. Here's a timeline of a
failure, with both happy and sad outcomes, starting at time t0:

t0  7 active 3TB elements (disks) are up + one available hot spare.
    Array integrity: OK. Array can sustain loss of two elements.
    +-------------------------+
    | 1  2  3  4  5  P  Q  Hs |     Elements 1..5 are data, P,Q are parity,
    +-------------------------+     Hs is Hot standby.


t1  Element 1 fails completely.
    The RAID controller removes element 1 from the array.
    The hot spare is promoted to active element 1a, the array starts
    rebuilding by reconstructing the missing data on the new element.
    Array integrity: OK. Array can sustain loss of one element.

    +-------------------------+
    | F  2  3  4  5  P  Q  1a |     F is faulted element.
    +-------------------------+     1a is marked for reconstruction.


t2  For 7200RPM drives transferring 145 Mbytes/sec, the rebuild needs
    a minimum of 5.75 hours to finish (a quick check follows the
    diagram below).
    Array integrity: OK. Array can sustain loss of one element.
    +-------------------------+
    | F  2  3  4  5  P  Q  1a |     F is faulted element.
    +-------------------------+     1a is being reconstructed.
         |  |  |  |  |  |  ^
         |  |  |  |  |  R->^
         |  |  |  |  R-->--^        Elements 2,3,4,5,P,Q are read
         |  |  |  R-->-----^        to reconstruct the failed element
         |  |  R-->--------^        from 144 terabits of data.
         |  R-->-----------^
         R-->--------------+
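
    A quick check of that figure in Python, assuming the drive can
    stream at its full sequential rate for the whole rebuild (an array
    still serving I/O will take longer):

       # Minimum time to rewrite one 3TB element at full streaming speed.
       DRIVE_BYTES = 3e12            # 3 TB, decimal, as drives are sold
       RATE_BYTES_PER_SEC = 145e6    # ~145 Mbytes/sec, 7200RPM drive

       hours = DRIVE_BYTES / RATE_BYTES_PER_SEC / 3600
       print("minimum rebuild time: %.2f hours" % hours)   # ~5.75 hours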

t3  The sysadmin replaces the failed drive (was element 1) with a new drive.
    The RAID controller designates it as a new Hot spare.
    +-------------------------+
    | Hs 2  3  4  5  P  Q  1a |     1a is being reconstructed
    +-------------------------+     from 2,3,4,5,P,Q.


t4  During the reconstruction of element 1a, an unrecoverable
    single bit error is detected in element 3's bit stream.
    How likely is this? The probability of a single bit error in
    a whole-disk transfer is about

             (# bits in bitstream)
       p =  -----------------------
                   1 / BER

    Here the # of bits read during the rebuild is

       3 terabytes * 8 bits * 6 drives = 144 terabits

    or 24 terabits from each surviving element.

    BER is the Bit Error Rate of the disk. Manufacturers typically
    express this as 1 error in so many bits, after error correction
    has been attempted.

    For a typical consumer drive, the BER is stated to be no more
    than 1 in 10^14 bits (e.g., Western Digital Green). With a 3TB
    drive (drive sizes are specified in powers of ten), the
    likelihood of a bit error during one element's whole-disk read
    is roughly

                 24 terabits
        p1 =  ---------------  =  0.24 (24%)
                 10^14 bits

    which means about 1 / 0.24 = 4.2 whole-disk reads between
    expected bit errors on any one element, and a rebuild performs
    six such reads at once. (A Monte Carlo simulation with 500
    trials gave a figure of 16.3%, a bit more optimistic.) A small
    sketch of this arithmetic follows the diagram below.

    +-------------------------+
    | Hs 2  3  4  5  P  Q  1a |     3 is a partially faulted element
    +-------------------------+     1a is being reconstructed
         |  .  |  |  |  |  ^        from 2,3,4,5,P,Q.
         |  .  |  |  |  R->^
         |  .  |  |  R-->--^
         |  .  |  R-->-----^
         |  .  R-->--------^
         |  . . . . . . . .
         R-->--------------+
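
    Here is that arithmetic as a small Python sketch. The exact form
    1 - (1 - BER)^N and the six-element total are my additions under
    the same independence assumption; the linear form N * BER is the
    p1 figure above.

       # Chance of an unrecoverable read error while reading one 3TB
       # element end to end, at a worst-case BER of 1 in 10^14 bits.
       BER = 1e-14                    # unrecoverable errors per bit read
       BITS_PER_ELEMENT = 3e12 * 8    # 24 terabits per 3TB element

       expected_errors = BITS_PER_ELEMENT * BER            # ~0.24, p1 above
       p_one_element = 1 - (1 - BER) ** BITS_PER_ELEMENT   # exact, ~21%
       p_any_of_six = 1 - (1 - p_one_element) ** 6         # any of the six reads

       print("expected errors per element read: %.2f" % expected_errors)
       print("P(>=1 error, one element read)  : %.1f%%" % (100 * p_one_element))
       print("P(>=1 error, all six reads)     : %.1f%%" % (100 * p_any_of_six))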


t5  The RAID controller removes element 3 from the array,
    continues reconstructing the missing element on 1a,
    and schedules Hs to be reconstructed once 1a is restored.
    Array integrity: Partial, RAID has lost one element and is
    rebuilding one element. Array can't sustain loss of any element.
    +-------------------------+
    | Hs 2  F  4  5  P  Q  1a |     F is faulted.
    +-------------------------+     1a is being reconstructed
         |     |  |  |  |  ^        from 2,4,5,P,Q.
         |     |  |  |  R->^        Hs will be reconstructed next.
         |     |  |  R-->--^
         |     |  R-->-----^
         |     R-->--------^
         |                 ^
         R-->--------------+

............................................................
.  Here's the scenario with a Happy outcome. Everything    .
.  works as planned and the sysadmin is home for dinner.   .
............................................................

t6  (Happy outcome)
    The sysadmin replaces the failed drive (element 3) with a new drive.
    The RAID controller marks it ready for reconstruction but won't
    start rebuilding it until the rebuild underway is complete.
    Array integrity: Partial, RAID has lost one element and is
    rebuilding one element. Array can't sustain loss of any element.
    +-------------------------+
    | Hs 2  3a 4  5  P  Q  1a |     1a is being reconstructed.
    +-------------------------+     Hs will be reconstructed next.
         |     |  |  |  |  ^        3a will be reconstructed after Hs.
         |     |  |  |  R->^
         |     |  |  R-->--^
         |     |  R-->-----^
         |     R-->--------^
         |                 ^
         R-->--------------+

    **************************************************************
    * The sysadmin saddles the unicorn and rides off. Well done! *
    **************************************************************


t7  (Happy outcome)
    The rebuild of element 1a finishes at t1 + 5.75 hours.
    The controller starts rebuilding element 3a from the
    active elements 2,4,5,P,Q,1a.
    144 terabits are transferred with no detected errors.
    Array integrity: OK. Array can sustain loss of one element.
    +-------------------------+
    | Hs 2  3a 4  5  P  Q  1a |     3a is being reconstructed.
    +-------------------------+     2,4,5,P,Q,1a are active.
         |  ^  |  |  |  |  |        Hs will be reconstructed next.
         |  ^<-R  |  |  |  |
         |  ^--<--R  |  |  |
         |  ^-----<--R  |  |
         |  ^--------<--R  |
         |  ^              |
         R->+-----------<--R


t8  (Happy outcome)
    At t7 + 5.75 hours the rebuild of element 3a is done.
    7 active elements are up, plus one available hot spare.
    Array integrity: OK. Array can sustain loss of two elements.
    +-------------------------+
    | 1  2  3  4  5  P  Q  Hs |     Elements 1..5 are data, P,Q are parity,
    +-------------------------+     Hs is Hot standby.

.............................................................
.  Now we'll work through a Sad scenario. The rebuilding    .
.  hits a slight (1 bit) snag. Cold pudding for dinner.     .
.............................................................


t6 (Sad outcome)
   The sysadmin replaces the failed drive (element 3) with a new drive.
   The RAID controller marks it ready for reconstruction but won't
   start rebuilding it until the current rebuild is finished.
   Array integrity: Partial, RAID has lost one element and is
   rebuilding one element, and will rebuild a 2nd element soon.
   Array can't sustain loss of an element.
   +-------------------------+
   | Hs 2  3a 4  5  P  Q  1a |     1a is being reconstructed.
   +-------------------------+     3a is marked to be reconstructed.
        |     |  |  |  |  ^
        |     |  |  |  R->^
        |     |  |  R-->--^
        |     |  R-->-----^
        |     R-->--------^
        |                 ^
        R-->--------------+


t7 (Sad outcome)
   During the rebuild of 1a, a *second* drive (element 4) reports
   an uncorrectable single-bit error to the RAID controller.
   The RAID controller removes element 4 from the array, and the
   array fails and goes offline. With some effort, all data except
   one stripe can probably be reconstructed by pressing the
   partially faulted element formerly known as 3 back into service,
   but that is complicated, a bit risky, unusual, and slow.
 
   How likely is this? A Monte Carlo simulation of 500 trials,
   assuming bit errors on different drives are uncorrelated and
   normally distributed in the bitstream, gives a result at the
   95% confidence level of 36.7 to 66.2 rebuilds (mean 47.2)
   between hitting bit errors on two different drives during the
   same rebuild. A sketch of this kind of simulation follows the
   diagram below.

   +-------------------------+
   | Hs 2  3a F  5  P  Q  1a |     F is faulted.
   +-------------------------+     Hs is marked for rebuilding.
        |        |  |  |  ^        3a is new and marked for rebuilding.
        |        |  |  R->^
        |        |  R-->--^
        |        R-->-----^
        |                 ^
        |                 ^
        R-->--------------+
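
   For the curious, here is a minimal sketch (Python) of that kind of
   simulation. The per-element error probability, the number of
   elements read, and the stopping rule are assumptions on my part,
   and the interval it reports is very sensitive to them, so treat it
   as a way to poke at the model rather than a reproduction of the
   numbers above.

      # Count rebuild passes until two *different* elements each report
      # an unrecoverable read error during the same pass (this t7).
      import random

      BER = 1e-14                   # assumed worst-case consumer-drive BER
      BITS_PER_ELEMENT = 3e12 * 8   # one 3TB element read end to end
      ELEMENTS_READ = 6             # surviving elements read during a rebuild
      TRIALS = 500

      # P(an element reports at least one error during its whole-disk read)
      p_err = 1 - (1 - BER) ** BITS_PER_ELEMENT

      def rebuilds_until_double_error():
          n = 0
          while True:
              n += 1
              hits = sum(random.random() < p_err for _ in range(ELEMENTS_READ))
              if hits >= 2:
                  return n

      samples = [rebuilds_until_double_error() for _ in range(TRIALS)]
      print("mean rebuilds between double-error events: %.1f"
            % (sum(samples) / float(TRIALS)))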


t9 (Sad outcome)
   Restore the array from backups. Up to 28.75 hours is
   needed to write 3TB x 5 data elements to a RAID6, so maybe half that
   is actually needed because the array isn't full, right?
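
   The 28.75-hour figure is just five of the 5.75-hour element writes
   back to back, again assuming nothing else is competing for the
   array (a sketch, same assumed drive rate as above):

      # Worst-case restore: rewrite all five 3TB data elements in turn.
      hours_per_element = 3e12 / 145e6 / 3600     # ~5.75 hours each
      print("full restore of 5 data elements: %.1f hours"
            % (5 * hours_per_element))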

-- 
Charles
