On 03/05/2020 22:46, Caveman Al Toraboran wrote:
On Sunday, May 3, 2020 6:27 PM, Jack <ostrof...@users.sourceforge.net> wrote:


curious.  how do people look at --layout=n2 in the
storage industry?  e.g. do they ignore the
optimistic case where 2 disk failures can be
recovered, and only assume that it protects for 1
disk failure?

You CANNOT afford to be optimistic ... Murphy's law says you will lose the wrong second disk.

i see why gambling is not worth it here, but at
the same time, i see no reason to ignore reality
(that a 2 disk failure can be saved).

Don't ignore that some 2-disk failures CAN'T be saved ...

e.g. a 4-disk RAID10 with --layout=n2 gives

         1*4/10 + 2*4/10 = 1.2

expected recoverable disk failures.  details are
below:


now, if we do a 5-disk --layout=n2, we get:

     1    (1)    2    (2)    3
    (3)    4    (4)    5    (5)
     6    (6)    7    (7)    8
    (8)    9    (9)    10   (10)
     11   (11)   12   (12)   13
    (13) ...

obviously, there are 5 possible ways a single disk
may fail, out of which all of the 5 will be
recovered.

Don't forget a 4+spare layout, which *should* survive a 2-disk failure.

there are nchoosek(5,2) = 10 possible ways a 2
disk failure could happen, out of which 5
will be recovered:


so, by transforming a 4-disk RAID10 into a 5-disk
one, we increase total storage capacity by a 0.5
disk's worth of storage, while losing the ability
to recover 0.2 disks.
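The counts above can be checked by brute force. This is only a minimal sketch: it assumes the near-2 layout puts the two copies of chunk k on disks 2k mod n and 2k+1 mod n (as in the diagram above), and that the array is lost as soon as both copies of any chunk are gone. The function names are made up for illustration, and the "expected recoverable failures" figure follows the weighting used in this post (1-disk and 2-disk cases counted as equally likely).

```python
from itertools import combinations
from math import comb


def mirror_sets(n_disks, copies=2):
    """Disk sets that hold all copies of some chunk (assumed near layout)."""
    sets = set()
    # The placement pattern repeats after at most n_disks chunks.
    for chunk in range(n_disks):
        sets.add(frozenset((chunk * copies + c) % n_disks for c in range(copies)))
    return sets


def survivable(n_disks, n_failed):
    """Count the failure combinations that leave a copy of every chunk."""
    sets = mirror_sets(n_disks)
    return sum(
        1
        for failed in combinations(range(n_disks), n_failed)
        if not any(s <= set(failed) for s in sets)
    )


def expected_recoverable(n_disks):
    """The post's metric: (1*ok1 + 2*ok2) / (all 1-disk and 2-disk cases)."""
    cases = n_disks + comb(n_disks, 2)
    return (survivable(n_disks, 1) + 2 * survivable(n_disks, 2)) / cases


for n in (4, 5):
    print(n, survivable(n, 2), comb(n, 2), f"{expected_recoverable(n):.4f}")
# prints:
# 4 4 6 1.2000
# 5 5 10 1.0000
```

The same enumeration works for other disk counts or for 3-disk failures by changing the arguments.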

but if we extended the 4-disk RAID10 into a
6-disk --layout=n2, we will have:

              6                  nchoosek(6,2) - 3
= 1 * -----------------  +  2 * -----------------
       6 + nchoosek(6,2)         6 + nchoosek(6,2)

= 6/21                   +  2 * 12/21

= 1.4286 expected recoverable failing disks.

more to the point, 12 of the 15 possible 2 disk
failures are recoverable.  i.e. there is 80% chance
of surviving a 2 disk failure.

so, i wonder, is it a bad decision to go with an
even number of disks with a RAID10?  what is the
right way to think to find an answer to this
question?

i guess the ultimate answer needs knowledge of
these:

     * F1: probability of having 1 disk fail within
           the repair window.
     * F2: probability of having 2 disks fail within
           the repair window.
     * F3: probability of having 3 disks fail within
           the repair window.
       .
       .
       .
     * Fn: probability of having n disks fail within
           the repair window.

     * R1: probability of surviving a 1 disk failure.
           equals 1 in all related cases.
     * R2: probability of surviving a 2 disk failure.
           equals 1/3 with a 5-disk RAID10,
           equals 0.8 with a 6-disk RAID10.
     * R3: probability of surviving a 3 disk failure.
           equals 0 in all related cases.
       .
       .
       .
     * Rn: probability of surviving an n disk failure.
           equals 0 in all related cases.

     * L : expected cost of losing data on an array.
     * D : price of a disk.

Don't forget, if you have a spare disk, the repair window is the length of time it takes to fail-over ...

this way, the absolute expected cost when adopting
a 6-disk RAID10 is:

= 6D + F1*(1-R1)*L + F2*(1-R2)*L + F3*(1-R3)*L + ...
= 6D + F1*(1-1)*L + F2*(1-0.8)*L + F3*(1-0)*L + ...
= 6D + 0          + F2*(0.2)*L   + F3*(1-0)*L + ...

and the absolute cost for a 5-disk RAID10 is:

= 5D + F1*(1-1)*L + F2*(1-0.3333)*L + F3*(1-0)*L + ...
= 5D + 0          + F2*(0.6667)*L   + F3*(1-0)*L + ...
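The two cost expressions share one shape, n*D plus the sum of Fk*(1-Rk)*L over k. A minimal sketch of that formula follows. The F values and L below are hypothetical placeholders (the thread gives no figures for them), the R values are the ones listed in this post, and the disk price is the one quoted further down.

```python
def expected_cost(n_disks, disk_price, loss, fail_probs, survive_probs):
    """n*D + sum over k of Fk * (1 - Rk) * L, as written in the post."""
    risk = sum(f * (1.0 - r) * loss for f, r in zip(fail_probs, survive_probs))
    return n_disks * disk_price + risk


D = 35.85                   # disk price quoted later in the post
L = 10_000.0                # hypothetical cost of losing the array
F = [0.05, 0.005, 0.0005]   # hypothetical F1, F2, F3 (not from the thread)

six_disk = expected_cost(6, D, L, F, [1.0, 0.8, 0.0])
five_disk = expected_cost(5, D, L, F, [1.0, 0.3333, 0.0])
print(f"6-disk: {six_disk:.2f}  5-disk: {five_disk:.2f}")
```

Keeping F and R as plain lists makes it easy to plug in other estimates for the repair window without touching the formula.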

canceling identical terms, the costs differ only in:

6-disk ===> 6D + 0.2*F2*L
5-disk ===> 5D + 0.6667*F2*L

from here [1] we know that a 1TB disk costs
$35.85, so:

6-disk ===> 6*35.85 + 0.2*F2*L
5-disk ===> 5*35.85 + 0.6667*F2*L

now, at which point is a 5-disk array a better
economic decision than a 6-disk one?  for
simplicity, let LOL = F2*L:

5*35.85 + 0.6667 * LOL  <   6*35.85 + 0.2 * LOL
0.6667*LOL - 0.2 * LOL  <   6*35.85 - 5*35.85
LOL * (0.6667 - 0.2)    <   6*35.85 - 5*35.85

                             6*35.85 - 5*35.85
            LOL          <   -----------------
                               0.6667 - 0.2

            LOL          <   76.816
            F2*L         <   76.816

so, a 5-disk RAID10 is better than a 6-disk RAID10
only if:

         F2*L  <  76.816 bucks.
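The threshold can be reproduced in a few lines. This is a sketch using the post's own rounded figures (R2 = 0.3333 for 5 disks, 0.8 for 6 disks, $35.85 per disk); the function name is hypothetical.

```python
def break_even_f2_loss(disk_price, r2_small, r2_large, extra_disks=1):
    """Largest F2*L for which the smaller array is still the cheaper choice.

    Solves  small*D + (1 - r2_small)*X  <  large*D + (1 - r2_large)*X  for X.
    """
    extra_risk = (1.0 - r2_small) - (1.0 - r2_large)
    if extra_risk <= 0:
        raise ValueError("the larger array must have the higher R2")
    return extra_disks * disk_price / extra_risk


threshold = break_even_f2_loss(35.85, r2_small=0.3333, r2_large=0.8)
print(f"{threshold:.3f}")  # 76.816
```

Different estimates of R2 or a different disk price can be passed in to see how far the threshold moves.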

this site [2] says that 76% of seagate disks fail
per year (:D).  and since disks mostly fail
independently of each other, the probability of
having 2 disks fail in a year is:

76% seems incredibly high. And no, disks do not fail independently of each other. If you buy a bunch of identical disks, at the same time, and stick them all in the same raid array, the chances of them all wearing out at the same time are rather higher than random chance would suggest.
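For reference, the independence baseline is just a binomial probability. The sketch below assumes every disk fails within the window with the same probability p, independently of the others, which is exactly the assumption questioned above, so for same-batch disks it should be read as a lower bound rather than an estimate. The value of p is a hypothetical placeholder.

```python
from math import comb


def prob_exactly_k_failures(n_disks, k, p_fail):
    """Binomial probability of exactly k failures among n independent disks."""
    return comb(n_disks, k) * p_fail**k * (1.0 - p_fail) ** (n_disks - k)


p = 0.02  # hypothetical per-disk failure probability within the window
for n in (5, 6):
    print(n, prob_exactly_k_failures(n, 2, p))
```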

Which is why, if a raid disk fails, the advice is always to replace it asap. And if possible, to recover the failed drive to try and copy that rather than hammer the rest of the raid.

Bear in mind that it doesn't matter how many drives a raid-10 has: if you're recovering on to a new drive, the data is stored on just two of the other drives. So the chances of them failing as they get hammered are a lot higher.

That's why it makes a lot of sense to make sure you monitor the SMARTs, so you can replace any of the drives that look like failing before they actually do. And check the warranties. Expensive raid drives probably have longer warranties, so when they're out of warranty consider retiring them (they'll probably last a lot longer, but it's a judgement call).

All that said, I've been running a raid-1 mirror for a good few years, and I've not had any trouble on my Barracudas.

Cheers,
Wol
