From: "Peter St. John" <[email protected]>
Date: Monday, April 22, 2013 6:19 PM
To: mathog <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [Beowulf] Are disk MTBF ratings at all useful?

Human mortality has, broadly, a Poisson component and a non-Poisson component. 
The chance of getting hit by a meteor is Poisson: it has nothing to do with your 
age. But the chance of a 99 year old living to 100 is lower than the chance of 
a 20 year old living to 21, because we wear out, and that's not Poisson. (Dogs 
are a clearer example: the chance of getting hit by a car is Poisson, but dying 
of old age after a dozen years or so is not.)

We usually think of incandescent light bulbs as Poisson; the chance of, I don't 
know, Brownian Motion, clipping a very narrow filament, is bigger than the 
degradation of mere use; except in the case of switching the bulb off and on 
frequently, when the chance of failure depends more on fatigue as the filament 
expands and contracts.

Hard Disks are somewhat Poisson, and somewhat not. More so, I think, than 
humans.

----

What you are describing is the standard bathtub curve, where the failure rate 
is constant on the "bottom" of the bathtub.  Infant mortality isn't an issue 
any more, and old age/wearout hasn't started.

I would say that the real question is "where is the far side of the bathtub" 
where the rate starts climbing steeply.  That's the important number, and one 
that is NOT necessarily the MTBF.  I suspect the "calculated" MTBF in a system 
without any big wearout mechanisms would be essentially the inverse of the 
failure rate in the flat part of the curve.  However, electromechanical devices 
DO have wear-out mechanisms, and they likely have a shorter life than the 
electronics.
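
To make the flat-bottom point concrete, here is a minimal Python sketch.  Every 
number in it (the flat failure rate, the age where wearout starts, the cubic 
shape of the climb) is made up for illustration and is not data for any real 
drive.  It only shows that the "calculated" MTBF is the inverse of the flat 
rate, and that this says nothing about where the far side of the bathtub is.

```python
# Hypothetical numbers, chosen only to illustrate the shape of the curve.
FLAT_RATE_PER_HOUR = 1.0e-6      # constant failure rate on the bathtub bottom
WEAROUT_START_HOURS = 40_000.0   # assumed "far side of the bathtub"
WEAROUT_SCALE_HOURS = 10_000.0   # assumed time scale of the climb
WEAROUT_COEFF_PER_HOUR = 1.0e-3  # assumed strength of the wearout term


def failure_rate(age_hours):
    """Failure rate per hour: flat bottom plus a cubic wearout term."""
    rate = FLAT_RATE_PER_HOUR
    if age_hours > WEAROUT_START_HOURS:
        excess = (age_hours - WEAROUT_START_HOURS) / WEAROUT_SCALE_HOURS
        rate += WEAROUT_COEFF_PER_HOUR * excess ** 3
    return rate


def calculated_mtbf_hours(flat_rate_per_hour):
    """With a constant failure rate, MTBF is just its inverse."""
    return 1.0 / flat_rate_per_hour


mtbf_years = calculated_mtbf_hours(FLAT_RATE_PER_HOUR) / (24 * 365)
wearout_years = WEAROUT_START_HOURS / (24 * 365)
print(f"calculated MTBF: {mtbf_years:.0f} years")
print(f"wearout begins:  {wearout_years:.1f} years")
for age in (10_000, 40_000, 50_000, 60_000):
    print(f"age {age:>6} h: rate {failure_rate(age):.2e} per hour")
```

With these assumed values the calculated MTBF comes out at over a century, 
while the rate starts climbing before the five year mark.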

Furthermore, the wear life might be some complicated thing like "integrated head 
motion" with some very complicated power laws. As an example of a seemingly 
simple component with a complex life phenomenon, take capacitors used for 
pulsed power systems.  They typically have a life that goes something like

Lx = Lref * (Qref/Qx)^1.6 * (Vref/Vx)^7.5

The wearout mechanism has to do with internal mechanical stresses.   So, 
increasing the Q of the circuit increases the amount of voltage reversal as the 
exponentially damped sine wave rings down.  And voltage has a very strong 
effect on life, because it is directly related to the mechanical loads, as well 
as the electrical stress on the dielectric.
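
A quick way to get a feel for those exponents is to plug numbers into the 
formula.  The sketch below is just the expression above written as a Python 
function; the reference values are hypothetical, and only the ratios matter.

```python
def capacitor_life(l_ref, q_ref, q_x, v_ref, v_x):
    """Lx = Lref * (Qref/Qx)^1.6 * (Vref/Vx)^7.5, exponents as quoted above."""
    return l_ref * (q_ref / q_x) ** 1.6 * (v_ref / v_x) ** 7.5


# Hypothetical reference point; only the ratios matter here.
L_REF = 1.0      # reference life, in whatever unit the rating uses
Q_REF = 10.0     # reference circuit Q (made up)
V_REF = 1000.0   # reference voltage in volts (made up)

derated = capacitor_life(L_REF, Q_REF, Q_REF, V_REF, 0.9 * V_REF)
double_q = capacitor_life(L_REF, Q_REF, 2 * Q_REF, V_REF, V_REF)
print(f"90% of reference voltage: {derated:.2f} x reference life")
print(f"twice the reference Q:    {double_q:.2f} x reference life")
```

Backing off the voltage by 10% roughly doubles the life, while doubling the Q 
cuts it to about a third.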

Another common device with a not entirely intuitive life characteristic is an 
incandescent light bulb.  Life goes as (variously) the 12th to 16th power of 
voltage (higher voltage = shorter life), while light output goes as the 3.4 
power of voltage. So you could have a usage pattern that seems equivalent in 
terms of operating hours, or total lumen-seconds produced, and have very 
different life.
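
Here is a small Python sketch of that trade-off, using only the exponents quoted 
above (12 to 16 for life, 3.4 for light).  The 5% over-voltage is an arbitrary 
example, and the function names are mine.

```python
def relative_life(voltage_ratio, exponent):
    """Life relative to rated life; exponent is 12 to 16 per the text."""
    return voltage_ratio ** (-exponent)


def relative_light(voltage_ratio):
    """Light output relative to rated output (3.4 power of voltage)."""
    return voltage_ratio ** 3.4


ratio = 1.05  # bulb run 5% above its rated voltage
for exponent in (12, 16):
    life = relative_life(ratio, exponent)
    light = relative_light(ratio)
    total = life * light  # relative lumen-seconds over the bulb's life
    print(f"n={exponent}: life {life:.2f}, light {light:.2f}, "
          f"lifetime light {total:.2f}")
```

At 5% over-voltage the bulb is somewhat brighter but lasts roughly half as 
long, so the total light delivered over its life goes down, not up.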

The same is no doubt true of disk drives.  While the Google folks didn't find 
any big obvious patterns (other than failure rates increasing at low and high 
temps), they also commented that their sample was non-homogeneous, so you could 
be looking at the equivalent of 100 Volt, 120 Volt and 130 Volt light bulbs all 
running off the same 115 Volt circuit.
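
To put rough numbers on the mixed-population analogy, here is a short Python 
sketch.  It assumes life goes as the 12th power of the ratio of rated voltage 
to line voltage (the low end of the 12 to 16 range for bulbs); nothing in it is 
a statement about actual disk drives.

```python
LINE_VOLTS = 115.0
LIFE_EXPONENT = 12  # low end of the 12 to 16 range; an assumption


def life_on_line(rated_volts, line_volts=LINE_VOLTS, n=LIFE_EXPONENT):
    """Life relative to rated life for a bulb run at line_volts."""
    return (rated_volts / line_volts) ** n


population = {rated: life_on_line(rated) for rated in (100.0, 120.0, 130.0)}
for rated, life in population.items():
    print(f"{rated:.0f} V bulb on {LINE_VOLTS:.0f} V: {life:.2f} x rated life")

spread = max(population.values()) / min(population.values())
print(f"longest-lived / shortest-lived: {spread:.0f}x")
```

Under that assumption the three groups differ in expected life by a factor of 
about 23, even though every bulb sees the same circuit and the same hours.  
Lump them into one sample and any usage pattern gets buried.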





