On 4/20/13 3:08 PM, "Andrew Holway" <[email protected]> wrote:
>Did anyone post this yet? I think this is one of the definitive
>works on disk failure.
>
>http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/disk_failures.pdf
>
>On 19 April 2013 17:56, Joe Landman <[email protected]> wrote:
>> On 4/19/2013 11:47 AM, mathog wrote:
>>>> My overall impression is that, when buying drives, the single piece of
>>>> manufacturer provided data that best correlates with the actual
>>>> expected life of the drive is the length of the warranty. Even that
>>>> is little protection against a bad batch though.
>>
>> Use AFR and warranty, ignore everything else. MTBF does not correlate
>> at all against AFR, and AFR is an objective measure.

Some salient points from that article:

"The higher baseline AFR for 3 and 4 year old drives is more strongly influenced by the underlying reliability of the particular models in that vintage than by disk drive aging effects."

"For example, Figure 2 changes significantly when we normalize failure rates per each drive model. Most age-related results are impacted by drive vintages. However, in this paper, we do not show a breakdown of drives per manufacturer, model, or vintage due to the proprietary nature of these data."

Yep. Detailed failure stats are hard to come by, because they're valuable.

And of course, the interesting thing in that paper was that failure rates are higher for the colder drives. It might be a "tolerance" issue: the drives are optimized to work at a particular temperature (e.g. 40C), and that's where the whole stackup of tolerances (mechanical and timing) works best. As you get away from that temperature, deviations from nominal (due to aging or wear) are more likely to cause a failure.
There's also this:

"Yang and Sun [21] and Cole [4] describe the processes and experimental setup used by Quantum and Seagate to test new units and the models that attempt to make long-term reliability predictions based on accelerated life tests of small populations. Power-on-hours, duty cycle, temperature are identified as the key deployment parameters that impact failure rates, each of them having the potential to double failure rates when going from nominal to extreme values."

This is quite interesting, because they see only a doubling in going to extreme values. Clearly, this is not dominated by an Arrhenius "double per 10 degrees" kind of effect.

Of course, that's contradicted by the very next sentence:

"For example, Cole presents thermal de-rating models showing that MTBF could degrade by as much as 50% when going from operating temperatures of 30C to 40C."

The net of all this (and I'll bet that if you read all 21 of the references, you'll find this): disk drive lifetime is very hard to predict.
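
For what it's worth, here is a rough sketch of the arithmetic behind that comparison. It assumes a constant failure rate (so failure rate = 1/MTBF), which is my simplification and not something the paper states. The function names and the sample temperature rises are made up for illustration.

```python
def rule_of_thumb_factor(delta_t_c, doubling_interval_c=10.0):
    """Failure-rate multiplier under the 'doubles every 10 degrees' rule of thumb.

    delta_t_c: temperature rise in degrees C above the nominal point.
    doubling_interval_c: rise that doubles the failure rate (assumed 10 C).
    """
    return 2.0 ** (delta_t_c / doubling_interval_c)


def failure_rate_multiplier_from_mtbf_loss(mtbf_loss_fraction):
    """Convert a fractional MTBF loss into a failure-rate multiplier.

    Assumes a constant failure rate, i.e. rate = 1 / MTBF, so losing a
    fraction f of MTBF multiplies the rate by 1 / (1 - f).
    """
    if not 0.0 <= mtbf_loss_fraction < 1.0:
        raise ValueError("mtbf_loss_fraction must be in [0, 1)")
    return 1.0 / (1.0 - mtbf_loss_fraction)


if __name__ == "__main__":
    # Cole's quoted figure: MTBF degrades by as much as 50% from 30C to 40C.
    cole = failure_rate_multiplier_from_mtbf_loss(0.5)
    print(f"50% MTBF loss over 10 C -> failure rate x{cole:.1f}")

    # The rule of thumb applied to a few illustrative temperature rises.
    for rise in (10, 20, 30):
        factor = rule_of_thumb_factor(rise)
        print(f"rule of thumb, +{rise} C -> failure rate x{factor:.1f}")
```

Under that assumption, Cole's 50% MTBF loss over 10 degrees is the same thing as a doubling per 10 degrees. The "double from nominal to extreme" statement would only match the rule of thumb if "extreme" meant a mere 10 degrees above nominal, which is why the two sentences sit oddly next to each other.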
