> On Apr 19, 2013, at 2:50, Fred Youhanaie <[email protected]> wrote:
>> On 19/04/13 00:01, mathog wrote:
>>> High end SATA and SAS disks claim MTBF values that work out to over
>>> 100 years, and yet it is a common observation that certain models
>>> fail at rates entirely inconsistent with those values. For instance,
>>> 75% of all drives of one model dead in < 6 years. (Cited by one
>>> poster in this thread:
>>
>> You may find this paper helpful, some of the data sets used in their
>> studies come from large HPC sites:
>>
>> Bianca Schroeder, Garth A. Gibson
>> Understanding disk failure rates: What does an MTTF of 1,000,000
>> hours mean to you?
>> http://dl.acm.org/citation.cfm?doid=1288783.1288785
>>
>> If you, or your institution, do not have access to the ACM
>> publications, you may be able to find a free copy posted by the
>> authors, ACM does allow that :)
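
As a rough check on the figures quoted above, here is a minimal Python sketch. It is not taken from the paper. It assumes a constant failure rate (an exponential lifetime model) and continuous operation at 8760 hours per year; both are simplifying assumptions, and the function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours, assuming the drive is always powered on


def mttf_years(mttf_hours):
    """Convert a datasheet MTTF in hours to years of continuous operation."""
    return mttf_hours / HOURS_PER_YEAR


def annual_failure_rate(mttf_hours):
    """Probability a drive fails within one year under a constant-rate model."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def fraction_failed(mttf_hours, years):
    """Expected fraction of a drive population dead after `years` (same model)."""
    return 1.0 - math.exp(-years * HOURS_PER_YEAR / mttf_hours)


def implied_mttf_hours(fraction_dead, years):
    """MTTF in hours that would produce the observed fraction dead in `years`."""
    return -years * HOURS_PER_YEAR / math.log(1.0 - fraction_dead)


if __name__ == "__main__":
    datasheet_mttf = 1_000_000  # hours, the figure in the paper's title
    print(f"MTTF in years:           {mttf_years(datasheet_mttf):.1f}")
    print(f"Annual failure rate:     {annual_failure_rate(datasheet_mttf):.2%}")
    print(f"Expected dead after 6 y: {fraction_failed(datasheet_mttf, 6):.1%}")
    # The "75% dead in < 6 years" observation, turned back into an MTTF:
    print(f"MTTF implied by 75%/6 y: {implied_mttf_hours(0.75, 6):.0f} hours")
```

Under those assumptions a 1,000,000 hour MTTF is about 114 years, which corresponds to a bit under 0.9% of drives failing per year and roughly 5% after six years. Losing 75% of a model in six years would instead correspond to an MTTF of only about 38,000 hours, a little over four years.
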
Very good reference. This is the second conclusion from that paper:

  For drives less than five years old, field replacement rates were
  larger than what the datasheet MTTF suggested by a factor of 2–10.
  For five to eight-year old drives, field replacement rates were a
  factor of 30 higher than what the datasheet MTTF suggested.

The paper discusses in some detail one key factor in this discrepancy: the end user's definition of "failed" usually differs substantially from the vendor's. For instance, I replace disks when they are either accumulating swapped out sectors rapidly (write failures) or have accumulated more than a few pending errors (read failures). The former indicate that the disk is going south, but no data is lost, and they are not in themselves disruptive. The latter are disruptive, since data is potentially lost on each such event and, in any case, these events must be cleared manually. The vendors most likely would consider neither of these a failure event, since SMART will still read PASSED on such drives.

My overall impression is that, when buying drives, the single piece of manufacturer-provided data that best correlates with the actual expected life of the drive is the length of the warranty. Even that is little protection against a bad batch, though.

Thanks,

David Mathog
[email protected]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
