Google Says Diagnostics Don't Catch Many PC Drive Failures 

Feb 22, 2007 

Built-in disk drive diagnostics only predict about half the drive failures that 
occur, Google said after studying thousands of drives. 

Chris Mellor, Techworld 

Wednesday, February 21, 2007 01:00 PM PST 

Google research has shown that built-in disk drive diagnostics only predict 
about half the drive failures that occur. 

Modern disk drives have a built-in self-test and diagnostic facility termed
Self-Monitoring, Analysis and Reporting Technology--SMART. The drive firmware
monitors a range of drive parameters, such as the number of seek errors and
the disk spin-up time. If these parameters degrade over time, the unit may be
heading for a breakdown. With advance warning of an impending disk failure,
you have a chance to move files and/or replace the unit before you lose any
data.

Google's study looked at more than one hundred thousand consumer-grade hard
disk drives, a mix of serial and parallel ATA models ranging in speed from 5400
to 7200 rpm and in size from 80 to 400 GB. Observed annualized failure rates
ranged from 1.7 percent for drives in their first year of operation to over
8.6 percent for drives in their third year.
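
As a rough illustration of what an annualized failure rate means, the sketch
below divides the failures seen in a group of drives by the drive-years that
group accumulated. The function name and the sample counts are hypothetical,
chosen only so the result matches the 1.7 percent figure; they are not data
from the Google paper, and the paper's exact method may differ.

```python
def annualized_failure_rate(failures, drive_days):
    """Return failures per drive-year, expressed as a percentage.

    Assumption for illustration: AFR = failures / drive-years * 100,
    where drive-years = total days of drive operation / 365.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years


# Hypothetical numbers, not taken from the paper:
# 10,000 drives observed for one full year, 170 of which failed.
first_year = annualized_failure_rate(failures=170, drive_days=10_000 * 365)
print(f"{first_year:.1f} percent")  # prints "1.7 percent"
```

Counting drive-days rather than drives lets units that were installed or
retired partway through the year contribute only the time they actually ran.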

The paper (in PDF format) is called Failure Trends in a Large Disk Drive
Population.

Diagnostic Tools Uneven 

The Google researchers found that SMART diagnostics are not as useful as they 
are supposed to be. They note that there is little independent research into
drive life and diagnostics, stating 'Most of the available information comes 
from the disk manufacturers themselves. Their data are typically based on
extrapolation from accelerated life test data of small populations or from 
returned unit databases.' 

They note 'detailed studies of very large populations (of hard drives) are the 
only way to collect enough failure statistics to enable meaningful conclusions.
In this paper we present one such study by examining the population of hard 
drives under deployment within Google's computing infrastructure.' Google has
'built an infrastructure that collects vital information about all Google's 
systems every few minutes, and a repository that stores these data in
time-series format (essentially forever) for further analysis.'

The researchers mined this data, looking for correlations between hard drive
sensor and SMART readings on the one hand and failure events on the other.
Their findings were:

-- Very little correlation between failure rates and either raised temperature 
or activity levels. 

-- Some SMART parameters (scan errors, reallocation counts, offline
reallocation counts, and probational counts) have a large impact on failure
probability. Others do not. Out of all failed drives, over 56 percent had no
count in any of these four strong SMART signals.

-- There was a lack of failure-predicting SMART signals on a large proportion 
of failed drives. 

-- Taking all SMART signals and temperature readings into account, they found
about 36 percent of all failed drives had no predictive failure signals at
all.
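
The findings above can be made concrete with a small sketch. It flags a drive
when any of the four strong SMART counters is nonzero, then measures what
fraction of failed drives such a rule would have missed. The counter names,
the dictionary layout, and the sample drives are all hypothetical; real SMART
attribute names and values vary by vendor, and this is not the method used in
the paper.

```python
# Hypothetical names for the four strong signals named in the article.
STRONG_SIGNALS = (
    "scan_errors",
    "reallocation_count",
    "offline_reallocation_count",
    "probational_count",
)


def has_strong_signal(counters):
    """Return True if any strong-signal counter is nonzero.

    A counter missing from the dict is treated as zero.
    """
    return any(counters.get(name, 0) > 0 for name in STRONG_SIGNALS)


def missed_failure_fraction(failed_drives):
    """Fraction of failed drives that showed no strong signal at all."""
    if not failed_drives:
        raise ValueError("need at least one failed drive")
    missed = sum(1 for c in failed_drives if not has_strong_signal(c))
    return missed / len(failed_drives)


# Made-up sample of four failed drives, for illustration only.
failed = [
    {"scan_errors": 3},
    {"reallocation_count": 12, "probational_count": 1},
    {},                                           # no counters reported
    {"scan_errors": 0, "reallocation_count": 0},  # counters present but zero
]
print(missed_failure_fraction(failed))  # prints 0.5
```

In this made-up sample the rule misses half of the failures; the study reports
that over 56 percent of failed drives had no count in any of the four signals.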

Their conclusion was that 'it is unlikely that an accurate predictive failure
model can be built based on these signals alone.' Further, 'models based on
SMART parameters alone are unlikely to be useful for predicting individual
drive failures.'

Google's researchers hope that predictive models that 'use parameters beyond 
those provided by SMART could achieve significantly better accuracies. For
example, performance anomalies and other application or operating system 
signals could be useful in conjunction with SMART data to create more powerful
models.' 

Google uses millions of drives, so its findings should be taken seriously by
the hard drive industry and by customers implementing disk-to-disk (D2D)
backup, who need better disk failure protection built into their D2D
systems--meaning stronger RAID schemes, such as RAID 6 or DP, and more spare
drives.

http://www.pcworld.com/article/id,129238-pg,1/article.html

Vikas Kapoor