> Hi,
> 
> On 2/05/2013 12:06 AM, James Harper wrote:
> > I have just had two drives fail in a server today. One is mostly part of a
> > RAID0 set (which is in turn part of a DRBD, so we're still good) plus a small
> > partition that is part of a RAID1, which hasn't been marked failed (the errors
> > are about 1.3TB along a 2TB disk). The other is one I was testing; it wasn't
> > particularly new and doesn't really matter.
> >
> > Both drives have logged read errors under the Linux kernel, both report a
> > healthy status (SMART overall-health self-assessment test result: PASSED),
> > and both say "Completed: read failure" almost immediately when I run a
> > SMART self-test (short or long).
> >
> > I don't really have any trouble with the fact that two drives have failed,
> > but I'm really surprised that SMART still reports that the drive is good
> > when it is clearly not... what's with that?
> 
> This from Google:
> 
> "Our analysis identifies several parameters from the drive's
> self monitoring facility (SMART) that correlate highly with
> failures. Despite this high correlation, we conclude that
> models based on SMART parameters alone are unlikely to be
> useful for predicting individual drive failures. Surprisingly,
> we found that temperature and activity levels were much less
> correlated with drive failures than previously reported."
> 
> In a nutshell, SMART is not a good indicator of pending failure. Use
> it as an indication only, but certainly don't count on it. Really,
> SMART is next to useless overall, so it isn't even much of a "real"
> indicator... YMMV.
> 

It's frustrating because a simple rule like "if hard read errors > 0 ||
failed self tests > 0 then drive = not okay" would have meant I could just
read the SMART health indicator and eject the drive from the array (or
whatever it belonged to).
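
For what it's worth, that rule is easy enough to script yourself on top of
smartctl's text output. Here's a rough sketch; the attribute name and
column layout are what my drives report (`Reported_Uncorrect`, raw value in
the last column), so treat those as assumptions that vary by vendor:

```python
import re

def drive_looks_failed(smartctl_output: str) -> bool:
    """Apply the simple rule above to `smartctl -A` / `-l selftest` text:
    flag the drive if it has logged any uncorrectable read errors or any
    failed self-test, regardless of the overall PASSED verdict.

    NOTE: sketch only -- the Reported_Uncorrect attribute name and the
    raw-value-in-last-column layout are vendor-dependent assumptions.
    """
    # Raw value (last column) of the Reported_Uncorrect attribute, if present.
    m = re.search(r"Reported_Uncorrect.*?(\d+)\s*$", smartctl_output,
                  re.MULTILINE)
    if m and int(m.group(1)) > 0:
        return True
    # Any self-test log entry that ended in a read failure.
    if "read failure" in smartctl_output.lower():
        return True
    return False
```

Run it against `smartctl -A -l selftest /dev/sdX` output from cron and you
can at least get the "eject from the array" decision automated, even if the
drive's own overall verdict never flips.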

James

_______________________________________________
luv-main mailing list
[email protected]
http://lists.luv.asn.au/listinfo/luv-main
