> At work I've got a server with an LSI MegaRAID (dmesg below) that 
> suddenly seems to be killing hard drives.  Last Thursday I had one drive 
> fail, and the system didn't begin rebuilding onto the hot spare until I 
> rebooted.

I would hope that the controller isn't killing drives.

Can we assume the system has clean power, normal temperatures, no vibration, etc.?

> Hitachi's drive testing tool seems to be windows only, so are there any 
> drive checking utilities that can check an individual drive when it's a 
> part of a RAID1?  Or is it safe to assume that if the drive fails in the 
> RAID it is really dead.  I'm trying to make sure I'm not seeing some 
> kind of problem with the enclosure or the megaraid card before I start 
> shipping drives back to Hitachi.

Can you get the SMART data from the drives?  Interpreting SMART data
is another problem, but maybe you can find a clue there.
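
As a starting point for reading the SMART data, you can filter the attribute table that smartmontools prints with `smartctl -A` down to the counters most often read as media trouble: Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable. This is a minimal sketch. It assumes ATA/SATA drives (SAS drives report SMART differently) and uses a hypothetical helper name, smart_clues. For a disk still sitting behind a MegaRAID, smartctl on Linux has a `-d megaraid,N` device type, where N identifies the disk on the controller. Whether your OS and driver support that is something you'd have to check.

```sh
# Hypothetical helper: smart_clues
# Reads the attribute table printed by `smartctl -A` on stdin and prints the
# raw values of three attributes commonly read as signs of failing media.
# Exit status is 1 if any of them is non-zero, 0 otherwise.
smart_clues() {
    awk '
        # In smartctl -A output, column 2 is ATTRIBUTE_NAME and column 10 is RAW_VALUE.
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            print $2, $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage would be something like `smartctl -A /dev/sdb | smart_clues`, with /dev/sdb standing in for whatever name the disk ends up with. A non-zero or growing count is a clue, not a verdict.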

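Is it possible that the drives just took "too long" to read or write, and the
controller marked them bad?  Some drives keep retrying a bad sector internally
for a long time before reporting an error.  If remapping a sector takes longer
than the controller is willing to wait, it may drop a drive that still works.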

Maybe hook them to a different controller (no RAID) and do a simple test
with dd over the entire drive, something like

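# read test: read every sector and throw the data away (non-destructive)
dd if=/dev/suspect_disk of=/dev/null bs=1m
# write test: overwrite the whole disk with zeros (DESTROYS ALL DATA on it)
dd if=/dev/zero of=/dev/suspect_disk bs=1m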

and see if you get any errors from dd or in dmesg.
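
You can wrap the read half of that test so it also collects the kernel messages that appear while dd runs. That covers both places to look. This is a minimal sketch, assuming a POSIX shell, a hypothetical helper name (read_check), and that dmesg is readable (run it as root). It only reads from the target. The block size is written as 1048576 bytes: BSD dd takes `1m` but GNU dd expects `1M`, and a plain byte count works with both.

```sh
# Hypothetical helper: read_check TARGET
# Reads every block of TARGET with dd (read-only; nothing is written to it),
# then prints any kernel log lines that appeared while the read was running.
# The exit status is 0 if dd succeeded, 1 if it failed, 2 on setup errors.
read_check() {
    rc_target=$1
    rc_before=$(mktemp) || return 2
    rc_after=$(mktemp) || { rm -f "$rc_before"; return 2; }

    # Snapshot the kernel log first; if dmesg is not permitted, the file stays empty.
    dmesg > "$rc_before" 2>/dev/null || :

    # 1048576 bytes = 1 MiB. dd's own error messages and statistics go to stderr.
    if dd if="$rc_target" of=/dev/null bs=1048576; then
        rc_status=0
        echo "read of $rc_target completed without dd errors"
    else
        rc_status=1
        echo "read of $rc_target FAILED (see dd output)"
    fi

    dmesg > "$rc_after" 2>/dev/null || :
    # Lines only in the second snapshot are new kernel messages. They may
    # include unrelated events, so read them rather than trusting them blindly.
    rc_new=$(diff "$rc_before" "$rc_after" | sed -n 's/^> //p')
    if [ -n "$rc_new" ]; then
        echo "new kernel messages during the read:"
        printf '%s\n' "$rc_new"
    fi

    rm -f "$rc_before" "$rc_after"
    return "$rc_status"
}
```

Run it as `read_check /dev/suspect_disk` once the disk is attached to a plain, non-RAID controller.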
