Wil Reichert <wil.reich...@gmail.com> posted
7a329d910906130944o7e5fa20eta701105f0e986...@mail.gmail.com, excerpted
below, on  Sat, 13 Jun 2009 09:44:32 -0700:

> I think one of my drives is on its way out, tho I've never seen a drive
> fail like this before.  Drive is a year old WD 640G & I use it as my
> system drive. Via SMART, I've been doing daily short & weekly long tests
> since I installed it.  Starting last week I woke up to my keyboard
> lights blinking and the sound of the heads thrashing & the drive
> repeatedly attempting to spin up.  On my desktop the mouse was still
> moving but any command (dmesg, less /var/log/messages) resulted in an IO
> error.  I restarted the computer and everything came up fine.
>  I dug through the logs but there were no IO errors of any sort to be
> found.  All I could see was that the extended SMART test successfully
> started (from smartd.log):

You list the drive make, but I don't know if that's the model or not.  
Googling turns up a number of 640 gig Western Digital models...

FWIW, while I'm having better luck again with my current Seagates (3 
years old this summer, 4 300 gig SATA drives with most of the system in 
RAID-6 so I could lose one... and still be able to have a second go down 
while I was rebuilding on a replacement, without losing the system), I 
had a bad run of two drives in a row that lasted almost exactly a year, 
before that.  Before /that/, I'd always kept my drives running through 
successive upgrades, moving them from primary to secondary, then from 
secondary to third drive, until they eventually dropped out of rotation 
as too small to be practical, or I ran out of room on the bus, or they 
failed as a third drive.  So two drives in a row going out in a year was 
BAD for me.  OTOH, at least one 
of them SEVERELY overheated (AC went dead and I came home to the 'puter 
still trying to run in a room of ~50C, no telling what the drive was), 
and I'm reasonably sure it'd have run much longer otherwise.

BTW, both of those drives still actually ran when I pulled them, but they 
had bad partitions and I no longer felt safe running them.  That includes 
the way overheated one, which simply head-crashed, leaving a groove where 
the head was floating at the time, but was OK on other partitions, 
including my backup partitions on the same disk.  It's possible something like 
that is happening to your disk too, particularly if SMART says it has 
overheated.

Meanwhile, as I said, I don't know what your drive is, but PARTICULARLY 
IF IT IS IDE, take a look at this recent LWN article, in particular, the 
HPA aka host protected area bit, and the comment of "alankila" (near the 
bottom), which your story brought to mind.  It might be worth checking 
with hdparm just to be sure, tho I really don't understand how SMART's 
own test could be screwed up by that as the drive should certainly 
understand its own parameters even if various Linux utilities don't 
necessarily agree.
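
For what it's worth, here's a minimal sketch of that hdparm check.  It 
assumes "hdparm -N" reports a line of the form "max sectors = 
visible/native, HPA is ...".  The sample line, its sector counts, the 
/dev/sdX name and the parse_hpa function are all made up for 
illustration.  It only parses text, so you'd run hdparm yourself (as 
root) and feed it the output.

```python
import re


def parse_hpa(hdparm_output):
    """Return (visible, native, hidden) sector counts from `hdparm -N` text.

    Assumes a line like 'max sectors = 1000/1200, HPA is enabled'.
    Returns None if no such line is found.
    """
    match = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", hdparm_output)
    if match is None:
        return None
    visible, native = int(match.group(1)), int(match.group(2))
    # A nonzero difference means part of the disk is hidden behind an HPA.
    return visible, native, native - visible


# Illustrative (made-up) output, not taken from a real drive.
sample = "/dev/sdX:\n max sectors   = 1250261615/1250263728, HPA is enabled\n"
print(parse_hpa(sample))
```

If the hidden count comes back nonzero, the visible size is smaller than 
the native size, and that's the situation the LWN comment talks about.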

(This is the "In Brief" feature from the June 3 LWN kernel page.  As 
such, it covers a number of topics "in brief", so it doesn't give much 
info, but that comment's useful and it's a good place to start further 
research if it looks useful.)  http://lwn.net/Articles/335913/

But regardless, getting another drive and RAID-1-ing the pair (or four 
drives and RAID-6-ing or RAID-10-ing them), at least for your vital 
partitions, is I believe a pretty good idea at this point.  It seems 
drives don't last like they used to, and they are cheap enough that 
RAIDing them is actually a reasonable solution now, especially with SATA.  
I know I rest a LOT easier knowing I have 2-drive redundancy, here.
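
To put rough numbers on those choices, here's a back-of-the-envelope 
sketch.  It assumes equal-size drives, nominal sizes with no allowance 
for metadata or filesystem overhead, and RAID-10 laid out as striped 
two-way mirrors.  The raid_summary name is just something I made up for 
illustration.

```python
def raid_summary(level, drives, size_gb):
    """Return (usable_gb, failures_survived) for equal-size drives.

    failures_survived is the guaranteed minimum. RAID-10 can survive more
    if the failed drives happen to sit in different mirror pairs.
    """
    if level == "raid1":
        if drives < 2:
            raise ValueError("RAID-1 needs at least 2 drives")
        # Every drive holds a full copy, so all but one may fail.
        return size_gb, drives - 1
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        # Two drives' worth of space goes to parity.
        return (drives - 2) * size_gb, 2
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("this sketch needs an even count of 4+ drives")
        # Half the space goes to mirror copies; losing both halves of one
        # pair is fatal, so only a single failure is guaranteed safe.
        return (drives // 2) * size_gb, 1
    raise ValueError("unknown RAID level: " + level)


for level, drives, size_gb in [("raid1", 2, 640), ("raid6", 4, 300), ("raid10", 4, 300)]:
    usable, survived = raid_summary(level, drives, size_gb)
    print(level, drives, "x", size_gb, "gig ->", usable, "gig usable, survives", survived)
```

On four 300 gig drives, RAID-6 and RAID-10 come out to the same 600 gig, 
but only RAID-6 guarantees surviving any two failures.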

Or you can do what I did before, which appears to be what you had done: 
rotate your old primary drive into backup usage, and hope the older 
backup and the newer main drive don't both go out at the same time.  Of 
course, that can leave you without good backups if that's all you use, 
since the older drive is likely much smaller than the newer one, which 
used to mean it was probably too small to hold all your data.  With 
today's capacities and cost for new drives, that's not quite the problem 
it used to be.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

