Wil Reichert <wil.reich...@gmail.com> posted 7a329d910906130944o7e5fa20eta701105f0e986...@mail.gmail.com, excerpted below, on Sat, 13 Jun 2009 09:44:32 -0700:
> I think one of my drives is on its way out, tho I've never seen a drive
> fail like this before. Drive is a year old WD 640G & I use it as my
> system drive. Via SMART, I've been doing daily short & weekly long tests
> since I installed it. Starting last week I woke up to my keyboard
> lights blinking and the sound of the heads thrashing & the drive
> repeatedly attempting to spin up. On my desktop the mouse was still
> moving but any command (dmesg, less /var/log/messages) resulted in an IO
> error. I restarted the computer and everything came up fine.

> I dug through the logs but there were no IO errors of any sort to be
> found. All I could see was that the extended SMART test successfully
> started (from smartd.log):

You list the drive make, but I don't know if that's the model or not. Googling turns up a number of 640 gig Western Digital models...

FWIW, while I'm having better luck again with my current Seagates (3 years old this summer, 4 300 gig SATA drives with most of the system in RAID-6, so I could lose one and still be able to have a second go down while I was rebuilding on a replacement, without losing the system), before that I had a bad run of two drives in a row that lasted almost exactly a year.

Before /that/, I'd always run my drives well past their first use: switching them from primary to secondary at upgrade time, then from secondary to third drive, then eventually out of rotation when they were too small to be practical any more, when I had no room on the bus, or when they failed as a third drive. So two drives in a row going out in a year was BAD for me. OTOH, at least one of them SEVERELY overheated (AC went dead and I came home to the 'puter still trying to run in a room of ~50C, no telling what the drive temperature was), and I'm reasonably sure it'd have run much longer otherwise.
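To put rough numbers on the RAID-6 arrangement described above, here is a minimal sketch of the usual capacity arithmetic. The function name `raid_summary` and the simplified model (identical drives, no metadata or filesystem overhead) are assumptions for illustration only, not a description of the actual array.

```python
from typing import Tuple


def raid_summary(level: str, drives: int, size_gb: int) -> Tuple[int, int]:
    """Return (usable capacity in GB, drive failures always survivable).

    Simplified model: identical drives, no metadata or filesystem overhead.
    """
    if level == "raid1":
        if drives < 2:
            raise ValueError("RAID-1 needs at least 2 drives")
        # Every drive holds a full copy of the data.
        return size_gb, drives - 1
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        # Two drives' worth of space is used for parity.
        return (drives - 2) * size_gb, 2
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("this sketch expects an even count of 4+ drives")
        # Striped two-way mirrors: half the raw space is usable, and only
        # one failure is guaranteed survivable (a second may hit the same mirror).
        return (drives // 2) * size_gb, 1
    raise ValueError(f"unknown level: {level}")


# Four 300 GB drives in RAID-6, as in the setup mentioned above.
print(raid_summary("raid6", 4, 300))  # (600, 2)
```

With four 300 gig drives that works out to about 600 gig usable and two drives' worth of redundancy, which is what allows a second failure during a rebuild.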
BTW, both of those drives (including the badly overheated one, which simply head-crashed and so grooved up the area where the head was floating at the time, but was OK on other partitions, including my backup partitions on the same disk) still actually ran when I pulled them, but they had bad partitions and I no longer felt safe running them. It's possible something like that is happening to your disk too, particularly if SMART says it has overheated.

Meanwhile, as I said, I don't know what your drive is, but PARTICULARLY IF IT IS IDE, take a look at this recent LWN article, in particular the HPA (host protected area) bit and the comment by "alankila" (near the bottom), which your story brought to mind. It might be worth checking with hdparm just to be sure, tho I really don't understand how SMART's own test could be screwed up by that, as the drive should certainly understand its own parameters even if various Linux utilities don't necessarily agree. (This is the "In Brief" feature from the June 3 LWN kernel page. As such, it covers a number of topics "in brief", so it doesn't give much info, but that comment's useful and it's a good place to start further research if it looks useful.)

http://lwn.net/Articles/335913/

But regardless, getting another drive and RAID-1-ing the pair (or four drives and RAID-6-ing or RAID-10-ing them), at least for your vital partitions, is I believe a pretty good idea at this point. It seems drives don't last like they used to, and they are cheap enough that RAIDing them is actually a reasonable solution now, especially with SATA. I know I rest a LOT easier knowing I have 2-drive redundancy, here.

Or you can do what I did before, which appears to be what you had done: rotate your primary drive into backup usage, and hope both the older backup and the newer main drive don't go out at the same time.
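Coming back to the HPA check suggested above: `hdparm -N` on the drive reports the visible and native sector counts, and a visible count lower than the native one means a host protected area is hiding part of the disk. Below is a minimal sketch of reading that report. It assumes the report line looks like `max sectors = <visible>/<native>, HPA is enabled`, which may differ between hdparm versions, and the helper name `parse_hpa` and the sample numbers are made up for illustration.

```python
import re
from typing import Optional, Tuple


def parse_hpa(report: str) -> Optional[Tuple[int, int, bool]]:
    """Parse the text printed by `hdparm -N`.

    Returns (visible_sectors, native_sectors, hpa_present), or None when
    the expected "max sectors" line is missing. The line format is an
    assumption; check it against your own hdparm output.
    """
    match = re.search(r"max sectors\s*=\s*(\d+)/(\d+)", report)
    if match is None:
        return None
    visible, native = int(match.group(1)), int(match.group(2))
    # An HPA is in effect when fewer sectors are visible than the drive has.
    return visible, native, visible < native


# Illustrative sample text, not output captured from a real drive.
sample = "/dev/sdX:\n max sectors   = 1250261615/1250263728, HPA is enabled\n"
print(parse_hpa(sample))
```

On a real system the text would come from running `hdparm -N` against the device as root, for example through `subprocess.run(..., capture_output=True, text=True)`; the sketch only reads the report and changes nothing on the drive.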
Of course, rotating drives that way can leave you without good backups if that's all you use, since the older drive is likely much smaller than the newer one, which used to mean it was probably too small to hold all your data, tho with today's capacities and cost for new drives, that's not quite the problem it used to be.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman