On Tue, 2011-06-07 at 09:02 -0400, Miles Fidelman wrote:
> Ralf Mardorf wrote:
> > For me a hard disc never gets broken without click-click-click noise
> > before it failed, but it's very common that cables and connections fail.
> >
> >
>
> By the time a disk gets to the click-click-click phase,
A phase everybody knows from modern HDDs :D, but it's possible to get data
even from a disk that won't release the heads anymore [1]. For the Atari
I've got a 42MB SCSI disk connected to a Lacom adaptor; it sometimes needs
several boots, but it's unbreakable.

> there has been
> LOTS of warning - it's just that today's disks include lots of internal
> fault-recovery mechanisms that hide things from you, unless you run
> SMART diagnostics (and not just the basic "smart status" either).
>
> For example, if you have a machine that's suddenly running VERY slowly

Correct! Or rather: if voodoo seems to be affecting your machine, it is
seldom voodoo, but a broken HDD.

> -
> it's a good sign that a drive is experiencing internal read errors (unless
> it's a laptop - a shorted battery is a good suspect). Both are lessons
> learned the hard way, and not forgotten.
>
> Turns out that modern drives have onboard processors that retry reads
> multiple times - good for protecting data if you only have the one copy
> on that drive, at the expense of slower disk access. Not so good if:
>
> a. you don't notice that it's happening (the disk will eventually fail
> hard), or,
>
> b. you're running RAID - instead of the drive dropping out of the array,
> the entire array slows down as it waits for the failing drive to
> (eventually) respond
>
> In either case, you'll tear your hair out trying to figure out why your
> machine is running slowly (is it a virus, a file lock that didn't
> release, etc., etc., etc.).
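
To look past the basic "smart status", one can read the attribute table
itself. Below is a minimal sketch that parses text laid out like the output
of "smartctl -A" and returns the raw value of one attribute. The SAMPLE text
and the function name raw_value are made up for illustration, not taken from
a real drive or tool. Raw values are vendor-specific, so how to read a
non-zero number is a rule of thumb rather than a hard fact.

```python
# Sketch: pull one attribute's raw value out of `smartctl -A` style text.
# SAMPLE is made-up illustrative output, not taken from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       14312
"""


def raw_value(report, attribute="Raw_Read_Error_Rate"):
    """Return the raw value of the named attribute as an int, or None."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the attribute name.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute:
            # RAW_VALUE is the tenth column; extra text after it is ignored.
            try:
                return int(fields[9])
            except ValueError:
                # Some drives print a non-numeric raw value; report "unknown".
                return None
    return None


if __name__ == "__main__":
    print(raw_value(SAMPLE))
    print(raw_value(SAMPLE, "Reallocated_Sector_Ct"))
```

To feed it real data, the text could come from running smartctl -A on your
own drive (this usually needs root), for example via Python's
subprocess.run with capture_output=True and text=True.
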
>
> Lessons learned:
>
> - if your machine is running really slowly, try a reboot -- if it
> reboots properly, but takes 2 times as long (or longer) to shut down and
> then come back up -- get very suspicious (if your patience lasts that long)
>
> - if it's a laptop - pull the battery and try again - if everything is
> normal, buy yourself a new battery
>
> - if it's a server - try booting from a liveCD (if you can, first
> disconnect the hard drive entirely) - if normal then you could well have
> a hard drive problem (or you could have a virus)
>
> - install SMART utilities and run "smartctl -A /dev/<your drive>" -- the
> first line is usually the "raw read error" rate -- if the value (last
> entry on the line) is anything except 0, that's the sign that your drive
> is failing, if it's in the 1000s, failure is imminent, it's just that
> your drive's internal software is hiding it from you - replace it!
>
> - if you're running RAID, be sure to purchase "enterprise" drives (where
> "desktop" drives try very hard to read a sector, despite the delay;
> enterprise drives give up quickly as they expect failure recovery to be
> handled by RAID)
>
> - you would expect software raid (md) to detect slow drives, mark them
> bad, and drop them from an array -- nope, md does not keep track of delay
>
> and, not really relevant for Debian, but a direct offshoot of learning
> the above lessons:
>
> - if you're running a Mac or Windows, your system may be reporting
> "smart status good" - but it's not really true - it's not looking at raw
> read errors
>
> - there seems to be a bug in the smart utilities for Mac (as available
> through Macports and Fink) -- the smart daemon will fail periodically,
> with the only symptom being that every few minutes, your machine will
> slow to a crawl (spinning beachball everywhere) for 30 seconds or so,
> then recover --- a really good example of taking a pre-emptive measure
> that causes a new problem (I can't tell you how long it took to track
> this one down - what with downloading every performance tracking tool I
> could find.)
>
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In<fnord> practice, there is. .... Yogi Berra

My Samsung SATA drives have so far run without failure for a suspiciously
long time :). I very, very often turn the computer off and on. The only bad
parts are the SATA connectors; a friend already plans to solder new SATA
connectors onto his mobo. Note! Nobody without experience in soldering
multi-layer boards should attempt this. I planned to do it too.

[1]
When the heads aren't released anymore after the final click, there is
still a chance to get them working.

- Remove the HDD from the case, but keep the power and data cables
  connected.
- With a rubber-headed mallet or something similar, knock against the HDD
  from several angles, while rebooting again and again.
- If it doesn't work, repeat this after the HDD has rested for a week.

Dunno why this helps, but it does; perhaps different room temperatures
work like gnomes.

-- Ralf

--
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1307453128.2408.19.camel@debian