Folks,

Over the past year, I have replaced around 20 IDE hard drives in
5 different computers running Debian because of drive faults.
I know that IDE is "consumer quality" and no more, but it can't be
the case that the failure rate is that high.

The drives are mostly made by IBM/Hitachi, and they run 24/7, as the
machines in question are either routers, firewalls, or servers.

Replacing a drive would be the result of symptoms such as frequent
segmentation faults, corrupt files, and zombie processes. In all
cases, I replaced the drive, transferred the data (mostly without
problems), got the machine back into a running state, then ran
`badblocks -svw` on the disk. More often than not, it would turn up
a number of bad blocks, usually in excess of 100.
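For reference, this is roughly what I run on each suspect drive once
the data is off it (a sketch only, assuming the drive shows up as
/dev/hdb; the smartctl part needs smartmontools installed):

  # destructive read-write test of the whole drive -- only after the
  # data has been copied off, since -w wipes it
  badblocks -svw /dev/hdb

  # non-destructive read-write alternative if the data still matters
  badblocks -svn /dev/hdb

  # ask the drive for its own opinion, if the controller passes
  # SMART data through
  smartctl -a /dev/hdb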

The other day, I received a replacement drive from Hitachi, plugged
it into a test machine, ran badblocks, and verified that there were
no bad blocks. I then put the drive into the firewall, sync'd the
data (ext3 filesystems), and was ready to let the computers be and
head off to the lake... when the new firewall kept reporting bad
reloc headers in libraries, APT would stop working, there would be
random single-letter flips in /var/lib/dpkg/available (e.g. swig's
Version field would be labelled "Verrion"), and the system kept
reporting segfaults. I consequently plugged the drive into another
test machine and ran badblocks -- and it found more than 2000, on a
drive that had none the day before.

Just now, I got another replacement from Hitachi (this time it
wasn't a "serviceable used part", but a new drive), and out of the
box, it featured 250 bad blocks.

My vendor says that bad blocks are normal and that I should be
running the IBM Drive Fitness Test on the drives to verify their
functionality. Moreover, he says that there are tools to remap bad
blocks.
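If he means remapping at the filesystem level rather than in the
drive's firmware, I suppose he is talking about something like the
ext2/ext3 bad block inode; a sketch, assuming the partition is
/dev/hdb1 and currently unmounted:

  # read-only badblocks scan; found blocks get added to the bad
  # block inode so the filesystem avoids them
  e2fsck -c /dev/hdb1

  # or a (slower) non-destructive read-write scan
  e2fsck -cc /dev/hdb1

  # when creating a fresh ext3 filesystem, scan for bad blocks first
  mke2fs -j -c /dev/hdb1

That only hides the problem from the filesystem, though; it does
nothing about the drive itself.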

My understanding was that EIDE drives do automatic bad sector
remapping, and that if badblocks actually finds a bad block, the
drive has run out of spare sectors and should be declared dead. Is
this not the case?
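As far as I can tell, the drive's SMART counters are the place to
check whether it is still remapping quietly; a sketch, assuming
smartmontools is installed and the controller passes SMART data
through:

  # attribute 5 (Reallocated_Sector_Ct) counts sectors already
  # remapped to spares; 197 (Current_Pending_Sector) counts sectors
  # the drive would still like to remap
  smartctl -A /dev/hdb

  # start the drive's extended self-test, then read the log later
  smartctl -t long /dev/hdb
  smartctl -l selftest /dev/hdb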

The reason I am posting this is because I need mental support. I'm
going slightly mad. I seem to be unable to buy non-bad IDE drives,
be they IBM, Maxtor, or Quantum. Thus I spend excessive time
replacing drives and keeping systems up by brute force. And when
I look around, there are thousands of consumer machines that run
day in, day out without problems.

It may well be that Windoze has better error handling when the
hard drive's reliability degrades (I don't want to say this is a
good thing). It may be that IDE hates me. I don't think it's my IDE
controllers, since there are 5 different machines involved, and the
chance that all of them report bad blocks where there aren't any,
yet otherwise function fine with respect to detecting the drives
(and not reporting the dreaded dma:intr errors), seems negligible.

So I call to you and would like to know a couple of things:

  - does anyone else experience this?
  - does anyone know why this is happening?
  - is it true that bad blocks are normal and can be handled
    properly?
  - why is this happening to me?
  - can bad blocks arise from static discharge or impurities? When
    I replace disks, I usually put the new one into the case
    loosely and leave the cover open. The disk is not subjected to
    any shocks or the like; it sits as still as a rock, it's just
    not affixed.

I will probably never buy IDE again. But before I bash companies
like Hitachi for crap quality control, I would like to make sure
that I am not the one screwing up.

Any comments?

-- 
Please do not CC me when replying to lists; I read them!
 
 .''`.     martin f. krafft <[EMAIL PROTECTED]>
: :'  :    proud Debian developer, admin, and user
`. `'`
  `-  Debian - when you have better things to do than fixing a system
 
Invalid PGP subkeys? Use subkeys.pgp.net as keyserver!
