On Thu, Jun 5, 2014, at 05:24 PM, STeve Andre' wrote:
> On 06/05/14 17:38, Christian Weisgerber wrote:
> > I have a 3TB disk here...
> >
> > sd1 at scsibus1 targ 1 lun 0: <ATA, Hitachi HUA72303, MKAO> SCSI3 0/direct fixed naa.5000cca225c5fbeb
> > sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors
> >
> > ... that's serving as a general media dump with a single FFS2 file
> > system on it.
> >
> > Filesystem     Size    Used   Avail Capacity  Mounted on
> > /dev/sd1d      2.7T    2.5T   63.7G    98%    /export
> >
> > Yesterday, I experienced the odd effect that reading some files,
> > or parts of files, from that disk became excruciatingly slow.  We're
> > talking a few kB/s here.  Other files were fine.  There were no
> > kernel errors/warnings whatsoever.  There were no read errors, the
> > disk was just 100% busy and appeared to be returning data drip by
> > drip.
> >
> > # atactl sd1 smartstatus
> > No SMART threshold exceeded
> >
> > No change on reboot.  dd(1) from the raw device was initially fast,
> > then slowed to a crawl as it progressed.  I eventually "fixed" it
> > all by powering off the machine, jiggling the SATA connectors (all
> > fine), and powering the machine back up.
> >
> > Tonight the problem is back.  Something is very wrong.  Given that
> > dd if=/dev/rsd1c also seems affected, the filesystem layer can be
> > excluded.  I won't cry too much over a dying disk, but why the heck
> > are there no error indications of any kind?
> >
> > Any other ideas?

Anything in dmesg/kernel log about operations timing out?
 
> I think you are relying on the smart system too much.  Certainly try
> what David said, but it's obvious that the disk is sick despite what the
> smart system may say.
> 
> I've had about seven disk failures in the last several years.  Three or
> four of them the smart system was absolutely correct, with the others
> being less informative.  I've also had a false notice that a disk was
> bad, but it worked for several years, till it got too small for its
> task.
> 
> Smart is good, but it has its limitations.  It best deals with gradual
> errors, not fast catastrophic ones.

Running smartmontools (e.g. "smartctl -a") should give you enough
information to determine whether you have a sick disk, though it may
require looking at the raw attribute values over time and checking for
a rise in, e.g., the number of remapped sectors; I would not trust
"atactl sd# smartstatus" by itself. Failing that, there are more
time-honored empirical tests, such as assuming the worst for the disk's
health if it is making weird noises when it slows to a crawl.
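To make the "look for a rise" idea concrete, here is a rough Python
sketch that compares two snapshots of "smartctl -A" output. The set of
watched attribute IDs, the function names, and the assumption that the
raw value sits in the tenth column of the attribute table are my own
choices for illustration; check them against what your drive actually
reports.

```python
import subprocess

# Attributes commonly watched for media or link trouble (illustrative
# selection, not an exhaustive or vendor-specific list).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def parse_attributes(text):
    """Return {attribute_id: raw_value} from 'smartctl -A' style output.

    Assumes the usual ten-column table where the first column is the
    numeric ID and the tenth column starts the raw value.
    """
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or unrelated text
        raw = fields[9]
        if raw.isdigit():
            attrs[int(fields[0])] = int(raw)
    return attrs


def rising(before, after):
    """Return watched attributes whose raw value grew between snapshots."""
    changed = {}
    for attr_id in WATCHED:
        old = before.get(attr_id, 0)
        new = after.get(attr_id)
        if new is not None and new > old:
            changed[attr_id] = (old, new)
    return changed


def read_attributes(device):
    """Run smartctl on the given device path and parse its attribute table.

    The exit status is not checked because smartctl uses it as a bitmask.
    """
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_attributes(result.stdout)
```

In use, you would call read_attributes() on the device path now, save
the result, call it again after the next slowdown, and pass both
results to rising(). The device path to pass depends on the system and
is not something I have verified for the OP's setup.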

It could also be either the SATA cabling or the SATA controller that is
having trouble after warming up (with specific bit patterns, or just in
general). I know that sounds weird, but SATA cables aren't that
expensive to replace and it's quite possible the OP got a dud.
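One rough way to get a hint about disk versus cabling/controller is to
time reads at a few fixed offsets and repeat the run later. The sketch
below is only that, a sketch: the function name, chunk size, and the
interpretation are my assumptions. My guess is that slowness that
sticks to the same offsets points at the platters, while slowness that
shows up everywhere once the box is warm points at the link, but that
is a heuristic, not a proven diagnostic.

```python
import os
import time


def sample_throughput(path, offsets, chunk=1024 * 1024):
    """Time one read of 'chunk' bytes at each offset in 'offsets'.

    Returns a list of (offset, bytes_read, seconds) tuples. For a raw
    disk device, offsets and chunk should be multiples of the sector
    size (512 bytes on the disk in this thread).
    """
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for offset in offsets:
            start = time.monotonic()
            data = os.pread(fd, chunk, offset)
            elapsed = time.monotonic() - start
            results.append((offset, len(data), elapsed))
    finally:
        os.close(fd)
    return results


def slow_offsets(results, min_bytes_per_sec):
    """Return the offsets whose measured rate fell below the threshold."""
    slow = []
    for offset, nbytes, seconds in results:
        if seconds > 0 and nbytes / seconds < min_bytes_per_sec:
            slow.append(offset)
    return slow
```

Run it as root against the raw device the OP was reading with dd, using
offsets spread across the disk, then again after the machine has been
up for a while, and compare which offsets slow_offsets() reports.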

-- 
  Shawn K. Quinn
  skqu...@rushpost.com
