On Thu, Jun 5, 2014, at 05:24 PM, STeve Andre' wrote:
> On 06/05/14 17:38, Christian Weisgerber wrote:
> > I have a 3TB disk here...
> >
> > sd1 at scsibus1 targ 1 lun 0: <ATA, Hitachi HUA72303, MKAO> SCSI3 0/direct
> > fixed naa.5000cca225c5fbeb
> > sd1: 2861588MB, 512 bytes/sector, 5860533168 sectors
> >
> > ... that's serving as a general media dump with a single FFS2 file
> > system on it.
> >
> > Filesystem     Size    Used   Avail Capacity  Mounted on
> > /dev/sd1d      2.7T    2.5T   63.7G    98%    /export
> >
> > Yesterday, I experienced the odd effect that reading some files,
> > or parts of files, from that disk became excruciatingly slow. We're
> > talking a few kB/s here. Other files were fine. There were no
> > kernel errors/warnings whatsoever. There were no read errors, the
> > disk was just 100% busy and appeared to be returning data drip by
> > drip.
> >
> > # atactl sd1 smartstatus
> > No SMART threshold exceeded
> >
> > No change on reboot. dd(1) from the raw device was initially fast,
> > then slowed to a crawl as it progressed. I eventually "fixed" it
> > all by powering off the machine, jiggling the SATA connectors (all
> > fine), and powering the machine back up.
> >
> > Tonight the problem is back. Something is very wrong. Given that
> > dd if=/dev/rsd1c also seems affected, the filesystem layer can be
> > excluded. I won't cry too much over a dying disk, but why the heck
> > are there no error indications of any kind?
> >
> > Any other ideas?
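To see where a dd run starts to crawl, you could time sequential reads of the raw device in fixed-size chunks. That is essentially the dd test, but it reports a figure for each chunk. Below is a minimal sketch. It assumes Python 3 is installed from packages, since it is not in the OpenBSD base system. The script, its function names, the 64 MB chunk size and the 1 MB/s "slow" threshold are all hypothetical. Reading /dev/rsd1c needs root.

```python
#!/usr/bin/env python3
# Hypothetical helper (not part of any existing tool): time sequential reads
# from a device or file in fixed-size chunks and flag the slow ones.
import os
import sys
import time


def scan_read_speed(path, chunk_size=64 * 1024 * 1024, max_chunks=None):
    """Read `path` sequentially from offset 0; return [(offset, MB/s), ...]."""
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while max_chunks is None or len(results) < max_chunks:
            start = time.monotonic()
            data = os.read(fd, chunk_size)  # may return fewer bytes than asked
            elapsed = time.monotonic() - start
            if not data:
                break  # end of device/file
            # Guard against a zero elapsed time on very fast (cached) reads.
            mb_per_s = len(data) / (1024 * 1024) / max(elapsed, 1e-9)
            results.append((offset, mb_per_s))
            offset += len(data)
    finally:
        os.close(fd)
    return results


def slow_regions(results, threshold_mb_s=1.0):
    """Return the offsets of chunks whose throughput fell below the threshold."""
    return [off for off, speed in results if speed < threshold_mb_s]


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.: python3 scan_read_speed.py /dev/rsd1c   (run as root)
    for off in slow_regions(scan_read_speed(sys.argv[1])):
        print(f"slow chunk at byte offset {off}")
```

You might see the same slow offsets on repeated runs. That would point more toward the disk itself than toward a cable or controller that only misbehaves once warm.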
Anything in dmesg/kernel log about operations timing out?

> I think you are relying on the smart system too much. Certainly try
> what David said, but it's obvious that the disk is sick despite what the
> smart system may say.
>
> I've had about seven disk failures in the last several years. Three or
> four of them the smart system was absolutely correct, with the others
> being less informative. I've also had a false notice that a disk was
> bad, but worked for several years, till it got too small for its task.
>
> Smart is good, but it has its limitations. It best deals with gradual
> errors, not fast catastrophic ones.

Running smartmontools should give you enough information to determine
whether you have a sick disk. You may have to look at the attribute
values yourself and check whether, e.g., the number of remapped sectors
is rising. I would not trust "atactl sd# smartstatus" by itself.

Failing that, there are more time-honored empirical tests, such as
assuming the worst about the disk's health if it makes weird noises
when it slows to a crawl.

It could also be the SATA cabling or the SATA controller that is having
trouble after warming up (with specific bit patterns, or just in
general). I know that sounds weird, but SATA cables are cheap to replace,
and it's quite possible the OP got a dud.

--
Shawn K. Quinn
skqu...@rushpost.com