Re: Read i/o errs and disk replacement

Chris Murphy Tue, 18 Feb 2014 14:03:14 -0800

On Feb 18, 2014, at 2:33 PM, Wolfgang Mader <wolfgang_ma...@brain-frog.de> 
wrote:
> 
> 
> Feb 18 13:14:09 deck kernel: ata2.00: failed command: READ DMA
> Feb 18 13:14:09 deck kernel: ata2.00: cmd c8/00:08:60:f2:30/00:00:00:00:00/e0 
> tag 0 dma 4096 in
>                                      res 51/04:08:60:f2:30/00:00:00:00:00/e0 
> Emask 0x1 (device error)
> Feb 18 13:14:09 deck kernel: ata2.00: status: { DRDY ERR }
> Feb 18 13:14:09 deck kernel: ata2.00: error: { ABRT }
> Feb 18 13:14:09 deck kernel: ata2.15: hard resetting link
> Feb 18 13:14:14 deck kernel: ata2.15: link is slow to respond, please be 
> patient (ready=0)
> Feb 18 13:14:19 deck kernel: ata2.15: SRST failed (errno=-16)
> Feb 18 13:14:19 deck kernel: ata2.15: hard resetting link
> Feb 18 13:14:24 deck kernel: ata2.15: link is slow to respond, please be 
> patient (ready=0)
> Feb 18 13:14:29 deck kernel: ata2.15: SATA link up 3.0 Gbps (SStatus 123 
> SControl F300)
> Feb 18 13:14:29 deck kernel: 
> Feb 18 13:14:30 deck kernel: ata2.01: hard resetting link
> Feb 18 13:14:31 deck kernel: ata2.02: hard resetting link
> Feb 18 13:14:31 deck kernel: ata2.03: hard resetting link
> Feb 18 13:14:32 deck kernel: ata2.04: hard resetting link
> Feb 18 13:14:32 deck kernel: ata2.05: hard resetting link
> Feb 18 13:14:33 deck kernel: ata2.06: hard resetting link
> Feb 18 13:14:34 deck kernel: ata2.07: hard resetting link
> Feb 18 13:14:34 deck kernel: ata2.00: configured for UDMA/133
> Feb 18 13:14:34 deck kernel: ata2.01: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.02: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.03: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.04: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.05: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.06: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2.07: configured for UDMA/133
> Feb 18 13:14:35 deck kernel: ata2: EH complete


Two things. The full dmesg includes useful information separate from the error 
messages, including the drive model to ata device mapping, and it may explain 
why there's a failed read on ata2.00 yet there's a reset in sequence for 
ata2.01, 2.02, 2.03 and so on. So the entire dmesg would be useful.

In any case the actual problem might not be discoverable due to the hard 
resetting. I'm not finding any useful translation, in a 5 minute search, for 
SRST. But it makes me suspicious of a configuration problem, like maybe an 
unnecessary jumper setting on a drive or with the enclosure itself. So I'd 
check for that. Also, what model drives are being used? If they are consumer 
drives, they almost certainly have long error recoveries, well over 30 seconds. 
And if the drive is trying to honor the read request for more than 30 seconds, 
the default SCSI block layer command timer will expire and produce messages 
like what we see here. So you probably need to change the SCSI block layer 
timeout. To set the command timer to something else use:

echo <value> > /sys/block/<device>/device/timeout

Where value is e.g. 121, since many consumer drives time out at 120 seconds. 
This means the kernel will wait 121 seconds before starting its error handling 
(which includes resetting the drive and then the bus).
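
As a rough sketch of the above, a small shell function can show the old value, 
write the new one, and read it back. The function name set_scsi_timeout and the 
SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT only exists so the 
function can be tried against a scratch directory), and on a real system it 
needs root:

```shell
# Sketch: raise the SCSI command timer for one disk.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
set_scsi_timeout() {
    dev=$1      # kernel device name, e.g. sdb
    secs=$2     # new command timer in seconds, e.g. 121
    file="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"

    # Refuse to continue if the attribute is missing or not writable.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "old timeout for $dev: $(cat "$file")s"
    echo "$secs" > "$file"
    echo "new timeout for $dev: $(cat "$file")s"
}
```

It would be called as "set_scsi_timeout sdb 121". The value is a sysfs setting, 
so it does not survive a reboot and has to be set again for each drive.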



> -------end-------
> 
> This output is repeated several times and then ends in this read error
> 
> [Tue Feb 18 13:15:48 2014] btrfs: bdev /dev/sdb errs: wr 0, rd 2, flush 0, 
> corrupt 0, gen 0
> [Tue Feb 18 13:15:48 2014] ata2: EH complete
> [Tue Feb 18 13:15:48 2014] btrfs read error corrected: ino 1 off 29184540672 
> (dev /dev/sdb sector 3207776)

Well that reads like Btrfs knows what sector had a read problem, without 
corruption being the cause, and corrected it. So the question then is whether 
/dev/sdb is the same as ata2.00. If ata2.00 isn't a drive but is the drive 
enclosure then you've got a different (or additional) problem.
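
One way to check this, sketched under the assumption that the kernel exposes 
the ATA port as a directory in the resolved sysfs path of the block device 
(something like .../ata2/host1/.../block/sdb), is to resolve the symlink and 
pull out the ataN component. The function name ata_port_of and the SYSFS_ROOT 
override are hypothetical, and this only gives the port (the ata2 part), not 
the .00 suffix:

```shell
# Sketch: print the ataN port that a block device hangs off.
# Assumes the resolved sysfs path contains a ".../ataN/..." component.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
ata_port_of() {
    dev=$1      # kernel device name, e.g. sdb
    path=$(readlink -f "${SYSFS_ROOT:-/sys}/block/$dev") || return 1

    # Keep only the ataN directory name; print nothing if there is none.
    printf '%s\n' "$path" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}
```

If "ata_port_of sdb" prints ata2, the read error and the btrfs correction are 
at least on the same port. Empty output means the path layout is different on 
that system, and the full dmesg is the better source.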



Chris Murphy