Can anyone help me here?  I've got an AHA1542 with a Quantum XP32150W
as the only device on the chain.  I keep getting errors like the following:

scsi : aborting command due to timeout : pid 198, scsi0, channel 0, id 0, lun 0 
Write (10) 00 00 27 88 b0 00 00 76 00
scsi : aborting command due to timeout : pid 198, scsi0, channel 0, id 0, lun 0 
Write (10) 00 00 27 88 b0 00 00 76 00

It looks like your drive failed to respond to a write request within the Linux 
disk IO timeout period.
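For what it's worth, the bytes printed after "Write (10)" can be decoded by hand. Below is a minimal sketch, assuming the kernel prints the nine command bytes that follow the opcode and that the disk uses 512-byte blocks (both are my assumptions, not something the log states). The function name decode_write10 is made up for illustration.

```python
def decode_write10(tail_hex: str, block_size: int = 512) -> dict:
    """Decode the nine bytes printed after "Write (10)" in the log.

    Assumed layout (standard WRITE(10) command block minus the opcode):
      byte 0     flags
      bytes 1-4  logical block address, big-endian
      byte 5     group number / reserved
      bytes 6-7  transfer length in blocks, big-endian
      byte 8     control
    """
    b = bytes.fromhex(tail_hex)
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(b[1:5], "big")
    blocks = int.from_bytes(b[6:8], "big")
    # block_size is an assumption; the log does not report it.
    return {"lba": lba, "blocks": blocks, "bytes": blocks * block_size}


info = decode_write10("00 00 27 88 b0 00 00 76 00")
print(info)
```

Under those assumptions the timed-out command is a 118-block write (about 59 KB) starting at block 2590896, and both log entries are the same command (pid 198) being reported twice.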

SCSI host 0 abort (pid 198) timed out - resetting
SCSI bus is being reset for host 0 channel 0.
Sent BUS DEVICE RESET to target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0
Sending DID_RESET for target 0

It looks like Linux is resetting the SCSI bus and the SCSI disk.

aha1542_intr_handle: Unexpected interrupt
tarstat=0, hastat=0 idlun=10 ccb#=5
aha1542_intr_handle: Unexpected interrupt
tarstat=0, hastat=0 idlun=10 ccb#=7

It looks like the Adaptec SCSI card finally tried to return the disk
requests and reported that the target status (the disk) and the host adapter
status (the Adaptec) were OK, although I wouldn't always trust what an Adaptec
controller tells me.

Too bad your logs don't give timing information. From the data given (and my
limited knowledge of how Linux disk drivers work), I'd say that Linux's timeout
on IO requests may be too short, causing a race between the moment the status
of the async disk IO is returned and the moment Linux goes into reset mode.
Linux times out and sends the bus and device reset (which should cause the
disk to forget about any outstanding IOs), but after all of that is over, the
Adaptec tries to say the IOs completed successfully.
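To make the suspected ordering concrete, here is a toy timeline. It is not the real driver logic, and every number in it is invented; replay, timeout_ms and completion_ms are hypothetical names I'm using only to show how a completion that arrives after the timeout ends up looking like an unexpected interrupt.

```python
def replay(completion_ms: int, timeout_ms: int) -> list:
    """Return the ordered events for one command.

    Times are milliseconds since the command was issued. This is a toy
    model: the real abort/reset path has more steps than shown here.
    """
    events = [(0, "command issued")]
    if completion_ms <= timeout_ms:
        events.append((completion_ms, "completion delivered normally"))
    else:
        # The OS gives up first and resets, so nobody is waiting any more
        # when the adapter finally reports the command as done.
        events.append((timeout_ms, "timeout: abort, then bus/device reset"))
        events.append((completion_ms, "late completion: unexpected interrupt"))
    return events


# Hypothetical numbers: a 2000 ms timeout and a command that needs 2500 ms.
late = replay(completion_ms=2500, timeout_ms=2000)
on_time = replay(completion_ms=500, timeout_ms=2000)
for when, what in late:
    print(f"{when:5d} ms  {what}")
```

If the real timestamps in your logs followed the "late" pattern, that would support the short-timeout theory; if the interrupts show up long after the reset, I'd look at hardware instead.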

Last time I was writing SCSI disk drivers (about 1992), newer disks were capable
of queuing 64 commands. That number could be up substantially now. If an
average IO takes 10ms to process, a disk with 256 queued IOs could take about
2.5 seconds to work through them, which may be bumping into Linux's SCSI device
driver timeout. Just a theory...
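The arithmetic behind that guess, as a small sketch. The 10ms average and the queue depths come from the paragraph above; the 2000 ms timeout is purely a placeholder of mine, since I don't know what the driver actually uses.

```python
def drain_time_ms(queue_depth: int, avg_io_ms: int) -> int:
    """Worst-case wait for the last command in a full queue, in ms."""
    return queue_depth * avg_io_ms


# Placeholder value only; the real driver timeout is unknown to me.
ASSUMED_TIMEOUT_MS = 2000

results = {}
for depth in (64, 256):
    wait = drain_time_ms(depth, avg_io_ms=10)
    results[depth] = wait
    verdict = "exceeds" if wait > ASSUMED_TIMEOUT_MS else "fits within"
    print(f"depth {depth}: {wait} ms, {verdict} the assumed timeout")
```

A 64-deep queue drains in well under a second at that rate, while a 256-deep queue needs a bit over 2.5 seconds, so whether this matters depends entirely on the real timeout value and on how deep the drive actually queues.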

Then again, this whole problem may be as simple as a bad SCSI cable or bad
termination (a very common problem, and usually intermittent), or the disk
controller or host adapter brains going south.

I don't have the source for the SCSI disk controller. If I get a chance, I'll
download some source and look through it to see what the timeouts are and
whether they're easily configured.

Good Luck,

Al Youngwerth
[EMAIL PROTECTED]
