From [EMAIL PROTECTED] Fri Jan 1 23:24:51 1999
From: James Rich <[EMAIL PROTECTED]>
SCSI tape stuff happens...
st0: Buffer flushed, 1 EOF(s) written
st0: Rewinding tape.
st0: Block limits 1 - 65536 bytes.
st0: Mode sense. Length 14, medium 0, WBS 10, BLL 8
st0: Density 11, tape length: 0, drv buffer: 1
st0: Block size: 1024, buffer size: 32768 (32 blocks).
st0: Error: 28000002, cmd: 8 1 0 0 20 0 Len: 32768
FMK Current error st09:00: sense key None
st0: Sense: f0 0 80 0 0 0 14 6
st0: EOF detected (12288 bytes read).
st0: EOF up (1). Left 12288, needed 2048.
st0: EOF/EOM flag up (1). Bytes 10240
st0: EOF up (1). Left 10240, needed 10240.
st0: Rewinding tape.
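As a cross-check, the numbers in the st0 lines above agree with each other. Below is a minimal sketch in Python, assuming the "Sense:" line shows the first 8 bytes of fixed-format sense data and the "cmd:" line shows a READ(6) CDB with the Fixed bit set. The function names are made up for illustration and do not come from st.c.

```python
def decode_fixed_sense(sense):
    """Decode the first 8 bytes of SCSI fixed-format sense data."""
    return {
        "valid": bool(sense[0] & 0x80),          # Information field is valid
        "response_code": sense[0] & 0x7F,        # 0x70 = current error
        "filemark": bool(sense[2] & 0x80),       # FMK bit
        "eom": bool(sense[2] & 0x40),            # end-of-medium bit
        "ili": bool(sense[2] & 0x20),            # incorrect length indicator
        "sense_key": sense[2] & 0x0F,            # 0 = None
        "information": int.from_bytes(bytes(sense[3:7]), "big"),
    }


def read6_tape_blocks(cdb):
    """Transfer length of a tape READ(6) issued in fixed-block mode."""
    assert cdb[0] == 0x08 and cdb[1] & 0x01, "expected READ(6) with Fixed bit"
    return int.from_bytes(bytes(cdb[2:5]), "big")


# Values copied from the log: "Sense: f0 0 80 0 0 0 14 6",
# "cmd: 8 1 0 0 20 0" and "Block size: 1024".
sense = [0xF0, 0x00, 0x80, 0x00, 0x00, 0x00, 0x14, 0x06]
cdb = [0x08, 0x01, 0x00, 0x00, 0x20, 0x00]
block_size = 1024

info = decode_fixed_sense(sense)
requested = read6_tape_blocks(cdb)   # blocks asked for
residual = info["information"]       # blocks NOT transferred (fixed mode)
bytes_read = (requested - residual) * block_size

print(info)
print(requested, residual, bytes_read)
```

With 32 blocks requested and a residual of 20, this gives 12 blocks of 1024 bytes, matching the "EOF detected (12288 bytes read)" line. The filemark bit is set and the sense key is None, so the tape side looks like an ordinary end-of-file.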
Now a timeout occurs. The routine internal_cmnd() in scsi.c has
registered scsi_old_times_out() as the routine to call upon timeout.
It lives in scsi_obsolete.c, notices NORMAL_TIMEOUT and calls
scsi_abort(). This routine prints
scsi : aborting command due to timeout : pid 3870, scsi0, channel 0, id 0, lun 0
Read (10) 00 00 24 5c 90 00 00 08 00
But the command that got the timeout is a command to the disk, not to the tape. Strange...
[Maybe nothing is wrong with the disk, there never is...]
scsi0: Aborting CCB #3883 to Target 0
SCSI host 0 abort (pid 3870) timed out - resetting
The SCSI controller did not react to the abort
[If the problem is reproducible, and you are using BusLogic.c,
you might check whether BusLogic_AbortedCommandNotFound is returned
by the controller. At first sight the driver seems to ignore
such a return status. I think it should at least print a message
when SCSI error logging is enabled.]
so the error recovery code decides to reset the disk drive.
SCSI bus is being reset for host 0 channel 0.
scsi0: Sending Bus Device Reset CCB #3885 to Target 0
The disk reset times out as well. A bus reset is attempted.
SCSI host 0 channel 0 reset (pid 3870) timed out - trying harder
SCSI bus is being reset for host 0 channel 0.
scsi0: Resetting BusLogic BT-958 due to Target 0
scsi0: Resetting BusLogic BT-958 Failed
SCSI host 0 reset (pid 3870) timed out again -
probably an unrecoverable SCSI bus or device hang.
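The escalation seen in the log so far (abort, then device reset, then a harder reset of the host adapter, then giving up) can be modelled roughly as follows. This is only an illustration of the sequence described above, not the code in scsi_obsolete.c; the step names and the try_step callback are hypothetical.

```python
# Recovery steps in the order the log shows them being tried.
ESCALATION = ["abort command", "bus device reset", "host adapter reset"]


def recover(try_step):
    """Try each step in order; return the first that works, or None.

    try_step is a hypothetical callback: it returns True if the step
    completed before its timeout and False if it timed out or failed.
    """
    for step in ESCALATION:
        if try_step(step):
            return step
    return None  # "probably an unrecoverable SCSI bus or device hang"


# In this report every step timed out or failed.
attempted = []


def always_times_out(step):
    attempted.append(step)
    return False


result = recover(always_times_out)
print(attempted, result)
```

Here every step fails, so the ladder is walked to the end and nothing is left to try, which is the situation the last log message reports.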
Here is the next disk request - it fails in the same way.
scsi : aborting command due to timeout : pid 3871, scsi0, channel 0, id 0, lun 0
Write (6) 0f 90 14 02 00
scsi0: Unable to Abort Command to Target 0 - CCB Reset
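For reference, the two disk commands that timed out can be decoded as below. This is a sketch in Python, assuming the kernel prints the opcode name followed by the remaining CDB bytes in hex, and assuming the standard READ/WRITE (6) and (10) layouts for disks. The helper name decode_rw_cdb is made up for illustration.

```python
def decode_rw_cdb(opcode, rest):
    """Return (lba, block_count) for a disk READ/WRITE (6) or (10) CDB."""
    cdb = bytes([opcode]) + bytes(rest)
    if opcode in (0x08, 0x0A):        # READ(6), WRITE(6)
        # 21-bit LBA; the top 3 bits of byte 1 are not part of the address.
        lba = int.from_bytes(cdb[1:4], "big") & 0x1FFFFF
        length = cdb[4] or 256        # 0 means 256 blocks in 6-byte commands
    elif opcode in (0x28, 0x2A):      # READ(10), WRITE(10)
        lba = int.from_bytes(cdb[2:6], "big")
        length = int.from_bytes(cdb[7:9], "big")
    else:
        raise ValueError("not a READ/WRITE (6) or (10) opcode")
    return lba, length


# "Read (10) 00 00 24 5c 90 00 00 08 00" (pid 3870)
read10 = decode_rw_cdb(0x28, [0x00, 0x00, 0x24, 0x5C, 0x90, 0x00, 0x00, 0x08, 0x00])
# "Write (6) 0f 90 14 02 00" (pid 3871)
write6 = decode_rw_cdb(0x0A, [0x0F, 0x90, 0x14, 0x02, 0x00])

print(read10, write6)
```

Both decode to small, ordinary transfers (8 blocks and 2 blocks), which fits the reading that nothing unusual was being asked of the disk.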
I've added some printk's to st.c and found that it does indeed get stuck
while trying to do a close on the device.
After this the system goes through cycles of responsiveness and no
response. The load skyrockets.
Yes.
This ought to be good information to work from.
[Information as I see it: nothing is wrong with the disk.
Nothing is wrong with the SCSI controller.
But some error occurs somewhere, and the SCSI subsystem gets
terminally confused.
Now scsi_old_times_out() starts with spin_lock_irqsave();
maybe handling of IRQ or io_request_lock is flawed.]
If you can reproduce it, even better.
No doubt Leonard Zubkoff will correct all that is wrong in the above.
Andries