Re: more on scsi tape failures

Guest section DW Fri, 1 Jan 1999 19:57:44 -0500
        From [EMAIL PROTECTED] Fri Jan  1 23:24:51 1999
        From: James Rich <[EMAIL PROTECTED]>

SCSI tape stuff happens...

        st0: Buffer flushed, 1 EOF(s) written
        st0: Rewinding tape.
        st0: Block limits 1 - 65536 bytes.
        st0: Mode sense. Length 14, medium 0, WBS 10, BLL 8
        st0: Density 11, tape length: 0, drv buffer: 1
        st0: Block size: 1024, buffer size: 32768 (32 blocks).
        st0: Error: 28000002, cmd: 8 1 0 0 20 0 Len: 32768
        FMK Current error st09:00: sense key None
        st0: Sense: f0  0 80  0  0  0 14  6
        st0: EOF detected (12288 bytes read).
        st0: EOF up (1). Left 12288, needed 2048.
        st0: EOF/EOM flag up (1). Bytes 10240
        st0: EOF up (1). Left 10240, needed 10240.
        st0: Rewinding tape.

Now a timeout occurs. The routine internal_cmnd() in scsi.c has
registered scsi_old_times_out() as the routine to call upon timeout.
It lives in scsi_obsolete.c, notices NORMAL_TIMEOUT and calls
scsi_abort(). This routine prints

        scsi : aborting command due to timeout : pid 3870, scsi0, channel 0, id 0, lun 0
         Read (10) 00 00 24 5c 90 00 00 08 00 

But the command that got the timeout is a command to the disk. Strange...
[Maybe nothing is wrong with the disk, there never is...]

        scsi0: Aborting CCB #3883 to Target 0
        SCSI host 0 abort (pid 3870) timed out - resetting

The SCSI controller did not react to the abort, so the error recovery
code decides to reset the disk drive.
[If the problem is reproducible and you are using BusLogic.c, you
might check whether the controller returns
BusLogic_AbortedCommandNotFound. At first sight the driver seems to
ignore that return status. I think it should at least print a message
when SCSI error logging is enabled.]

        SCSI bus is being reset for host 0 channel 0.
        scsi0: Sending Bus Device Reset CCB #3885 to Target 0

The disk reset times out as well, so a bus reset is attempted.

        SCSI host 0 channel 0 reset (pid 3870) timed out - trying harder
        SCSI bus is being reset for host 0 channel 0.
        scsi0: Resetting BusLogic BT-958 due to Target 0
        scsi0: Resetting BusLogic BT-958 Failed
        SCSI host 0 reset (pid 3870) timed out again -
        probably an unrecoverable SCSI bus or device hang.

Here is the next disk request - it fails in the same way.

        scsi : aborting command due to timeout : pid 3871, scsi0, channel 0, id 0, lun 0
         Write (6) 0f 90 14 02 00 
        scsi0: Unable to Abort Command to Target 0 - CCB Reset

        I've put some printk's to st.c and found that it does indeed get stuck 
        while trying to do a close on the device.

        After this the system goes through cycles of responsiveness and no 
        response.  The load skyrockets.

Yes.

This ought to be good information to work from.

[Information as I see it: nothing is wrong with the disk.
 Nothing is wrong with the SCSI controller.
 But some error occurs somewhere, and the SCSI subsystem gets
 terminally confused.

 Now scsi_old_times_out() starts with spin_lock_irqsave();
 maybe handling of IRQ or io_request_lock is flawed.]

If you can reproduce it, even better.
No doubt Leonard Zubkoff will correct all that is wrong in the above.

Andries

