What kernel version are you using?  Very important.

> My I/O exerciser will occasionally "hang" under certain subsystem error
> inject scenarios.  It can run OK for anywhere between 0 and 38 attempts
> before failing.
>
> The only thing different that I noticed on the failed attempts is that
> there is a  "scsidisk I/O error" entry in /var/log/messages that does not
> appear on the successful error inject attempts:
>
>    Apr 30 07:43:26 linux1 kernel: SCSI host 2 abort (pid 707672569) timed out - resetting
>    Apr 30 07:43:26 linux1 kernel: SCSI bus is being reset for host 2 channel 0.
>    Apr 30 07:44:28 linux1 kernel: SCSI disk error : host 2 channel 0 id 6 lun 6 return code = 26030000

    OK, this is:

    driver_byte = ((DRIVER_TIMEOUT | SUGGEST_ABORT))
    host_byte = (DID_TIME_OUT )

    This by itself doesn't indicate a problem - just a command that took too
long to complete.
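
    As a rough sketch of how that return code breaks down: the result word
is four bytes, with the driver byte on top, then the host byte, the message
byte and the status byte.  The constant values below are the ones I recall
from include/scsi/scsi.h, and decode_result() is just a made-up helper for
illustration, so check both against your own tree.

```python
# Constants as recalled from include/scsi/scsi.h (assumption: verify
# against the kernel tree you are actually running).
DID_TIME_OUT = 0x03     # host byte: command timed out
DRIVER_TIMEOUT = 0x06   # driver byte, low nibble
SUGGEST_ABORT = 0x20    # driver byte, high nibble
DRIVER_MASK = 0x0F
SUGGEST_MASK = 0xF0


def decode_result(result):
    """Split a 32-bit SCSI result word into its four bytes.

    Hypothetical helper, not a kernel function.
    """
    return {
        "status": result & 0xFF,          # raw low byte
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


# The value from the log is printed in hex: 26030000
fields = decode_result(0x26030000)

is_timeout = (
    fields["host"] == DID_TIME_OUT
    and (fields["driver"] & DRIVER_MASK) == DRIVER_TIMEOUT
    and (fields["driver"] & SUGGEST_MASK) == SUGGEST_ABORT
)
print(fields, is_timeout)
```

    The status and message bytes are both zero here, so the target never
reported anything; the command simply timed out on the host side.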

> -->Apr 30 07:44:28 linux1 kernel: scsidisk I/O error: dev 08:71, sector 23801000
>[...]
> If "!uptodate" is an error worthy of a printk, shouldn't some sort of
> error be returned back?

    There is an error being returned back, but the mechanism isn't obvious.
Essentially we are taking the blocks for the command and marking them to
indicate that there is no longer I/O pending.  The uptodate flag for the
buffer indicates whether the I/O was completed with success or not - this is
the flag that can be used by the process that initiated the I/O to make sure
that everything went OK.

    If you think about it, the call to end_scsi_request can take place from
the context of an interrupt handler (a bottom half handler in this case).
The return value from the function isn't going to be significant once you
return from SCSI into the general purpose kernel code.
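
    To make the mechanism a bit more concrete, here is a toy model of the
idea.  It is not kernel code, and Buffer, end_request() and read_ok() are
names invented for the illustration.  The point is only that the completion
path records the outcome in each buffer rather than in a return value, and
the initiator looks at the flag afterwards.

```python
class Buffer:
    """Toy stand-in for a block buffer (hypothetical, not the kernel struct)."""

    def __init__(self, sector):
        self.sector = sector
        self.locked = True      # I/O is still pending
        self.uptodate = False   # becomes True only if the I/O succeeded


def end_request(buffers, uptodate):
    """Completion path, e.g. run from interrupt context.

    Nobody is waiting on a return value here, so the outcome is stored
    in the buffers themselves.
    """
    for buf in buffers:
        buf.uptodate = uptodate
        buf.locked = False      # no I/O pending any more


def read_ok(buf):
    """Initiator side: once the buffer is unlocked, check the flag."""
    if buf.locked:
        raise RuntimeError("I/O still pending")
    return buf.uptodate


# A failed command: both buffers are released, neither is up to date.
bufs = [Buffer(23801000), Buffer(23801001)]
end_request(bufs, uptodate=False)
print([read_ok(b) for b in bufs])
```

    In this model a failed request shows up to the initiator as an unlocked
buffer whose uptodate flag is clear, which is the "error return".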

> One strange thing (to me) is that the "scsidisk I/O error" messages are
> only reported for minor numbers ending with "1" and ALL of those map to
> devices (/dev/sdXY) that I do not use:
[...]
> ...snip...
> This continues on (and on) for dev  08:11, 08:21, 08:31, 08:41, 08:51,
> 08:61, and 08:71
>
> I have one partition per LUN, so I use dev: 08:1, 08:17, 08:33,...,
> 08:n+16, 65:1, 65:17, ...

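    The minor numbers above are reported in hexadecimal :-).  So 08:11 is
minor 17, 08:21 is minor 33, and so on up to 08:71, which is minor 113.
Those are exactly the partitions you are using.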
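
    A quick way to do the conversion, assuming the usual sd layout of major 8
with 16 minors per disk (so this only covers sda through sdp).  decode_dev()
and sd_name() are made-up helper names.

```python
SD_MINORS_PER_DISK = 16  # assumption: standard sd layout on major 8


def decode_dev(dev):
    """Parse a 'major:minor' pair as printed by the kernel, both in hex."""
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    disk, part = divmod(minor, SD_MINORS_PER_DISK)
    return major, minor, disk, part


def sd_name(dev):
    """Map a hex 'major:minor' on major 8 to an sdXN style name."""
    major, minor, disk, part = decode_dev(dev)
    if major != 8:
        raise ValueError("only major 8 (sda..sdp) is handled here")
    name = "sd" + chr(ord("a") + disk)
    # Partition 0 is the whole disk, so it gets no numeric suffix.
    return name + (str(part) if part else "")


for dev in ["08:11", "08:21", "08:71"]:
    major, minor, disk, part = decode_dev(dev)
    print(dev, "-> decimal minor", minor, "->", sd_name(dev))
```

    With that layout, 08:71 comes out as decimal minor 113, i.e. the first
partition on the eighth disk.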

> Any guidance on how to pin this down would be greatly appreciated.
>
> The other question is what is going on w/ host2, channel0, id6... I'll put
> a SCSI bus analyzer on that and see if I can find the cause of the 2603's
> (I assume that is the ASC/ASCQ values, correct???) although they appear on
> the successfully recovered error injects as well as the bad ones.

    No, see above.  The 2603 is the driver byte and host byte of the return
code, not ASC/ASCQ.

-Eric


