I have a set of patches. I have verified that everything compiles, but at
the moment I have no way of testing to make sure I haven't broken anything.
With that caveat, I believe this should correct the following problems:
1) Incorrect sg segment count. The problem was that I botched something in
the code for segment counting when I did a merge, and I assumed that I
could use the segment counting in ll_rw_blk to save some work. It turns out
we cannot - we should just use our own functions, and then everything should
be OK.
2) I fixed the most obvious part of the > 16 disks problem in a cleaner
way.
3) The problem of being unable to write a cdrom was kind of interesting.
The whole story was evident from the logs that were sent. Essentially what
is happening is that a command fails with DID_ERROR, but the HA doesn't
automatically request sense for it (for one reason or another), so the
error handler thread gets invoked. The error handler gets the sense data,
decides that everything is OK, and then drops down and tries calling
scsi_done() to report the completion of the command. Right at the start of
scsi_done(), we attempt to remove the timer from the command - if this
fails, it is assumed that the command timed out and that the error handler
is running, so scsi_done() does nothing and leaves the command to the error
handler thread. Which it was, kind of. The "fix"
isn't the cleanest thing, but it was the most obvious - just add a timer
back to the command before we call scsi_done(). Maybe I will think of a
more elegant fix in the next day or so, so this one is more of a workaround.
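
To make the failure mode in (3) a bit more concrete, here is a toy model of
it in C. This is only a sketch and not the actual patch: the toy_* names
and the struct are made up for illustration, and the real timer and
completion code does a good deal more. It only shows why the completion
gets dropped when the timer is already gone, and why putting a timer back
first lets it go through.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not the kernel's. */
struct toy_cmd {
    bool timer_pending;  /* true while a timeout timer is armed */
    bool completed;      /* set once completion has been reported */
};

/* Arm the timeout timer for a command. */
static void toy_add_timer(struct toy_cmd *cmd)
{
    cmd->timer_pending = true;
}

/* Disarm the timer; returns true only if it was still armed. */
static bool toy_del_timer(struct toy_cmd *cmd)
{
    bool was_pending = cmd->timer_pending;

    cmd->timer_pending = false;
    return was_pending;
}

/*
 * Completion path. If the timer could not be removed, assume the
 * command timed out and the error handler owns it, so do nothing.
 */
static void toy_done(struct toy_cmd *cmd)
{
    if (!toy_del_timer(cmd))
        return;
    cmd->completed = true;
}

/*
 * Error handler finishing a command it has decided is OK. With
 * readd_timer set, a timer is put back first (the workaround), so
 * toy_done() no longer bails out early.
 */
static void toy_eh_finish(struct toy_cmd *cmd, bool readd_timer)
{
    if (readd_timer)
        toy_add_timer(cmd);
    toy_done(cmd);
}
```

In this model, a command whose timer has already gone off is never marked
completed when toy_eh_finish() is called with readd_timer false, and is
completed when it is called with readd_timer true.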
-Eric
Fixes.dat