On 3/18/2013 3:09 AM, Hannes Reinecke wrote:
On 03/15/2013 08:13 PM, Bart Van Assche wrote:
On 03/15/13 19:51, Mike Christie wrote:
On 03/15/2013 08:41 AM, Bart Van Assche wrote:
How about using the value of scsi_cmnd.jiffies_at_alloc to finish only
those SCSI commands in the host reset handler that exceeded a certain
processing time?

We basically do this now. When a scsi command times out, the scsi layer
blocks the host from processing new commands and waits for all
outstanding commands to either finish normally or time out. When all
commands have finished or timed out, we start the scsi eh code. So,
by the time we get to the scsi eh callbacks we are in a state where
all the commands being processed by the eh have exceeded a certain
processing time.

If you mean you want to drop the block and wait part, then I think it
could speed things up to do the abort callbacks while other IO is
running (as long as the driver can support it). However, if the abort
fails and you need to escalate to operations like resets, which
interfere with multiple commands, then the driver/scsi-ml does not have
much choice in what it does cleanup-wise. There would be no point in
checking the jiffies_at_alloc. The commands that are going to be
affected by the tmf or host reset operation must be returned to the
scsi-ml for retries or failure upwards.

Hello Mike,

It seems like there is a misunderstanding. With my comment I was not
referring to the SCSI ML but to the SCSI LLD. LLD drivers like
ib_srp keep track of outstanding SCSI requests. With the SRP
protocol it is possible to tell the InfiniBand HCA not to deliver
completions for outstanding requests by closing the connection used
for SRP communication. Hence my suggestion to finish SCSI commands
that were queued longer than a certain time ago from inside the LLD
host reset handler. I'm not sure though whether all types of FC
HBAs allow something equivalent.
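
As a rough illustration of the suggestion above, the age check could look
something like the following user-space sketch. It is not code from ib_srp
or any other LLD: struct sketch_req, SKETCH_RESET, sketch_time_after() and
sketch_finish_stale() are hypothetical names made up for this example, and
the tick counter only stands in for jiffies. It assumes completions have
already been stopped (for example by closing the connection), so nothing
else touches the requests while they are being walked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result code marking a request finished by the reset path. */
#define SKETCH_RESET 1

/* Hypothetical stand-in for a request tracked by an LLD; not a kernel struct. */
struct sketch_req {
	unsigned long queued_at; /* tick count at queue time, like jiffies_at_alloc */
	bool in_flight;          /* still owned by the (simulated) hardware */
	int result;              /* 0 = untouched, SKETCH_RESET = finished by reset */
};

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Finish every in-flight request that was queued more than max_age ticks
 * before now. Returns the number of requests finished. Younger requests
 * are left alone.
 */
size_t sketch_finish_stale(struct sketch_req *reqs, size_t n,
			   unsigned long now, unsigned long max_age)
{
	size_t finished = 0;

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		struct sketch_req *r = &reqs[i];

		if (!r->in_flight)
			continue;
		if (sketch_time_after(now, r->queued_at + max_age)) {
			r->in_flight = false;
			r->result = SKETCH_RESET;
			finished++;
		}
	}
	return finished;
}
```

In a real driver the "finish" step would hand the command back to the SCSI
mid-layer with an appropriate result rather than set a flag, and whether
younger commands can really be left alone depends on what the reset does
to them.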

Well, this is not quite identical to what I've been trying to
achieve with this patch.
This patch is for an individual rport which has gone out to lunch.
Sure we could down the link from the HBA, but that would terminate
I/O to _all_ connected rports, not just the malfunctioning one.
So that wouldn't help us here.

The closest equivalent to that would be a port logout; however, as
discussed in the I_T nexus reset thread, we would need another callout
to the LLDs here, as this definitely needs LLD support and none of the
current LLDs have it implemented.

Cheers,

Hannes

I think lpfc survives your rport state change because part of the LLD
behavior on the callback, to clean up reference counts, is to abort all
I/O that is outstanding to the rport. So the ref checking not only
protects lpfc from prematurely freeing a structure (my real concern),
but also just happens to abort all I/O. We got lucky.

I still believe the I_T nexus reset is the right way to solve this.

-- james s

