On Thu, 2011-12-01 at 20:00 +0100, Bart Van Assche wrote:
> If a user misconfigures the block layer timeout such that it is below
> the InfiniBand RC timeout it can happen that an SRP reply arrives after
> the SCSI error handler has already killed the associated SCSI command.
> Avoid that late replies cause a kernel crash.

I'm not sure that the case you describe can happen. Requests do not
get killed until srp_remove_req() is called on them. In the event of
timeouts, this will be through srp_abort() initially, and then
srp_reset_device() and srp_reset_host(). If srp_abort() is successful,
we will never see a reply for that request unless the target is broken.
If it fails, a successful srp_reset_device() gives us the same
assurance. And if that fails, we kill the connection in
srp_reset_host(), so there is no possibility of a late reply.
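
To make the escalation above concrete, here is a minimal sketch of the
order in which a timed-out request gets removed. It is only a model: the
names (eh_step, eh_outcome, eh_escalate) are hypothetical and are not
taken from ib_srp; the real steps are srp_abort(), srp_reset_device()
and srp_reset_host().

```c
#include <stdbool.h>

/* Hypothetical model of the error-handling steps; not the ib_srp types. */
enum eh_step {
	EH_ABORT,        /* stands in for srp_abort() */
	EH_RESET_DEVICE, /* stands in for srp_reset_device() */
	EH_RESET_HOST    /* stands in for srp_reset_host() */
};

/* Assumed outcome of the first two steps for a timed-out request. */
struct eh_outcome {
	bool abort_ok;
	bool reset_device_ok;
};

/*
 * Return the step at which the request is finally removed. The last
 * step drops the connection, so it needs no success flag.
 */
enum eh_step eh_escalate(const struct eh_outcome *o)
{
	if (o->abort_ok)
		return EH_ABORT;
	if (o->reset_device_ok)
		return EH_RESET_DEVICE;
	return EH_RESET_HOST;
}
```

In every branch the request is removed only after a step that rules out
a later reply from a well-behaved target.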

However, the code as it currently stands leaves us open to malformed
responses from the target. We cannot do anything about it if the target
is swapping tags around on the requests, but we should detect invalid
tags and take appropriate action. Checking for a null req->scmnd and
returning isn't the right action, as that leaks the request, and does
nothing to protect against bad tags.

I'm happy to code up a patch that sanity-checks that the tag in each
response is in range and refers to an outstanding request
(req->scmnd != NULL), and kills the connection if it does not, or you
are welcome to do so as well. I think it would be
better suited later in the patch series, building on top of your work to
disconnect a target without removing the module.
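
Roughly, the check I have in mind looks like the sketch below. This is
not a patch against ib_srp: the structures, the ring size and the
function name are made up for illustration, and "killing the connection"
is reduced to clearing a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_REQ_COUNT 4 /* hypothetical request ring size */

/* Hypothetical stand-ins for the driver's request and target state. */
struct sketch_req {
	void *scmnd; /* non-NULL while the request is outstanding */
};

struct sketch_target {
	struct sketch_req req_ring[SKETCH_REQ_COUNT];
	bool connected;
};

/*
 * Look up the request a response refers to. A tag that is out of range,
 * or that names a request that is not outstanding, is treated as a
 * protocol violation: the connection is marked dead and NULL is returned.
 */
struct sketch_req *sketch_lookup_rsp(struct sketch_target *t, uint64_t tag)
{
	struct sketch_req *req;

	if (tag >= SKETCH_REQ_COUNT) {
		t->connected = false; /* tag out of range */
		return NULL;
	}

	req = &t->req_ring[tag];
	if (req->scmnd == NULL) {
		t->connected = false; /* no outstanding command for this tag */
		return NULL;
	}

	return req;
}
```

The point is that a bad tag tears down the connection instead of being
silently ignored, so the request is not leaked.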

-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
