Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Bart Van Assche
On 05/19/14 16:08, Paolo Bonzini wrote:
> 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
> concurrently, and call scsi_finish_command without any lock protecting
> the calls.  You can then get memory corruption.

I'm not sure what the recommended approach is to address this race. But
it is possible to address this in the LLD. See e.g. the srp_claim_req()
function in the SRP LLD and how it is invoked from the reply handler,
the abort handler and the reset handlers in that LLD.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Christoph Hellwig
On Mon, May 19, 2014 at 05:08:56PM +0200, Bart Van Assche wrote:
> On 05/19/14 16:08, Paolo Bonzini wrote:
> > 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
> > concurrently, and call scsi_finish_command without any lock protecting
> > the calls.  You can then get memory corruption.
> 
> I'm not sure what the recommended approach is to address this race. But
> it is possible to address this in the LLD. See e.g. the srp_claim_req()
> function in the SRP LLD and how it is invoked from the reply handler,
> the abort handler and the reset handlers in that LLD.

blk-mq triest to solve this a test_and_set_bit for a completion flag
at the block layer for completions vs timeouts.  I think doing this in
the SCSI layer as well would be very useful as we can't expect Joe
Random Driver Developer to get it right in every driver.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Paolo Bonzini

Il 19/05/2014 17:08, Bart Van Assche ha scritto:

On 05/19/14 16:08, Paolo Bonzini wrote:

2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
concurrently, and call scsi_finish_command without any lock protecting
the calls.  You can then get memory corruption.


I'm not sure what the recommended approach is to address this race. But
it is possible to address this in the LLD. See e.g. the srp_claim_req()
function in the SRP LLD and how it is invoked from the reply handler,
the abort handler and the reset handlers in that LLD.


That's not enough, unless I'm missing something.  Say the request 
handler claims the request and the abort handler doesn't:


- the request handler calls scsi_done and ends up in scsi_finish_command.

- the abort handler will return SUCCESS, and scmd_eh_abort_handler then 
calls scsi_finish_command.


Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-19 Thread Bart Van Assche
On 05/19/14 18:09, Paolo Bonzini wrote:
> Il 19/05/2014 17:08, Bart Van Assche ha scritto:
>> On 05/19/14 16:08, Paolo Bonzini wrote:
>>> 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
>>> concurrently, and call scsi_finish_command without any lock protecting
>>> the calls.  You can then get memory corruption.
>>
>> I'm not sure what the recommended approach is to address this race. But
>> it is possible to address this in the LLD. See e.g. the srp_claim_req()
>> function in the SRP LLD and how it is invoked from the reply handler,
>> the abort handler and the reset handlers in that LLD.
> 
> That's not enough, unless I'm missing something.  Say the request
> handler claims the request and the abort handler doesn't:
> 
> - the request handler calls scsi_done and ends up in scsi_finish_command.
> 
> - the abort handler will return SUCCESS, and scmd_eh_abort_handler then
> calls scsi_finish_command.

It depends on how the SCSI abort handler gets invoked. If the SCSI abort
handler gets invoked because a SCSI command timed out that means that
the block layer has already detected a timeout and also that the
REQ_ATOM_COMPLETE bit has already been set. In this scenario if a SCSI
LLD invokes scsi_done() that causes blk_complete_request() to return
without invoking __blk_complete_request() and hence without invoking
scsi_softirq_done().

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-20 Thread Bart Van Assche
On 05/19/14 18:43, Bart Van Assche wrote:
> On 05/19/14 18:09, Paolo Bonzini wrote:
>> Il 19/05/2014 17:08, Bart Van Assche ha scritto:
>>> On 05/19/14 16:08, Paolo Bonzini wrote:
 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
 concurrently, and call scsi_finish_command without any lock protecting
 the calls.  You can then get memory corruption.
>>>
>>> I'm not sure what the recommended approach is to address this race. But
>>> it is possible to address this in the LLD. See e.g. the srp_claim_req()
>>> function in the SRP LLD and how it is invoked from the reply handler,
>>> the abort handler and the reset handlers in that LLD.
>>
>> That's not enough, unless I'm missing something.  Say the request
>> handler claims the request and the abort handler doesn't:
>>
>> - the request handler calls scsi_done and ends up in scsi_finish_command.
>>
>> - the abort handler will return SUCCESS, and scmd_eh_abort_handler then
>> calls scsi_finish_command.
> 
> It depends on how the SCSI abort handler gets invoked. If the SCSI abort
> handler gets invoked because a SCSI command timed out that means that
> the block layer has already detected a timeout and also that the
> REQ_ATOM_COMPLETE bit has already been set. In this scenario if a SCSI
> LLD invokes scsi_done() that causes blk_complete_request() to return
> without invoking __blk_complete_request() and hence without invoking
> scsi_softirq_done().

(replying to my own e-mail)

Please note that scsi_eh_abort_cmds() neither checks nor sets the
REQ_ATOM_COMPLETE bit before it invokes hostt->eh_abort_handler(). Would
it make sense to modify that function such that it invokes
blk_abort_request() instead ? That last function atomically
test-and-sets the REQ_ATOM_COMPLETE bit before invoking the timeout handler.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-20 Thread Bart Van Assche
On 05/20/14 09:32, Bart Van Assche wrote:
> On 05/19/14 18:43, Bart Van Assche wrote:
>> On 05/19/14 18:09, Paolo Bonzini wrote:
>>> Il 19/05/2014 17:08, Bart Van Assche ha scritto:
 On 05/19/14 16:08, Paolo Bonzini wrote:
> 2) reentrancy: the softirq handler and scmd_eh_abort_handler can run
> concurrently, and call scsi_finish_command without any lock protecting
> the calls.  You can then get memory corruption.

 I'm not sure what the recommended approach is to address this race. But
 it is possible to address this in the LLD. See e.g. the srp_claim_req()
 function in the SRP LLD and how it is invoked from the reply handler,
 the abort handler and the reset handlers in that LLD.
>>>
>>> That's not enough, unless I'm missing something.  Say the request
>>> handler claims the request and the abort handler doesn't:
>>>
>>> - the request handler calls scsi_done and ends up in scsi_finish_command.
>>>
>>> - the abort handler will return SUCCESS, and scmd_eh_abort_handler then
>>> calls scsi_finish_command.
>>
>> It depends on how the SCSI abort handler gets invoked. If the SCSI abort
>> handler gets invoked because a SCSI command timed out that means that
>> the block layer has already detected a timeout and also that the
>> REQ_ATOM_COMPLETE bit has already been set. In this scenario if a SCSI
>> LLD invokes scsi_done() that causes blk_complete_request() to return
>> without invoking __blk_complete_request() and hence without invoking
>> scsi_softirq_done().
> 
> (replying to my own e-mail)
> 
> Please note that scsi_eh_abort_cmds() neither checks nor sets the
> REQ_ATOM_COMPLETE bit before it invokes hostt->eh_abort_handler(). Would
> it make sense to modify that function such that it invokes
> blk_abort_request() instead ? That last function atomically
> test-and-sets the REQ_ATOM_COMPLETE bit before invoking the timeout handler.

(answering my own question)

REQ_ATOM_COMPLETE is already set before scsi_eh_scmd_add() is called
since that function is only invoked after the block layer has marked a
request as "complete". The only callers of scsi_eh_scmd_add() are
scsi_softirq_done(), scsi_times_out() and scmd_eh_abort_handler(). That
last function is invoked (indirectly) by scsi_times_out().

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-20 Thread Bart Van Assche
On 05/19/14 16:08, Paolo Bonzini wrote:
> 1) dangling pointers: scsi_put_command calls cancel_delayed_work(), but
> that doesn't mean that the scmd_eh_abort_handler couldn't be already
> running.  If the scmd_eh_abort_handler starts while the softirq handler
> is calling scsi_put_command (e.g. scsi_finish_command ->
> scsi_io_completion -> scsi_end_request -> scsi_next_command), the
> pointer to the Scsi_Cmnd* becomes invalid in the middle of the abort
> handler.

Hannes, can you clarify why a cancel_delayed_work() statement was added
in scsi_put_command() ? How can scsi_put_command() get invoked after the
SCSI timeout handler queued &scmd->abort_work and before the function
associated with that work struct (scmd_eh_abort_handler()) is called ?

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-20 Thread Paolo Bonzini

Il 20/05/2014 10:10, Bart Van Assche ha scritto:

REQ_ATOM_COMPLETE is already set before scsi_eh_scmd_add() is called
since that function is only invoked after the block layer has marked a
request as "complete". The only callers of scsi_eh_scmd_add() are
scsi_softirq_done(), scsi_times_out() and scmd_eh_abort_handler(). That
last function is invoked (indirectly) by scsi_times_out().


Yes, and answering my own question, you cannot have a dangling pointer 
in eh_abort_handler (unless you have a driver bug).  This is because 
once eh_abort_handler is called, you know the interrupt handler will not 
trigger the softirq and scsi_finish_command won't be called.


Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-21 Thread Mark Wu
Is it possible that when scsi_done is invoked, the scsi command or the 
request has been freed and then reallocated for a new I/O request?  Because 
of this the bit flag REQ_ATOM_COMPLETE becomes unreliable.  Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-21 Thread Paolo Bonzini

Il 21/05/2014 16:16, Mark Wu ha scritto:

Is it possible that when scsi_done is invoked, the scsi command or the
request has been freed and then reallocated for a new I/O request?  Because
of this the bit flag REQ_ATOM_COMPLETE becomes unreliable.  Thanks!


It is up to the driver to ensure that the interrupt handler will not 
process the Scsi_Cmnd* after returning from the eh_abort_handler.  If 
you do this, what you say cannot happen.  Otherwise you'll get a variety 
of failures, the most common of which for me are OOPSes and a BUG in 
blk_finish_request.


Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-22 Thread Elliott, Robert (Server Storage)


> -Original Message-
> From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
> ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Wednesday, 21 May, 2014 3:34 PM
> To: Mark Wu; linux-scsi@vger.kernel.org
> Subject: Re: dangling pointers and/or reentrancy in
> scmd_eh_abort_handler?
> 
> Il 21/05/2014 16:16, Mark Wu ha scritto:
> > Is it possible that when scsi_done is invoked, the scsi command or the
> > request has been freed and then reallocated for a new I/O request?
> Because
> > of this the bit flag REQ_ATOM_COMPLETE becomes unreliable.  Thanks!
> 
> It is up to the driver to ensure that the interrupt handler will not
> process the Scsi_Cmnd* after returning from the eh_abort_handler.  If
> you do this, what you say cannot happen.  Otherwise you'll get a variety
> of failures, the most common of which for me are OOPSes and a BUG in
> blk_finish_request.
> 
> Paolo

Aborts don't always work; scsi_eh_abort_handler cannot promise that 
it aborted the command (unless it escalated to sending resets.)
When the abort fails, scmd_eh_abort_handler just leaves it
outstanding, and the error handler thread (scsi_error_handler
function) must deal with it.

If the abort succeeds, then scmd_eh_abort_handler calls
scsi_finish_command.

In SCSI terms, FUNCTION COMPLETE for ABORT TASK doesn't mean the 
command was present and aborted and won't be sending back status; 
if the device server already sent status for the command, the 
task manager also sends FUNCTION COMPLETE.

Depending on the transport, there may be a race condition between 
the command status and the ABORT TASK response; the ABORT TASK 
response might get back first.  I think that means 
scsi_eh_abort_handler's call to scsi_finish_command could
happen during or after scsi_softirq_done has called
scsi_finish_command.

---
Rob ElliottHP Server Storage






--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dangling pointers and/or reentrancy in scmd_eh_abort_handler?

2014-05-23 Thread Paolo Bonzini

Il 23/05/2014 03:28, Elliott, Robert (Server Storage) ha scritto:

Depending on the transport, there may be a race condition between
the command status and the ABORT TASK response; the ABORT TASK
response might get back first.  I think that means
scsi_eh_abort_handler's call to scsi_finish_command could
happen during or after scsi_softirq_done has called
scsi_finish_command.


That was my doubt, in fact.  But actually Bart explained that this is 
not the case.  Either scsi_eh_abort_handler will be called via the work 
item, or the command will never be sent to the softirq.


Paolo

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html