Hello Bart, 

I am returning BLK_EH_HANDLED in iscsi_eh_cmd_timed_out(). Do you mean
something different?

That paragraph means that I first tried returning BLK_EH_NOT_HANDLED, since
that is the other option besides BLK_EH_RESET_TIMER (which is causing this
issue). But when I did, the EH logic would call scsi_abort_command() and,
successful or not, would then try to get sense data before completion, causing
more traffic on a transport that is already in a bad state.

The best way to allow faster completion was indeed to return BLK_EH_HANDLED
while changing the result to DID_NO_CONNECT. That tells the block layer not to
retry, so the completion happens in the SOFTIRQ handler and the result is
reported to the upper layer.
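
To make the decision above concrete, here is a minimal user-space sketch. It
is not the patch itself: struct mock_cmd and mock_cmd_timed_out() are made-up
names, the enums are simplified stand-ins for the kernel definitions, and the
logged-in branch is collapsed to a single return for brevity.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel definitions (illustration only). */
enum blk_eh_timer_return {
    BLK_EH_NOT_HANDLED,
    BLK_EH_HANDLED,
    BLK_EH_RESET_TIMER
};

enum { DID_OK = 0x00, DID_NO_CONNECT = 0x01 };

/* Hypothetical, stripped-down command: only the result word is modelled. */
struct mock_cmd {
    int result;
};

/*
 * Hypothetical model of the timeout decision discussed in this thread.
 * logged_in and system_running are assumptions standing in for the
 * session state and for system_state == SYSTEM_RUNNING.
 */
static enum blk_eh_timer_return
mock_cmd_timed_out(struct mock_cmd *cmd, bool logged_in, bool system_running)
{
    if (!logged_in) {
        if (!system_running) {
            /* Shutdown with a dead session: fail the command, no retry. */
            cmd->result = DID_NO_CONNECT << 16;
            return BLK_EH_HANDLED;
        }
        /* Session may still recover: keep re-arming the timer. */
        return BLK_EH_RESET_TIMER;
    }
    /* Logged in: leave it to the regular error handler in this sketch. */
    return BLK_EH_NOT_HANDLED;
}
```

The point of the sketch is only that the "not logged in" case stops being an
unconditional BLK_EH_RESET_TIMER once the system is shutting down.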

For the queue, simply not allowing queueing under that condition (shutdown +
state != logged in) seemed correct.
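
The queueing condition can be sketched in the same user-space style. Again
this is only an illustration under assumptions: mock_queue_decide() and the
MOCK_* names are invented here, and the real queuecommand path distinguishes
more session states than this.

```c
#include <stdbool.h>

/* Hypothetical outcomes for a command submitted to the transport. */
enum mock_queue_outcome {
    MOCK_QUEUE_ACCEPT, /* session logged in, send the command */
    MOCK_QUEUE_DEFER,  /* session down but may recover, try again later */
    MOCK_QUEUE_FAIL    /* shutdown on a dead session, complete with error */
};

/* Stand-in for the DID_NO_CONNECT host byte (illustration only). */
#define MOCK_DID_NO_CONNECT 0x01

/*
 * Hypothetical model of the queueing rule: refuse new commands only when
 * the system is shutting down AND the session is not logged in.
 * *result is written only on the failure path.
 */
static enum mock_queue_outcome
mock_queue_decide(bool logged_in, bool system_running, int *result)
{
    if (logged_in)
        return MOCK_QUEUE_ACCEPT;

    if (!system_running) {
        /* Tell the upper layers not to retry this command. */
        *result = MOCK_DID_NO_CONNECT << 16;
        return MOCK_QUEUE_FAIL;
    }

    return MOCK_QUEUE_DEFER;
}
```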

Let me know if you want me to try something else. I would be happy to.

Best,
-Rafael

> On 08/12/2017, at 09:12 PM, Bart Van Assche <bart.vanass...@wdc.com> wrote:
> 
> On Thu, 2017-12-07 at 19:59 -0200, Rafael David Tinoco wrote:
>> This happens because iscsi_eh_cmd_timed_out(), the transport layer
>> timeout helper, would tell the queue timeout function (scsi_times_out)
>> to reset the request timer over and over, until the session state is
>> back to logged in state. Unfortunately, during server shutdown, this
>> might never happen again.
> 
> Hello Rafael,
> 
> Have you considered to make iscsi_eh_cmd_timed_out() return BLK_EH_HANDLED
> if system_state != SYSTEM_RUNNING? That could result in slower shutdown than
> with your patch but such a change would probably be really easy to review.
> 
> Thanks,
> 
> Bart.
