Re: No abort is sent for a WRITE command that takes too long

Erez Zilber Mon, 18 May 2009 07:55:33 -0700

On Mon, May 18, 2009 at 4:36 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>
> Erez Zilber wrote:
>>
>> I enabled open-iscsi logging + added some printk calls when the abort
>> handler returns.
>> Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
>> there's no abort.
>
>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out scsi
>> cmd ffff8101e30efe40 timedout
>> May 17 11:00:06 kpc36 kernel:  session1: iscsi_eh_cmd_timed_out return
>> timer reset
>
> As you can see in iscsi_eh_cmd_timed_out, if the session is down then
> there is no point in letting the scsi eh run, since we have to relogin
> and restart commands anyway. So we return reset timer, which prevents
> the scsi eh from running.


Makes sense. There's only one thing that I don't understand - when
does scsi-ml call iscsi_eh_cmd_timed_out? I thought that if a cmd
times out, scsi-ml sends an abort.

>
> And then there is code in there to check if we are in the middle of
> checking the connection. If we are then we ask for some more time with
> the command, and that will prevent the scsi eh from running. This looks
> like it can be a problem, because we would get a response to our nop,
> which would update the last_recv field. If there was no progress being
> made for the scsi command, we would still ask to reset the timer, and we
> could end up in that loop forever, since the scsi layer does not cap the
> number of times you can reset the timer. I will send a patch to fix that.
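
The two checks described so far can be sketched as a small decision function. This is only a simplified model, not the libiscsi code: the enum, the struct and its two flags are hypothetical names made up for illustration, and the real handler looks at timestamps such as last_recv rather than a single boolean.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the timer handler's return values. */
enum model_eh_ret { MODEL_RUN_EH, MODEL_RESET_TIMER };

/* Hypothetical, simplified view of the connection state. */
struct model_conn {
    bool session_up;        /* false while the session is down or relogging in */
    bool ping_outstanding;  /* true while a nop/ping is checking the connection */
};

/* Decide what to tell the scsi layer when a command's timer fires. */
enum model_eh_ret model_cmd_timed_out(const struct model_conn *c)
{
    /* Session down: commands get restarted after relogin, so skip the eh. */
    if (!c->session_up)
        return MODEL_RESET_TIMER;

    /* Connection check in progress: ask for more time. */
    if (c->ping_outstanding)
        return MODEL_RESET_TIMER;

    /* Otherwise let the scsi eh run (which is where an abort would be sent). */
    return MODEL_RUN_EH;
}
```

Nothing in this model counts how often MODEL_RESET_TIMER has already been returned for a command, which is the unbounded loop described above.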
>
>
> However, that probably will not fix your problem.
>
>
> For your specific setup, it looks like we hit
> iscsi_eh_cmd_timed_out and reset the scsi command timer because we are in
> the middle of checking the connection with the nop/ping, but then
> the nop/ping does not return in time and so we drop the session:
>
>   connection1:0: ping timeout of 5 secs
> expired, recv timeout 5, last rx 4526718494, last ping 4526723494, now
> 4526728494
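
The timestamps in that message can be checked by hand. The sketch below assumes they are jiffies and that HZ is 1000; neither is stated in the thread, and the helper names are hypothetical. Under that assumption, both gaps come out to exactly 5 seconds, matching the recv timeout and the ping timeout in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1000 ticks per second. The thread does not state HZ. */
#define ASSUMED_HZ 1000ULL

/* Convert a tick delta to whole seconds under the assumed HZ. */
uint64_t ticks_to_secs(uint64_t ticks)
{
    return ticks / ASSUMED_HZ;
}

/* True once at least ping_timeout_secs have passed since the ping was sent. */
bool ping_has_expired(uint64_t last_ping, uint64_t now, uint64_t ping_timeout_secs)
{
    return now - last_ping >= ping_timeout_secs * ASSUMED_HZ;
}
```

With the logged values, last ping minus last rx is 5000 ticks (the recv timeout passed with nothing received, so a ping went out), and now minus last ping is another 5000 ticks (the ping got no reply in time).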
>
> That is why on the target you see it cleaning up commands. On the
> initiator you can see us cleaning up:
>
> May 17 11:00:07 kpc36 kernel:  session1: iscsi_start_session_recovery
> blocking session
> May 17 11:00:07 kpc36 kernel:  session1: fail_scsi_tasks failing sc
> ffff8101e30efe40 itt 0x13 state 3
>
> And then later in the logs you will see us start the commands again when
> we are logged in again.
>
>
> So you probably need to continue replying to nops when the r2t is
> dropped. I will fix it on the initiator side to detect if we are not
> getting IO for a specific command and then let the scsi eh run.
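
One way such a per-command check could look is sketched below. This is a guess at the shape of the idea, not the patch itself: the struct, its fields and the function are hypothetical, and a real version would need wraparound-safe time comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return values for the sketch. */
enum sketch_eh_ret { SKETCH_RESET_TIMER, SKETCH_RUN_EH };

/* Hypothetical per-command bookkeeping. */
struct sketch_task {
    uint64_t last_xfer;     /* tick when data last moved for this command */
    uint64_t last_timeout;  /* tick when the timeout handler last ran for it */
};

/*
 * Ask for more time only if this command moved data since the previous
 * timeout. A nop reply on the connection does not touch last_xfer, so it
 * can no longer keep a stuck command alive indefinitely.
 */
enum sketch_eh_ret sketch_on_timeout(struct sketch_task *t, uint64_t now)
{
    bool progressed = t->last_xfer > t->last_timeout;

    t->last_timeout = now;
    return progressed ? SKETCH_RESET_TIMER : SKETCH_RUN_EH;
}
```

A command that transferred data at tick 100 and times out at tick 200 gets its timer reset. If it times out again at tick 300 with no new transfer, the scsi eh is allowed to run.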

The current behavior doesn't create a problem for me - instead of
getting an 'abort' for the cmd, the session gets dropped and the cmd
is cleaned up anyway. I was only wondering why it happens.

Thanks for the detailed explanation. It was helpful.

Erez

