On Mon, May 18, 2009 at 4:36 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>
> Erez Zilber wrote:
>>
>> I enabled open-iscsi logging + added some printk calls when the abort
>> handler returns.
>> Here's the log. I see that iscsi_eh_cmd_timed_out gets called, but
>> there's no abort.
>
>> May 17 11:00:06 kpc36 kernel: session1: iscsi_eh_cmd_timed_out scsi
>> cmd ffff8101e30efe40 timedout
>> May 17 11:00:06 kpc36 kernel: session1: iscsi_eh_cmd_timed_out return
>> timer reset
>
> As you can see in iscsi_eh_cmd_timed_out, if the session is down then
> there is no point in letting the scsi eh run since we have to relogin
> and restart commands so we would return reset timer which prevents the
> scsi eh from running.

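The rule Mike describes, together with the connection-check case he raises further down, can be sketched in user-space C. This is a minimal model under stated assumptions, not the libiscsi code: the enum, the struct, and the function name `cmd_timed_out` are all hypothetical, and the real handler looks at more state than two flags. It assumes the timeout hook is consulted when the command timer fires, before the scsi eh gets a chance to send an abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the kernel's own types. */
enum timeout_action {
    ACTION_RESET_TIMER,  /* give the command more time; the scsi eh does not run */
    ACTION_RUN_SCSI_EH   /* let the scsi eh run (abort and so on) */
};

struct session_model {
    bool logged_in;         /* false while the session is down or in recovery */
    bool ping_outstanding;  /* a nop/ping is in flight to check the connection */
};

/* Decide what to do when a command's timer fires. */
enum timeout_action cmd_timed_out(const struct session_model *s)
{
    assert(s != NULL);

    /* Session down: commands are restarted after relogin, so an abort is pointless. */
    if (!s->logged_in)
        return ACTION_RESET_TIMER;

    /* Connection check in progress: wait for its outcome before escalating. */
    if (s->ping_outstanding)
        return ACTION_RESET_TIMER;

    return ACTION_RUN_SCSI_EH;
}
```

In this model an abort is only reached when the session is logged in and no ping is pending, which is consistent with the log above showing "return timer reset" and no abort.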
Makes sense. There's only one thing that I don't understand - when does
scsi-ml call iscsi_eh_cmd_timed_out? I thought that if a cmd times out,
scsi-ml sends an abort.

>
> And then there is code in there to check if we are in the middle of
> checking the connection. If we are then we ask for some more time with
> the command, and that will prevent the scsi eh from running. This looks
> like it can be a problem because we would get a response to our nop which
> would update the last_recv field. If there was no progress being made
> for the scsi command we would still ask to reset the timer and we could
> end up in that loop forever since the scsi layer does not cap the number
> of times you can reset the timer. I will send a patch to fix that.
>
>
> However, that probably will not fix your problem.
>
>
> For your specific setup, it looks like we hit
> iscsi_eh_cmd_timed_out, reset the scsi command timer because we are in
> the middle of checking the connection with the nop/ping, but then
> the nop/ping does not return in time and so we drop the session:
>
> connection1:0: ping timeout of 5 secs
> expired, recv timeout 5, last rx 4526718494, last ping 4526723494, now
> 4526728494
>
> That is why on the target you see it cleaning up commands. On the
> initiator you can see us cleaning up:
>
> May 17 11:00:07 kpc36 kernel: session1: iscsi_start_session_recovery
> blocking session
> May 17 11:00:07 kpc36 kernel: session1: fail_scsi_tasks failing sc
> ffff8101e30efe40 itt 0x13 state 3
>
> And then later in the logs you will see us start the commands again when
> we are logged in again.
>
>
> So you probably need to continue replying to nops when the r2t is
> dropped. I will fix it on the initiator side to detect if we are not
> getting IO for a specific command and then let the scsi eh run.

The current behavior doesn't create a problem for me - instead of getting
an 'abort' for the cmd, the session gets dropped and the cmd is cleaned
up anyway.

I was only wondering why it happens.

Thanks for the detailed explanation. It was helpful.

Erez
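
The numbers in the quoted "ping timeout" line can be checked by hand. The sketch below is a user-space illustration, not the initiator's code: `ping_timed_out` and `TICKS_PER_SEC` are hypothetical names, and the rate of 1000 ticks per second is an assumption inferred from the log (last ping minus last rx is 5000, which matches a recv timeout of 5 secs only at that rate). It also ignores counter wraparound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1000 timer ticks per second, which matches the log's numbers. */
#define TICKS_PER_SEC 1000ULL

/*
 * Returns true once recv_timeout + ping_timeout seconds have passed since
 * the last received data while a ping is still unanswered.
 * Wraparound of the tick counter is not handled in this sketch.
 */
bool ping_timed_out(uint64_t last_rx, uint64_t now,
                    unsigned recv_timeout_secs, unsigned ping_timeout_secs,
                    bool ping_outstanding)
{
    assert(now >= last_rx);

    uint64_t deadline = last_rx
        + (uint64_t)recv_timeout_secs * TICKS_PER_SEC
        + (uint64_t)ping_timeout_secs * TICKS_PER_SEC;

    return ping_outstanding && deadline <= now;
}
```

With the logged values, last rx 4526718494 plus 5000 ticks gives the last ping at 4526723494, and another 5000 ticks gives 4526728494, which is exactly "now". So `ping_timed_out(4526718494ULL, 4526728494ULL, 5, 5, true)` is true at that tick and false one tick earlier.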