Hi all,

We have a problem with NOP processing. Under heavy I/O, the connection
starts spitting out 'ping timeout' error messages and gets reset.
There has been _no_ error from either the target side or the connection, so
it's us, sadly.

However, after some heavy instrumentation I found this:
[ 2664.456270]  connection2:0: Sending nopout exp 1913193 max 1913177 queued 1913194
[ 2664.462592]  connection2:0: mgmtpdu [itt xa05 p ffff8100795675c0] queued
[ 2584.459166]  connection2:0: mgmtpdu [op 0x0 hdr->itt 0xa05 datalen 0]
[ 2584.466259]  connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] done
[ 2585.232044]  connection2:0: mgmtpdu [op 0x2 hdr->itt 0xa06 datalen 0]
[ 2585.235410]  connection2:0: mgmtpdu [itt 0xa06 p ffff810079567540] done
[ 2666.376858]  connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2667.169094]  connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2667.179683]  connection2:0: Sending nopout,cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2669.092002]  connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2669.093935]  connection2:0: ping timeout of 5 secs expired, stat 7880612636/308802796/2 state 1/-150303960
[ 2669.097568]  connection2:0: iscsi: detected conn error (1011)
[ 2669.201246]  session2: blocking session
[ 2669.206148]  connection2:0: mgmtpdu [itt xa05 p ffff8100795675c0] finished


(ignore the clock skew, it's a multicore TSC problem ...).

The numbers are ExpCmdSN, MaxCmdSN, and the queued CmdSN.
So, as you can see, we're for some reason sending a NOP-Out with CmdSN 1913194,
ExpCmdSN 1913193, and MaxCmdSN 1913177.
But reading through RFC 3720, I found this:

   The target MUST NOT transmit a MaxCmdSN that is less than
   ExpCmdSN-1.  For non-immediate commands, the CmdSN field can take any
   value from ExpCmdSN to MaxCmdSN inclusive.  The target MUST silently
   ignore any non-immediate command outside of this range or non-
   immediate duplicates within the range.  The CmdSN carried by
   immediate commands may lie outside the ExpCmdSN to MaxCmdSN range.
   For example, if the initiator has previously sent a non-immediate
   command carrying the CmdSN equal to MaxCmdSN, the target window is
   closed.  For group task management commands issued as immediate
   commands, CmdSN indicates the scope of the group action (e.g., on
   ABORT TASK SET indicates which commands are aborted).

So no wonder we're never seeing any replies to that one.
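
To make the window check from the RFC text concrete, here is a rough Python
sketch (not the kernel code; all function names are made up for illustration).
It assumes CmdSN values are 32-bit and compared with serial number arithmetic,
and that the NOP-Out in question is sent as a non-immediate command:

```python
MASK = 0xFFFFFFFF      # CmdSN, ExpCmdSN and MaxCmdSN are 32-bit counters
HALF = 0x80000000      # half of the 32-bit number space


def sna_lt(a, b):
    """True if a < b under 32-bit serial number arithmetic.

    A distance of exactly 2**31 is treated as "not less than" here.
    """
    return a != b and ((b - a) & MASK) < HALF


def sna_lte(a, b):
    """True if a <= b under 32-bit serial number arithmetic."""
    return a == b or sna_lt(a, b)


def cmdsn_in_window(cmdsn, exp_cmdsn, max_cmdsn):
    """A non-immediate command is only accepted if its CmdSN lies in
    the range ExpCmdSN..MaxCmdSN inclusive."""
    return sna_lte(exp_cmdsn, cmdsn) and sna_lte(cmdsn, max_cmdsn)


def maxcmdsn_is_legal(exp_cmdsn, max_cmdsn):
    """MaxCmdSN must not be less than ExpCmdSN - 1."""
    return not sna_lt(max_cmdsn, (exp_cmdsn - 1) & MASK)


# Values taken from the log above.
exp, mx, queued = 1913193, 1913177, 1913194
print(cmdsn_in_window(queued, exp, mx))   # False
print(maxcmdsn_is_legal(exp, mx))         # False
```

With the numbers from the log, the queued CmdSN is outside the window, and the
MaxCmdSN we have recorded is also lower than ExpCmdSN-1, which the RFC does not
allow in the first place.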

The question remains, though, why we're starting to send PDUs with invalid
MaxCmdSN numbers ...

Apologies, I'm running an older codebase, but the same error / 'behaviour'
is present in mainline as well.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
