Hi all,

we do have a problem with NOP processing. Under heavy I/O, the connection starts spitting out 'ping timeout' error messages and resetting the connection. There has been _no_ error on either the target side or the connection, so it's us, sadly.
However, after some heavy instrumenting I found this:

[ 2664.456270] connection2:0: Sending nopout exp 1913193 max 1913177 queued 1913194
[ 2664.462592] connection2:0: mgmtpdu [itt xa05 p ffff8100795675c0] queued
[ 2584.459166] connection2:0: mgmtpdu [op 0x0 hdr->itt 0xa05 datalen 0]
[ 2584.466259] connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] done
[ 2585.232044] connection2:0: mgmtpdu [op 0x2 hdr->itt 0xa06 datalen 0]
[ 2585.235410] connection2:0: mgmtpdu [itt 0xa06 p ffff810079567540] done
[ 2666.376858] connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2667.169094] connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2667.179683] connection2:0: Sending nopout,cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2669.092002] connection2:0: mgmtpdu [itt 0xa05 p ffff8100795675c0] delayed, cmd 0 mgmt 0 req 0 exp 1913193 max 1913177 queued 1913194
[ 2669.093935] connection2:0: ping timeout of 5 secs expired, stat 7880612636/308802796/2 state 1/-150303960
[ 2669.097568] connection2:0: iscsi: detected conn error (1011)
[ 2669.201246] session2: blocking session
[ 2669.206148] connection2:0: mgmtpdu [itt xa05 p ffff8100795675c0] finished

(ignore the clock skew, it's a multicore TSC problem ...).

The numbers are ExpCmdSN, MaxCmdSN, and queued CmdSN. So, as you can see, we're for some reason sending a NOP-Out with CmdSN 1913194, ExpCmdSN 1913193, and MaxCmdSN 1913177. But reading through RFC 3720, I found this:

    The target MUST NOT transmit a MaxCmdSN that is less than ExpCmdSN-1.

    For non-immediate commands, the CmdSN field can take any value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range.

    The CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range.
    For example, if the initiator has previously sent a non-immediate command carrying the CmdSN equal to MaxCmdSN, the target window is closed. For group task management commands issued as immediate commands, CmdSN indicates the scope of the group action (e.g., on ABORT TASK SET indicates which commands are aborted).

So no wonder we're never seeing any replies to that one. Question remains, though, why we're starting to send PDUs with invalid MaxCmdSN numbers ...

Apologies as I'm running on an older codebase. But the same error / 'behaviour' is present in mainline as well.

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
h...@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
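
For reference, the RFC 3720 window rules quoted in the message can be sketched as a small check. This is only an illustration, not the open-iscsi or target implementation: the function names (sn_lte, max_cmdsn_is_legal, target_accepts) are hypothetical, and the sketch assumes 32-bit serial number arithmetic in the style of RFC 1982, which RFC 3720 specifies for CmdSN comparisons. Whether the NOP-Out in the log was marked immediate is not stated in the message, so both cases are covered.

```python
MOD = 1 << 32    # CmdSN, ExpCmdSN and MaxCmdSN are 32-bit counters
HALF = 1 << 31   # half the number space, used for wraparound-safe compares


def sn_lte(a, b):
    """Return True if a <= b in 32-bit serial number arithmetic."""
    return ((b - a) % MOD) < HALF


def max_cmdsn_is_legal(exp_cmdsn, max_cmdsn):
    """The target MUST NOT transmit a MaxCmdSN less than ExpCmdSN-1."""
    return sn_lte((exp_cmdsn - 1) % MOD, max_cmdsn)


def target_accepts(cmdsn, exp_cmdsn, max_cmdsn, immediate=False):
    """Return False if the target would silently ignore the command.

    Non-immediate commands must satisfy ExpCmdSN <= CmdSN <= MaxCmdSN.
    Immediate commands may carry a CmdSN outside that range.
    """
    if immediate:
        return True
    return sn_lte(exp_cmdsn, cmdsn) and sn_lte(cmdsn, max_cmdsn)
```

With the values from the log (ExpCmdSN 1913193, MaxCmdSN 1913177, queued CmdSN 1913194), max_cmdsn_is_legal(1913193, 1913177) and target_accepts(1913194, 1913193, 1913177) both return False, which matches the observation that a non-immediate NOP-Out with that CmdSN would never get a reply.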