Hi! My guess is that your network or storage system is occasionally overloaded, so that network packets are delayed significantly (by more than 5 s). That sounds unlikely, but IMHO it would explain the duplicate ACKs.
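To check the ">5s" guess against the kernel messages quoted further down, one could convert the jiffies values in the "ping timeout" lines into seconds. This is only a rough sketch: the helper name `ping_timeout_deltas` is made up for illustration, and it assumes a kernel tick rate of 1000 Hz (one jiffy = 1 ms), which may not hold for every kernel build.

```python
import re

# Assumption: the kernel tick rate is 1000 Hz, so one jiffy equals 1 ms.
HZ = 1000

# Matches the kernel "ping timeout" message quoted later in this thread.
PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired,\s+last rx\s+(\d+),"
    r"\s+last ping\s+(\d+),\s+now\s+(\d+)"
)


def ping_timeout_deltas(line, hz=HZ):
    """Return elapsed seconds since the last ping and the last receive,
    or None if the line is not a ping-timeout message."""
    match = PING_RE.search(line)
    if match is None:
        return None
    timeout, last_rx, last_ping, now = (int(g) for g in match.groups())
    return {
        "timeout_s": timeout,
        "since_last_ping_s": (now - last_ping) / hz,
        "since_last_rx_s": (now - last_rx) / hz,
    }


if __name__ == "__main__":
    lines = [
        "kernel: ping timeout of 5 secs expired, last rx 408250499, "
        "last ping 408249467, now 408254467",
        "kernel: ping timeout of 5 secs expired, last rx 408294833, "
        "last ping 408299833, now 408304833",
    ]
    for line in lines:
        print(ping_timeout_deltas(line))
```

Under that assumption, both quoted events fire exactly 5 s after the last ping, and in the second one nothing had been received for 10 s.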
Here with SLES10 SP2 on x86_64 with a HP EVA 6000 + iSCSI connectivity option (MPX100) there is no such problem, even though there are about 20 hosts connected to the storage system, most of them directly through a fabric switch. However the iSCSI load is light. Here is what one of the two MPX boxes reports (the load will rise; it's still early morning):

#> show perf byte

WARNING: Valid data is only displayed for port(s) that are not
associated with any configured FCIP routes.

Displaying bytes/sec (total)... (Press any key to stop display)
GE1     GE2     FC1     FC2
--------------------------------
53K     606K    43K     620K
774     774     1K      1K
651K    868     17K     636K
2K      26K     18K     10K
49K     0       22K     26K
55K     805K    799K    66K
29K     17K     40K     7K
597K    7K      534K    72K
172     196K    150K    47K
49K     516     25K     28K
294K    43K     29K     313K
774     774     1K      1K
228K    868     65K     171K
18K     20K     528     43K
6K      6K      0       12K
145K    516     101K    45K
15K     29K     15K     30K

(GEx are the gigabit Ethernet ports, FCx are the FibreChannel ports that are connected with the storage system via a fabric switch)

Some more statistics (for those who care or like inspiration):

#> show stats

FC Port Statistics
--------------------
FC Port 1
Interrupt Count                   82812963
Target Command Count              0
Initiator Command Count           83526964
Link Failure Count                0
Loss of Sync Count                0
Loss of Signal Count              0
Primitive Sequence Error Count    0
Invalid Transmission Word Count   0
Invalid CRC Error Count           0

FC Port 2
Interrupt Count                   81551444
Target Command Count              0
Initiator Command Count           82221649
Link Failure Count                0
Loss of Sync Count                0
Loss of Signal Count              0
Primitive Sequence Error Count    0
Invalid Transmission Word Count   0
Invalid CRC Error Count           0

iSCSI Port Statistics
-----------------------
iSCSI Port 1
Interrupt Count                   190227478
Target Command Count              100974368
Initiator Command Count           0
MAC Xmit Frames                   122882662
MAC Xmit Byte Count               28445475500
MAC Xmit Multicast Frames         0
MAC Xmit Broadcast Frames         0
MAC Xmit Pause Frames             0
MAC Xmit Control Frames           0
MAC Xmit Deferrals                0
MAC Xmit Late Collisions          0
MAC Xmit Aborted                  0
MAC Xmit Single Collisions        0
MAC Xmit Multiple Collisions      0
MAC Xmit Collisions               0
MAC Xmit Dropped Frames           0
MAC Xmit Jumbo Frames             1702770
MAC Rcvd Frames                   156585933
MAC Rcvd Byte Count               187107689124
MAC Rcvd Unknown Control Frames   0
MAC Rcvd Pause Frames             0
MAC Rcvd Control Frames           0
MAC Rcvd Dribbles                 0
MAC Rcvd Frame Length Errors      0
MAC Rcvd Jabbers                  0
MAC Rcvd Carrier Sense Errors     0
MAC Rcvd Dropped Frames           0
MAC Rcvd CRC Errors               0
MAC Rcvd Encoding Errors          0
MAC Rcvd Length Errors Large      0
MAC Rcvd Length Errors Small      0
MAC Rcvd Multicast Frames         178239
MAC Rcvd Broadcast Frames         48

iSCSI Port 2
Interrupt Count                   182729151
Target Command Count              97555857
Initiator Command Count           0
MAC Xmit Frames                   120067355
MAC Xmit Byte Count               27948414986
MAC Xmit Multicast Frames         0
MAC Xmit Broadcast Frames         0
MAC Xmit Pause Frames             0
MAC Xmit Control Frames           0
MAC Xmit Deferrals                0
MAC Xmit Late Collisions          0
MAC Xmit Aborted                  0
MAC Xmit Single Collisions        0
MAC Xmit Multiple Collisions      0
MAC Xmit Collisions               0
MAC Xmit Dropped Frames           0
MAC Xmit Jumbo Frames             1718670
MAC Rcvd Frames                   156126093
MAC Rcvd Byte Count               196255090216
MAC Rcvd Unknown Control Frames   0
MAC Rcvd Pause Frames             0
MAC Rcvd Control Frames           0
MAC Rcvd Dribbles                 0
MAC Rcvd Frame Length Errors      0
MAC Rcvd Jabbers                  0
MAC Rcvd Carrier Sense Errors     0
MAC Rcvd Dropped Frames           0
MAC Rcvd CRC Errors               0
MAC Rcvd Encoding Errors          0
MAC Rcvd Length Errors Large      0
MAC Rcvd Length Errors Small      0
MAC Rcvd Multicast Frames         178236
MAC Rcvd Broadcast Frames         34

iSCSI Shared Statistics
-----------------------
PDUs Xmited                       324033312
Data Bytes Xmited                 29836097572
PDUs Rcvd                         198783508
Data Bytes Rcvd                   35739418624
I/O Completed                     165710975
Unexpected I/O Rcvd               0
iSCSI Format Errors               0
Header Digest Errors              0
Data Digest Errors                0
Sequence Errors                   0
IP Xmit Packets                   242949995
IP Xmit Byte Count                47161789220
IP Xmit Fragments                 0
IP Rcvd Packets                   312354406
IP Rcvd Byte Count                371426357904
IP Rcvd Fragments                 0
IP Datagram Reassembly Count      0
IP Error Packets                  0
IP Fragment Rcvd Overlap          0
IP Fragment Rcvd Out of Order     0
IP Datagram Reassembly Timeouts   0
TCP Xmit Segment Count            242949995
TCP Xmit Byte Count               38654705673
TCP Rcvd Segment Count            312354406
TCP Rcvd Byte Count               361430272728
TCP Persist Timer Expirations     0
TCP Rxmit Timer Expired           0
TCP Rcvd Duplicate Acks           644
TCP Rcvd Pure Acks                4091830
TCP Xmit Delayed Acks             13648891
TCP Xmit Pure Acks                31445514
TCP Rcvd Segment Errors           101
TCP Rcvd Segment Out of Order     306
TCP Rcvd Window Probes            0
TCP Rcvd Window Updates           0
TCP ECC Error Corections          0

Regards,
Ulrich

On 2 Nov 2009 at 19:16, Santi Saez wrote:

>
> Hi,
>
> Randomly we get Open-iSCSI "conn errors" when connecting to an
> Infortrend A16E-G2130-4 storage array. We had discussed about this
> earlier in the list, see:
>
> http://tr.im/DVQm
> http://tr.im/DVQp
>
> Open-iSCSI logs this:
>
> ===============================================
> Nov 2 18:34:02 vz-17 kernel: ping timeout of 5 secs expired, last rx
> 408250499, last ping 408249467, now 408254467
> Nov 2 18:34:02 vz-17 kernel: connection1:0: iscsi: detected conn
> error (1011)
> Nov 2 18:34:03 vz-17 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Nov 2 18:34:07 vz-17 iscsid: connection1:0 is operational after
> recovery (1 attempts)
> Nov 2 18:34:52 vz-17 kernel: ping timeout of 5 secs expired, last rx
> 408294833, last ping 408299833, now 408304833
> Nov 2 18:34:52 vz-17 kernel: connection1:0: iscsi: detected conn
> error (1011)
> Nov 2 18:34:53 vz-17 iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Nov 2 18:34:57 vz-17 iscsid: connection1:0 is operational after
> recovery (1 attempts)
> ===============================================
>
> Running on CentOS 5.4 with "iscsi-initiator-utils-6.2.0.871-0.10.el5";
> I think it's not a Open-iSCSI bug as Mike suggested at:
>
> http://groups.google.com/group/open-iscsi/msg/fe37156096b2955f
>
> I have only this error when connecting to Infortrend storage, and not
> with NetApp, Nexsan, etc.
> *connected in the same SAN*.
>
> Using Wireshark I see a lot of "TCP Dup ACK", "TCP ACKed lost
> segment", etc. and iSCSI session finally ends in timeout, see a
> screenshot here:
>
> http://tinyurl.com/ykpvckn
>
> Using Wireshark IO graphs I get this strange report about TCP/IP errors:
>
> http://tinyurl.com/ybm4m8x
>
> And this is another report in the same SAN connecting to a NetApp:
>
> http://tinyurl.com/ycgc8ul
>
> Those TCP/IP errors only occurs when connecting to Infortrend
> storage.. and no with other targets in the same SAN (using same switch
> infrastructure); is there anyway to deal with this using Open-iSCSI?
> As I see in Internet, there're a lot of Infortrend's users suffering
> this behavior.
>
> Thanks!
>
> P.D: speed and duplex configuration is correct in all point, there
> aren't CRC errors in the switch.
>
> --
> Santi Saez
> http://woop.es