Hi All,

I have a nasty problem with open-iscsi on SLES10 with an Infortrend iSCSI array. Everything goes wrong as soon as the read/write load becomes heavy. Network dumps suggest the problem is always present and only becomes critical when the load is too heavy.

My setup:

1x HP DL585 - SLES10 x86_64
1x HP DL585 - RHEL4 x86_64
1x HP DL380 - SLES10 i586
2x Cisco 2960G (gigabit) switches
2x Infortrend A16E-G2130-4 with 16x 1TB disks each

The two Infortrend arrays have all their gigabit ethernet ports plugged into one of the Cisco switches. Two fibre connections lead from that switch to the other Cisco switch, which has the three servers plugged into it. The network is completely isolated from our other company networks.

At first I thought it was a network problem, so we replaced our dodgy Netgear switches with quality Cisco networking gear, but the problem is the same. If anything it's worse: the Cisco switches allow higher bandwidth (an extra ~20mb/s) and the errors are now more reliably reproducible.

None of the Linux ethernet statistics (ifconfig) report any errors, and the Cisco switches don't report any packet errors either. The Infortrend arrays don't provide ethernet statistics. Wireshark (ethereal) shows many errors: clusters of Duplicate ACKs and a few "previous segment lost".
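The interface error check mentioned above (done by eye with ifconfig) can also be scripted so it is easy to repeat before and after a load test. This is only a minimal sketch, assuming the usual Linux /proc/net/dev layout (two header lines, then 8 receive and 8 transmit counters per interface). The function names and the SAMPLE text are made up for illustration and are not output from the servers described here.

```python
# Sketch: flag interfaces whose error/drop counters are non-zero.
# Assumption: Linux /proc/net/dev layout. SAMPLE is hypothetical data,
# not taken from the machines in this post.

SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 0 0 0 0 0 0 4000 40 0 0 0 0 0 0
  eth1: 7000 70 3 1 0 0 0 0 6000 60 0 2 0 0 0 0
"""


def parse_net_dev(text):
    """Return {interface: {rx_errs, rx_drop, tx_errs, tx_drop}}."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue  # not a full counter line; ignore it
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def interfaces_with_errors(stats):
    """Names of interfaces with any non-zero error or drop counter."""
    return sorted(name for name, c in stats.items() if any(c.values()))


if __name__ == "__main__":
    stats = parse_net_dev(SAMPLE)
    print("interfaces with errors:", interfaces_with_errors(stats) or "none")
```

On a real host, pass the contents of /proc/net/dev to parse_net_dev instead of SAMPLE. Clean counters on the hosts do not rule out loss elsewhere on the path, which matches the situation described above.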
dmesg output from the SLES10 x86_64 server:

sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 1945613312
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 3866132584
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 1565827648
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429620296
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429619272
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429618248
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429617224
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429616200
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429615176
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429614152
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 429613128
Buffer I/O error on device dm-9, logical block 53701593
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701594
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701595
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701596
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701597
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701598
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701599
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701600
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701601
lost page write due to I/O error on dm-9
Buffer I/O error on device dm-9, logical block 53701602
lost page write due to I/O error on dm-9
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 2972647720
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 2717078440
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 1566942880
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 1023998416
I/O error in filesystem ("dm-9") meta-data dev dm-9 block 0x3d08f850 ("xfs_trans_read_buf") error 5 buf count 8192
sd 11:0:0:4: SCSI error: return code = 0x00020000
end_request: I/O error, dev sdi, sector 2048020038
I/O error in filesystem ("dm-9") meta-data dev dm-9 block 0x7a124cc6 ("xlog_iodone") error 5 buf count 1024
xfs_force_shutdown(dm-9,0x2) called from line 960 of file fs/xfs/xfs_log.c. Return address = 0xffffffff882913aa
Filesystem "dm-9": Log I/O Error Detected. Shutting down filesystem: dm-9
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(dm-9,0x1) called from line 424 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff882a5139
xfs_force_shutdown(dm-9,0x1) called from line 424 of file fs/xfs/xfs_rw.c. Return address = 0xffffffff882a5139

/var/log/messages from the SLES10 i586 server:

kernel: connection2:0: iscsi: detected conn error (1011)
iscsid: detected iSCSI connection 2:0 error (1011) state (3)
iscsid: connect failed (113)
iscsid: connect failed (113)
iscsid: connect failed (113)
kernel: session2: iscsi: session recovery timed out after 120 secs
kernel: sd 2:0:0:0: scsi: Device offlined - not ready after error recovery
kernel: sd 2:0:0:0: scsi: Device offlined - not ready after error recovery
kernel: sd 2:0:0:0: scsi: Device offlined - not ready after error recovery
(same message repeated many more times)
kernel: sd 2:0:0:0: SCSI error: return code = 0x00020000
kernel: end_request: I/O error, dev sda, sector 768252872
kernel: sd 2:0:0:0: rejecting I/O to offline device

Any help would be much appreciated!

Thanks,
Stuart.
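For reference when reading the logs above, the "return code" printed by the SCSI layer packs four one-byte fields into a single 32-bit value. The sketch below splits it apart; the host-byte names are a subset of those in the Linux kernel's include/scsi/scsi.h. It also assumes, without confirmation from the logs themselves, that "connect failed (113)" and "error 5" are ordinary Linux errno values. The function and table names are illustrative only.

```python
# Sketch: split a Linux SCSI result word into its four byte fields.
# HOST_BYTE_NAMES is a subset of the DID_* codes in include/scsi/scsi.h.

HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

# Assumption: the bare numbers in the logs are Linux errno values.
LINUX_ERRNO_NAMES = {5: "EIO", 113: "EHOSTUNREACH"}


def decode_scsi_result(result):
    """Return the status, message, host and driver bytes of a result word."""
    host = (result >> 16) & 0xFF
    return {
        "status_byte": result & 0xFF,
        "msg_byte": (result >> 8) & 0xFF,
        "host_byte": host,
        "driver_byte": (result >> 24) & 0xFF,
        "host_name": HOST_BYTE_NAMES.get(host, "unknown"),
    }


if __name__ == "__main__":
    decoded = decode_scsi_result(0x00020000)
    print(decoded["host_name"])      # DID_BUS_BUSY
    print(LINUX_ERRNO_NAMES[113])    # EHOSTUNREACH
```

So 0x00020000 has a host byte of 0x02 (DID_BUS_BUSY) with the status, message and driver bytes all zero. That means the error was reported by the host/transport side rather than as a SCSI status from the disk, but it does not by itself say why the connection dropped.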