I'm running the following setup: Linux 2.6.27 x86 (SLES-based) with Xen 3.4 as 
dom0, hosting a couple of RHEL 5.5 VMs. The underlying storage for these VMs is 
iSCSI, via open-iscsi 2.0.870-26.6.1 and a Dell EqualLogic array.



Whenever the EqualLogic rebalances the LUNs between its controllers/ports, it 
asks the initiator to log out and log in again on the new port/IP. If the 
guests are idle, the following messages show up in the logs:



Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:09 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)
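
For anyone who wants to put a number on the recovery window in excerpts like the one above, here is a minimal Python sketch. It measures the time from the first "detected conn error" to the matching "is operational after recovery" message per connection. The function name `recovery_windows` and the `year` parameter are my own (syslog timestamps carry no year, so a placeholder is assumed); the regular expression is fitted only to the message formats quoted in this mail.

```python
import re
from datetime import datetime

# Matches the kernel and iscsid lines quoted in this mail, e.g.
# "Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ (?:kernel|iscsid):\s+"
    r"(?P<conn>connection\d+:\d+)[: ]\s*(?P<msg>.*)$"
)


def recovery_windows(lines, year=2000):
    """Return [(connection, seconds)] from first conn error to recovery.

    `year` is a placeholder assumption: syslog timestamps omit the year.
    """
    first_error = {}
    windows = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines and mail-wrapped continuations are skipped
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        conn = m["conn"]
        if "detected conn error" in m["msg"]:
            # Remember only the first error of each outage.
            first_error.setdefault(conn, when)
        elif "is operational after recovery" in m["msg"] and conn in first_error:
            delta = when - first_error.pop(conn)
            windows.append((conn, delta.total_seconds()))
    return windows


SAMPLE = """\
Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)
Aug  3 17:55:09 goncalog140 kernel:  connection1:0: detected conn error (1011)
Aug  3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery
(1 attempts)
""".splitlines()

if __name__ == "__main__":
    for conn, seconds in recovery_windows(SAMPLE):
        print(f"{conn}: recovered after {seconds:.0f}s")
```

Syslog timestamps only have one-second resolution, so the result is a rough bound rather than an exact outage duration.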



However, if one of the RHEL guests is busy performing IO, we end up having a 
few failed requests as well:



Aug  3 17:55:26 goncalog140 kernel:  connection1:0: dropping R2T itt 55 in recovery.

Aug  3 17:55:26 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751

Aug  3 17:55:27 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)
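
To list which sectors failed on which device, something along these lines could be run over the dom0 log. It is only a sketch: `unwrap` and `failed_sectors` are names I made up, and the pattern is fitted to the `end_request: I/O error` lines quoted above. Because mail clients wrap long log lines, it first rejoins any line that does not start with a syslog timestamp onto the previous record.

```python
import re
from collections import defaultdict

# A syslog record in these excerpts starts with e.g. "Aug  3 17:55:26 ".
TS_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")
IO_ERR_RE = re.compile(
    r"end_request: I/O error, dev (?P<dev>\w+), sector\s+(?P<sector>\d+)"
)


def unwrap(lines):
    """Rejoin mail-wrapped continuation lines onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop the blank lines between records
        if TS_RE.match(line) or not records:
            records.append(line)
        else:
            records[-1] = records[-1].rstrip() + " " + line.strip()
    return records


def failed_sectors(lines):
    """Map device name -> list of sectors reported in I/O error records."""
    failed = defaultdict(list)
    for record in unwrap(lines):
        m = IO_ERR_RE.search(record)
        if m:
            failed[m["dev"]].append(int(m["sector"]))
    return dict(failed)


SAMPLE = """\
Aug  3 17:55:26 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector
533399
""".splitlines()

if __name__ == "__main__":
    for dev, sectors in failed_sectors(SAMPLE).items():
        print(dev, sectors)
```

The sample deliberately uses only the first of the two failed requests from the excerpt.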



And as a side effect, the guest filesystem goes read-only. Googling around, 
I've found the following thread on this list which covers the same error I'm 
seeing in the logs:



http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gst&q=conn+error#8e95febb6cf79f64



I've also rebuilt the iscsi_tcp/libiscsi drivers with the patch Mike Christie 
posted in that thread, which can be found at the link below:



http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2&view=1



Is this a known issue? Is there anything else from a troubleshooting 
perspective that I could do?



I've uploaded the following files, in case someone would like to take a look:



Tcpdumps collected a couple of days ago during another reproduction/analysis of 
the same bug (apologies, but I didn't get around to collecting new tcpdumps 
during today's reproduction):



0tcpdump0947.pcap       162K  - 09:47 (GMT+1) nothing occurred.

1tcpdump0952.pcap       4.8M  - 09:52 (GMT+2) problem occurred



Logs from today's reproduction of the issue with the patched drivers for 
additional backtracing:



vm-boot.txt                        2.7K After VM creation

vm-lun-rebalance-no-effect.txt     3.1K VM is idling, FS does not become read-only.

vm-lun-rebalance-fs-readonly.txt   3.3K VM is dd'ing /dev/zero to iscsi based disk, FS becomes read-only.

guest-dmesg.txt                    14K  RHEL 5.3 with 2.6.18-194.8.1.el5xen (RHEL 5.5 kernel)



All these files can be found in the following link:



http://promisc.org/iscsi/



Any help would be greatly appreciated!



Cheers,

 -Goncalo.



