Re: connection1:0: detected conn error (1011)
Thank you. I have 6 Ethernet ports on the initiator and 6 on the target. Each port's IP is on its own subnet (192.168.1.x, 192.168.2.x, 192.168.3.x, ... 192.168.6.x), and each port on the target links to one corresponding port on the initiator. Does this matter?

On Thursday, July 24, 2014 at 4:02:11 AM UTC+8, Mike Christie wrote: Is this the same setup where you had multiple initiator NIC ports and iSCSI target portals on the same subnet? If so, then check the networking. Can you ping -I ethX to the iSCSI target portal? If you run tcpdump/wireshark while doing the read test, do you see IO going through the correct ports? What target is this with?

On 07/22/2014 09:42 PM, 木木夕 wrote: Hello everyone, the iSCSI initiator can log in to the iSCSI target successfully and everything looks fine, but when I start read I/O (dd if=/dev/sdb of=/dev/null bs=512k) it prints "connection1:0: detected conn error (1011)". This has happened many times. Any reply is welcome.

-- You received this message because you are subscribed to the Google Groups open-iscsi group. To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com. To post to this group, send email to open-iscsi@googlegroups.com. Visit this group at http://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/d/optout.
Re: connection1:0: detected conn error (1011)
If they are on different subnets then it should be OK. It is common to hit network/iSCSI setup issues when doing all ports on the same subnet. So send the /var/log/messages from the initiator system. Again, what target are you using? A tcpdump/wireshark trace would probably be helpful as well.

On 07/24/2014 03:46 AM, 木木夕 wrote: Thank you. I have 6 Ethernet ports on the initiator and 6 on the target. Each port's IP is on its own subnet (192.168.1.x, 192.168.2.x, 192.168.3.x, ... 192.168.6.x), and each port on the target links to one corresponding port on the initiator. Does this matter? [...]
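Mike's networking checks can be sketched as a script. This is only an illustration: the interface names and the one-to-one ethN-to-192.168.N.1 mapping are assumptions; substitute your real interfaces and portal addresses.

```shell
# Assumed mapping for illustration: ethN on the initiator reaches the
# target portal 192.168.N.1 on the matching subnet.
for n in 1 2 3 4 5 6; do
    ping -I "eth$n" -c 3 "192.168.$n.1"
done

# While the dd read test runs, confirm IO actually flows over the
# expected port (3260 is the standard iSCSI port):
tcpdump -i eth1 -n port 3260

# Show which session is bound to which portal and netdev:
iscsiadm -m session -P 3 | grep -E 'Current Portal|Iface Netdev'
```

If a ping through a given interface fails, or tcpdump shows the read traffic leaving the wrong port, the problem is routing/binding rather than iSCSI itself.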
connection1:0: detected conn error (1011)
Hello everyone, the iSCSI initiator can log in to the iSCSI target successfully and everything looks fine, but when I start read I/O (dd if=/dev/sdb of=/dev/null bs=512k) it prints "connection1:0: detected conn error (1011)". This has happened many times. Any reply is welcome.
Re: connection1:0: ping timeout of 5 secs expired, recv timeout 5 / connection1:0: detected conn error (1011)
On 12/14/2010 03:12 PM, p...@fhri.org wrote: Hi all... I have four CentOS 5.4 (2.6.18-164.11.1.el5) servers with iscsid version 2.0-871. Two are misbehaving despite identical configuration. They all connect to an Enhance Tech RS8-IP4 array the same way, directly NIC-to-NIC without a switch, physically separate from the LAN. I created four targets, one per port, and four separate volumes/LUNs. Pasted below are the config and error log. About a minute after a successful login, the timeouts/errors begin and keep coming constantly, pretty much every minute whenever the session is logged in, regardless of mount state. The problematic units are also often very slow logging in, mounting, even listing directories at times. Also, they sometimes time out and remount the fs read-only in the middle of a large backup run.

There were some fixes to that code in the RHEL/CentOS 5.5 kernel, but I do not think that is what you are hitting. Do you see those ping/nop timeout messages even when you are not doing any IO-intensive workload? Did you set up your initiator names (/etc/iscsi/initiatorname.iscsi) or did you let the tools do this? Does each server have a unique initiator name, or do some servers have the same value in that file? On the target, are there any log messages?

If you set

    node.conn[0].timeo.noop_out_interval = 0
    node.conn[0].timeo.noop_out_timeout = 0

(either set that in iscsid.conf, then rerun the discovery command and relogin, or run

    iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_interval -v 0
    iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_timeout -v 0

then relogin) this will turn off the iSCSI nops/pings. Then if you run mkfs and do backups, you should not see the ping timeout messages, but do you still see low throughput? Do you still see conn error 1011 messages, just without the ping timeout messages?
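Mike's suggested nop/ping-disable sequence can be sketched end to end. The target IQN and portal below are placeholders for illustration; the settings only take effect after logging back in.

```shell
# Placeholder values -- substitute your own target IQN and portal.
TARGET="iqn.2001-05.com.example:rs8-ip4.vol0"
PORTAL="192.168.1.10:3260"

# Disable iSCSI NOP-Out pings for this node record.
iscsiadm -m node -T "$TARGET" -p "$PORTAL" -o update \
    -n node.conn[0].timeo.noop_out_interval -v 0
iscsiadm -m node -T "$TARGET" -p "$PORTAL" -o update \
    -n node.conn[0].timeo.noop_out_timeout -v 0

# The updated values are only picked up on the next login.
iscsiadm -m node -T "$TARGET" -p "$PORTAL" --logout
iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login

# Verify the stored settings for the node record.
iscsiadm -m node -T "$TARGET" -p "$PORTAL" | grep noop_out
```

With nops off, a conn error 1011 that still appears points at the TCP connection or target dropping, rather than at missed ping responses.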
Antw: Re: detected conn error (1011)
Hello, yesterday a controller on our SAN storage failed, and open-iscsi on SLES10 SP3 (open-iscsi-2.0.868-0.6.11) produced a _very_ large number of error messages in syslog (419869 entries within a few hours). Is Novell making any effort to throttle these? Example:

Sep 7 16:08:23 hostname kernel: connection19:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection31:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection20:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection32:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection23:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection7:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection24:0: iscsi: detected conn error (1011)
Sep 7 16:08:23 hostname kernel: connection8:0: iscsi: detected conn error (1011)
[...]
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
[...]
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep 7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
[...]
Sep 7 16:08:53 hostname kernel: session19: iscsi: session recovery timed out after 30 secs
Sep 7 16:08:53 hostname kernel: session31: iscsi: session recovery timed out after 30 secs
Sep 7 16:08:53 hostname kernel: session20: iscsi: session recovery timed out after 30 secs
[...]
Sep 7 16:08:54 hostname kernel: device-mapper: multipath: Failing path 8:192.
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 14
Sep 7 16:08:54 hostname multipathd: sdac: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:192 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 13
Sep 7 16:08:54 hostname multipathd: sdv: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:80 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 12
Sep 7 16:08:54 hostname multipathd: sdx: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:112 in map L116_hostas03
Sep 7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 11
Sep 7 16:08:54 hostname multipathd: sdab: tur checker reports path is down
Sep 7 16:08:54 hostname multipathd: checker failed path 65:176 in map L116_hostas03
[...]
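When syslog is flooded like this, a generic way to gauge the scale before filing a report is to count the distinct message bodies, ignoring the timestamp/hostname prefix. This is only a sketch; it writes a tiny sample file standing in for the flooded log, and you would point LOG at the real file (e.g. /var/log/messages) instead.

```shell
# Point LOG at the real syslog file; defaults to a small sample below.
LOG=${LOG:-messages.sample}

# Sample lines standing in for the flooded log (illustration only):
printf '%s\n' \
  'Sep 7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)' \
  'Sep 7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)' \
  'Sep 7 16:08:23 hostname kernel: connection19:0: iscsi: detected conn error (1011)' \
  > messages.sample

# Blank out fields 1-5 (month, day, time, hostname, daemon tag) so
# identical message bodies collapse into one bucket, then count them.
awk '{ $1=$2=$3=$4=$5=""; counts[$0]++ }
     END { for (m in counts) print counts[m], m }' "$LOG" | sort -rn
```

On the sample above, the top line of the output is the "login rejected" message with a count of 2, which is exactly the kind of summary that makes a 400k-line flood reportable.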
Antw: Re: detected conn error (1011)
Sorry, this message was intended for someone else; my e-mail program tricked me. -- Ulrich

Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote on 08.09.2010 at 13:51 in message 4c8794da02a1f...@gwsmtp1.uni-regensburg.de:
Re: detected conn error (1011)
On Thu, Sep 02, 2010 at 03:15:31PM -0700, Shantanu Mehendale wrote: Hi Hannes/Mike, I am also dealing with another issue on the iSCSI transport where I am seeing DID_TRANSPORT_FAILFAST hostbyte errors reaching the application that is sending I/O on a device-mapper node. Reading the code a little, I thought that after the iscsi replacement_timeout timer fires, the IO stuck in the IO queues would be sent up to device-mapper, which would send the IO down the new path. Is there a possibility that dm-multipath is not able to handle all the errors, so some of them end up going to the application? Basically this is a cable-pull kind of experiment where we would expect the path failover to work and IO to continue properly.

Yes, in general it should. And yes, multipath should handle these cases. But I did quite some patches to iSCSI in SLES11, so you should make sure you're using the latest maintenance release.

Since we already saw one problem with DID_TRANSPORT_DISRUPTED, I was wondering if DID_TRANSPORT_FAILFAST also has some similar issues with limited retries and such.

No, that's actually okay. The I/O error will be reported in either case; it's just that it should never reach the upper layers. In your case it looks as if the 'tapdisk' thing runs on the raw disks, not the multipathed device. So of course it'll register the error. Maybe it's an idea to have 'tapdisk' run on the multipath device-mapper device ...

Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
RE: detected conn error (1011)
Hi Hannes, Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 tree of SLES 11. We add a few extra patches specific to Xen, dom0 integration, and some backports from upstream. To the best of my knowledge these additions don't touch the iscsi layer, so from the iscsi driver's point of view I believe they are as pristine as the ones in the SuSE kernel. That's why we need the patch: the binaries will probably mismatch in gcc version and/or in the versioning that we use, e.g. 2.6.27.42-0.1.1.xs5.6.0.44.58xen. I definitely appreciate your 'forward thinking' with regards to the issue, though! Thanks, -Goncalo.

-----Original Message----- From: Hannes Reinecke [mailto:h...@suse.de] Sent: 30 August 2010 15:12 To: Goncalo Gomes Cc: Mike Christie; open-iscsi@googlegroups.com; Shantanu Mehendale Subject: Re: detected conn error (1011)

Goncalo Gomes wrote: Hi, On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: Mike Christie wrote: ccing Hannes from suse, because this looks like a SLES-only bug. Hey Hannes, The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI-based via open-iscsi 2.0.870-26.6.1 and a DELL EqualLogic array.

On 08/05/2010 02:21 PM, Goncalo Gomes wrote: I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

    case ISCSI_STATE_FAILED:
        reason = FAILURE_SESSION_FAILED;
        sc->result = DID_TRANSPORT_DISRUPTED << 16;
        break;

is causing IO errors. You want to use something like DID_IMM_RETRY because it can be a long time between when the kernel marks the state as ISCSI_STATE_FAILED and when we start recovery and properly get all the device queues blocked, so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during this time, i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately. Sadly I got additional bug reports about this, so I think I'll have to revert it.

I applied and tested the changes Mike Christie suggests. After the LUN is rebalanced within the array I no longer see the IO errors, and it appears the setup is now resilient to the EqualLogic LUN failover process. I'm attaching the log from the dmesg merely for sanity-check purposes, if anyone cares to take a look.

I have put some test kernels at http://beta.suse.com/private/hare/sles11/iscsi

Do the test kernels in the URL above contain the change of DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY, or is there more to it than simply changing the result code? If the latter, would you be able to upload the source rpms or a unified patch containing the changes you are staging? I'm looking for a more palatable way to test them, given I have no SLES box lying around, but will install one if need be.

Got me confused. How would you test the patch if not on a SLES box? Presumably you would have to install the new kernel on the instance you are planning to run the test on, which for any sane setup would have to be a SLES box. In which case you can just use the provided kernel directly and save yourself the compilation step. Am I missing something?

Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: detected conn error (1011)
Goncalo Gomes wrote: Hi Hannes, Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 tree of SLES 11. [...] I definitely appreciate your 'forward thinking' with regards to the issue, though!

I just checked, and the resulting patch is indeed like you proposed:

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 32b30f1..441ca8b 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1336,9 +1336,6 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
 	 */
 	switch (session->state) {
 	case ISCSI_STATE_FAILED:
-		reason = FAILURE_SESSION_FAILED;
-		sc->result = DID_TRANSPORT_DISRUPTED << 16;
-		break;
 	case ISCSI_STATE_IN_RECOVERY:
 		reason = FAILURE_SESSION_IN_RECOVERY;
 		sc->result = DID_IMM_RETRY << 16;

HTH, Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
RE: detected conn error (1011)
Thanks Hannes and Mike, your help has been highly appreciated! Cheers, -Goncalo.

-----Original Message----- From: Hannes Reinecke [mailto:h...@suse.de] Sent: 31 August 2010 14:43 To: Goncalo Gomes Cc: Mike Christie; open-iscsi@googlegroups.com; Shantanu Mehendale Subject: Re: detected conn error (1011)

Goncalo Gomes wrote: Hi Hannes, Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 tree of SLES 11. [...] I just checked, and the resulting patch is indeed like you proposed. [...] HTH, Cheers, Hannes
Re: detected conn error (1011)
Goncalo Gomes wrote: Hi, On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: Mike Christie wrote: ccing Hannes from suse, because this looks like a SLES-only bug. Hey Hannes, The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI-based via open-iscsi 2.0.870-26.6.1 and a DELL EqualLogic array.

On 08/05/2010 02:21 PM, Goncalo Gomes wrote: I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

    case ISCSI_STATE_FAILED:
        reason = FAILURE_SESSION_FAILED;
        sc->result = DID_TRANSPORT_DISRUPTED << 16;
        break;

is causing IO errors. You want to use something like DID_IMM_RETRY because it can be a long time between when the kernel marks the state as ISCSI_STATE_FAILED and when we start recovery and properly get all the device queues blocked, so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during this time, i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately. Sadly I got additional bug reports about this, so I think I'll have to revert it.

I applied and tested the changes Mike Christie suggests. After the LUN is rebalanced within the array I no longer see the IO errors, and it appears the setup is now resilient to the EqualLogic LUN failover process. I'm attaching the log from the dmesg merely for sanity-check purposes, if anyone cares to take a look.

I have put some test kernels at http://beta.suse.com/private/hare/sles11/iscsi

Do the test kernels in the URL above contain the change of DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY, or is there more to it than simply changing the result code? If the latter, would you be able to upload the source rpms or a unified patch containing the changes you are staging? I'm looking for a more palatable way to test them, given I have no SLES box lying around, but will install one if need be.

Got me confused. How would you test the patch if not on a SLES box? Presumably you would have to install the new kernel on the instance you are planning to run the test on, which for any sane setup would have to be a SLES box. In which case you can just use the provided kernel directly and save yourself the compilation step. Am I missing something?

Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: detected conn error (1011)
Hi, On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: Mike Christie wrote: ccing Hannes from suse, because this looks like a SLES-only bug. Hey Hannes, The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI-based via open-iscsi 2.0.870-26.6.1 and a DELL EqualLogic array.

On 08/05/2010 02:21 PM, Goncalo Gomes wrote: I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

    case ISCSI_STATE_FAILED:
        reason = FAILURE_SESSION_FAILED;
        sc->result = DID_TRANSPORT_DISRUPTED << 16;
        break;

is causing IO errors. You want to use something like DID_IMM_RETRY because it can be a long time between when the kernel marks the state as ISCSI_STATE_FAILED and when we start recovery and properly get all the device queues blocked, so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during this time, i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately. Sadly I got additional bug reports about this, so I think I'll have to revert it.

I applied and tested the changes Mike Christie suggests. After the LUN is rebalanced within the array I no longer see the IO errors, and it appears the setup is now resilient to the EqualLogic LUN failover process. I'm attaching the log from the dmesg merely for sanity-check purposes, if anyone cares to take a look.

I have put some test kernels at http://beta.suse.com/private/hare/sles11/iscsi

Do the test kernels in the URL above contain the change of DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY, or is there more to it than simply changing the result code? If the latter, would you be able to upload the source rpms or a unified patch containing the changes you are staging? I'm looking for a more palatable way to test them, given I have no SLES box lying around, but will install one if need be. Thanks, -Goncalo.

[attached dmesg log:]
device-mapper: multipath: version 1.0.5 loaded
device-mapper: multipath round-robin: version 1.0.0 loaded
device-mapper: table: 251:1: multipath: error getting device
device-mapper: ioctl: error adding target to table
device-mapper: table: 251:1: multipath: error getting device
device-mapper: ioctl: error adding target to table

Citrix Systems, Inc. -- Private Release Kernel Private File Disclaimer: The private files provided to you contain a preliminary code fix. These private files have been created and distributed to you to address your specific issue and provide Citrix with the feedback that your issue has been resolved or to provide further debugging information. These private files have had minimal in-house testing with no regression testing and may contain defects. These private file(s) will only be supported until an official Hotfix has been provided or one is publicly available from the Citrix web site. Any private files that are provided to you are intended only for the use of the individual or entity to which this is addressed, and distribution of these files or utilities is prohibited. CITRIX MAKES NO REPRESENTATIONS OR WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE WITH RESPECT TO THE PRIVATE FILES. THE PRIVATE FILES ARE DELIVERED ON AN AS IS BASIS. YOU SHALL HAVE THE SOLE RESPONSIBILITY FOR ADEQUATE PROTECTION AND BACK-UP OF AN[...]

Loading iSCSI transport class v2.0-870.
iscsi: registered transport (tcp)
scsi6 : iSCSI Initiator over TCP/IP
connection1:0: detected conn error (1011)
scsi 6:0:0:0: Direct-Access EQLOGIC 100E-00 4.3 PQ: 0 ANSI: 5
sd 6:0:0:0: [sdb] 209725440 512-byte hardware sectors: (107 GB/100 GiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: ad 00 00 00
sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:0:0: [sdb] 209725440 512-byte hardware sectors: (107 GB/100 GiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: ad 00 00 00
sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sdb: sdb1
sd 6:0:0:0: [sdb] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg1 type 0
tap_backend_changed: backend/tap/1/51712: created thread 9531
tap_blkif_schedule[9531]: starting device
RE: detected conn error (1011)
Hi Hannes, Would you be able to send me a unified patch containing the changes included in the test kernels, so I can rebuild the drivers with them and update you today? For completeness, we are not running SLES, but rather the Citrix XenServer 5.6 release, which is based off of the Linux 2.6.27 tree of SLES. Also, for this specific controller we don't enable MPIO, but on most other arrays we do. Thanks, -Goncalo.

-----Original Message----- From: Hannes Reinecke [mailto:h...@suse.de] Sent: 06 August 2010 15:58 To: Mike Christie Cc: open-iscsi@googlegroups.com; Goncalo Gomes Subject: Re: detected conn error (1011)

Mike Christie wrote: ccing Hannes from suse, because this looks like a SLES-only bug. Hey Hannes, The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI-based via open-iscsi 2.0.870-26.6.1 and a DELL EqualLogic array.

On 08/05/2010 02:21 PM, Goncalo Gomes wrote: I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

    case ISCSI_STATE_FAILED:
        reason = FAILURE_SESSION_FAILED;
        sc->result = DID_TRANSPORT_DISRUPTED << 16;
        break;

is causing IO errors. You want to use something like DID_IMM_RETRY because it can be a long time between when the kernel marks the state as ISCSI_STATE_FAILED and when we start recovery and properly get all the device queues blocked, so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during this time, i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately. Sadly I got additional bug reports about this, so I think I'll have to revert it.

I have put some test kernels at http://beta.suse.com/private/hare/sles11/iscsi Can you test with them and check if this issue is solved? Thanks.

Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: detected conn error (1011)
On 08/06/2010 09:57 AM, Hannes Reinecke wrote: Mike Christie wrote: ccing Hannes from suse, because this looks like a SLES only bug. Hey Hannes, The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.

On 08/05/2010 02:21 PM, Goncalo Gomes wrote: I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/

It looks like this chunk from libiscsi.c:iscsi_queuecommand:

    case ISCSI_STATE_FAILED:
        reason = FAILURE_SESSION_FAILED;
        sc->result = DID_TRANSPORT_DISRUPTED << 16;
        break;

is causing IO errors. You want to use something like DID_IMM_RETRY because it can be a long time between the time the kernel marks the state as ISCSI_STATE_FAILED until we start recovery and properly get all the device queues blocked, so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.

Yeah, I noticed. But the problem is that multipathing will stall during this time, i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED will circumvent this and we can failover immediately.

It should stall. It works like FC and the fast io fail tmo. Users need to set the iscsi replacement/recovery timeout like they would FC's fast io fail tmo. They should set it to 3 or 5 secs or lower if they want really fast failovers.
Re: detected conn error (1011)
On 08/06/2010 11:38 AM, Mike Christie wrote: It should stall. It works like FC and the fast io fail tmo. Users need to set the iscsi replacement/recovery timeout like they would FC's fast io fail tmo. They should set it to 3 or 5 secs or lower if they want really fast failovers.

Oh yeah, Qlogic recently did this patch: http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=fe4f0bdeea788a8ac049c097895cb2e4044f18b1;hp=caf19d38607108304cd8cc67ed21378017f69e8a so we can have multipath tools set the recovery_tmo like it does for fast io fail tmo.
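The replacement/recovery timeout discussed here can be set persistently in iscsid.conf (the real parameter is `node.session.timeo.replacement_timeout`), and for already-running sessions the kernel exposes it as `recovery_tmo` under /sys/class/iscsi_session. A minimal sketch, with `SYSFS_ROOT` parameterized so the loop can be exercised against a scratch directory rather than a live system:

```shell
#!/bin/sh
# Sketch: lower the iSCSI recovery timeout so failed paths are handed to
# multipath quickly, per Mike's "3 or 5 secs" suggestion above.
# SYSFS_ROOT defaults to the real sysfs location but is overridable.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/class/iscsi_session}"

set_recovery_tmo() {
    secs="$1"
    for s in "$SYSFS_ROOT"/session*; do
        [ -f "$s/recovery_tmo" ] || continue
        echo "$secs" > "$s/recovery_tmo"
        echo "$(basename "$s"): recovery_tmo=$secs"
    done
}
```

On a real host this would be `set_recovery_tmo 5`; the equivalent persistent setting is `node.session.timeo.replacement_timeout = 5` in iscsid.conf before login.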
Antw: detected conn error (1011)
Goncalo Gomes goncalo.go...@eu.citrix.com schrieb am 04.08.2010 um 23:12 in Nachricht ffdb98dc9661d3418b9eb0ff5202e46b7a82e8b...@lonpmailbox01.citrite.net: I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.

I guess that's SLES11 already. I just read an announcement that there is an open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not give any details in the announcements:

4. Recommended update for open-iscsi
SUSE Linux Enterprise Desktop 11 SP1 for x86-64: http://download.novell.com/Download?buildid=MAugs_l2FJY~
SUSE Linux Enterprise Desktop 11 SP1 for x86: http://download.novell.com/Download?buildid=U2OyI_9oJ5g~
SUSE Linux Enterprise Server 11 SP1 for x86-64: http://download.novell.com/Download?buildid=1Z1WASv0lfE~
SUSE Linux Enterprise Server 11 SP1 for x86: http://download.novell.com/Download?buildid=EzU17PIvOTc~
SUSE Linux Enterprise Server 11 SP1 for s390x: http://download.novell.com/Download?buildid=xqwCozVDBjM~
SUSE Linux Enterprise Server 11 SP1 for ppc: http://download.novell.com/Download?buildid=fMD_W5XKEtI~
SUSE Linux Enterprise Server 11 SP1 for ia64: http://download.novell.com/Download?buildid=_QVtGS0824o~

Maybe you should try the latest (and greatest?) version... ;-) Regards, Ulrich
Re: Antw: detected conn error (1011)
On Thu, 2010-08-05 at 08:50 +0200, Ulrich Windl wrote: I guess that's SLES11 already. I just read an announcement that there is an open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not give any details in the announcements:

4. Recommended update for open-iscsi
SUSE Linux Enterprise Desktop 11 SP1 for x86-64: http://download.novell.com/Download?buildid=MAugs_l2FJY~
SUSE Linux Enterprise Desktop 11 SP1 for x86: http://download.novell.com/Download?buildid=U2OyI_9oJ5g~
SUSE Linux Enterprise Server 11 SP1 for x86-64: http://download.novell.com/Download?buildid=1Z1WASv0lfE~
SUSE Linux Enterprise Server 11 SP1 for x86: http://download.novell.com/Download?buildid=EzU17PIvOTc~
SUSE Linux Enterprise Server 11 SP1 for s390x: http://download.novell.com/Download?buildid=xqwCozVDBjM~
SUSE Linux Enterprise Server 11 SP1 for ppc: http://download.novell.com/Download?buildid=fMD_W5XKEtI~
SUSE Linux Enterprise Server 11 SP1 for ia64: http://download.novell.com/Download?buildid=_QVtGS0824o~

Maybe you should try the latest (and greatest?) version... ;-)

From the description:
* Occasionally, not all iSCSI multipath mappings are being created after boot up.
* Stopping of the open-iscsi service fails even if no iSCSI device is mounted.
* When configuring iBFT, the iscsiadm program does not display the details of a session.

Do you think any of the fixes above may help in the issue I described before? I'm presently not making use of multipath nor booting from SAN. Although these fixes are worth having, I'm mostly concerned about understanding the nature/reason of the issue I described before at this stage. Thanks, -Goncalo. Regards, Ulrich
Re: detected conn error (1011)
On Wed, 2010-08-04 at 21:51 -0500, Mike Christie wrote: conn error 1011 is generic. If this is occurring when the eql box is rebalancing luns, it is a little different than above. With the above problem we did not know why we got the error. With your situation we sort of expect this. We should not be getting disk IO errors though. When we get the logout request from the target, we send the logout request, then basically handle the cleanup like if we got a connection error. That is why you would see the conn error msg in this path. This also means if this happened to the same IO 5 times, then you would see the disk IO errors (scsi layer only lets us retry disk IO 5 times). But if it just happened once, then the IO should be retried when we log into the new portal and execute like normal.

What would be the best way to identify how many retries have elapsed?

Or are you using dm-multipath over iscsi? In that case you do not get any retries, so we would expect to see that end_request: I/O error message, but dm-multipath should just be retrying a new path or internally queueing for whatever timeout value you had it use in multipath.conf.

Multipath is not enabled at all. The equallogic array is active/passive and we only have a view into one controller at any time, so we don't make use of multipath at present.

Could you send me the libiscsi.c file you patched? Could you also send more of the log for either case? I want to see the iscsid log info and any more of the kernel iscsi log info that you have. I am looking for session recovery timed out messages and/or target requested logout messages.

I've copied both the messages file from the host goncalog140 and the patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these files in the link below: http://promisc.org/iscsi/ N.B.: the messages file contains spew from other instrumentation tests (e.g. a dump_stack() call in scsi_transport_iscsi.c::iscsi_conn_error()).
The last set of tests which I've made available yesterday have only the libiscsi.c and IIRC the iscsi_tcp.c, and this output can be found around the timeframe of 17:50. If required I can spin a new set of tests with different instrumentation and/or collect different information, logs or tcpdumps, if that helps in any way. Thanks, -Goncalo.
detected conn error (1011)
I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array. Whenever the equallogic rebalances the LUNs between the controllers/ports, it requests the initiator to logout and login again to the new port/ip. If the guests are idle, the following messages show up in the logs:

Aug 3 17:55:08 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:09 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

However, if one of the RHEL guests is busy performing IO, we end up having a few failed requests as well:

Aug 3 17:55:26 goncalog140 kernel: connection1:0: dropping R2T itt 55 in recovery.
Aug 3 17:55:26 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751
Aug 3 17:55:27 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

And as a side effect, the guest filesystem goes read-only.
Googling around, I've found the following thread on this list which covers the same error I'm seeing in the logs: http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gstq=conn+error#8e95febb6cf79f64

I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike Christie taken from that thread, which can be found in the link below: http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2view=1

Is this a known issue? Is there anything else from a troubleshooting perspective that I could do? I've uploaded the following files, in case someone would like to take a look.

Tcpdumps collected a couple of days ago in another reproduction/analysis of the same bug (apologies, but I didn't get around to collecting new tcpdumps with today's reproduction):

0tcpdump0947.pcap (162K) - 09:47 (GMT+1), nothing occurred.
1tcpdump0952.pcap (4.8M) - 09:52 (GMT+2), problem occurred.

Logs from today's reproduction of the issue with the patched drivers for additional backtracing:

vm-boot.txt (2.7K) - after VM creation.
vm-lun-rebalance-no-effect.txt (3.1K) - VM is idling, FS does not become read-only.
vm-lun-rebalance-fs-readonly.txt (3.3K) - VM is dd'ing /dev/zero to an iSCSI-based disk, FS becomes read-only.
guest-dmesg.txt (14K) - RHEL 5.3 with 2.6.18-194.8.1.el5xen (RHEL 5.5 kernel).

All these files can be found in the following link: http://promisc.org/iscsi/ Any help would be greatly appreciated! Cheers, -Goncalo.
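When chasing intermittent 1011 errors like the ones above, it helps to tally how often the connection dropped, how often it recovered, and whether any I/O actually failed, before digging into traces. A small sketch over a messages file (the grep patterns match the exact log strings quoted in this thread):

```shell
#!/bin/sh
# Summarize conn errors, recoveries, and disk I/O errors in a syslog file.
# Usage: conn_error_summary /var/log/messages
conn_error_summary() {
    log="$1"
    errors=$(grep -c 'detected conn error (1011)' "$log")
    recoveries=$(grep -c 'is operational after recovery' "$log")
    io_errors=$(grep -c 'end_request: I/O error' "$log")
    echo "conn errors: $errors, recoveries: $recoveries, disk I/O errors: $io_errors"
}
```

If every conn error is matched by a recovery and the disk I/O error count stays at zero, the drops are being absorbed by retries; a nonzero I/O error count is what correlates with the guest filesystem going read-only.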
Re: detected conn error (1011)
On 08/04/2010 04:12 PM, Goncalo Gomes wrote: I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array. Whenever the equallogic rebalances the LUNs between the controllers/ports, it requests the initiator to logout and login again to the new port/ip. If the guests are idle, the following messages show up in the logs:

Aug 3 17:55:08 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:09 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

However, if one of the RHEL guests is busy performing IO, we end up having a few failed requests as well:

Aug 3 17:55:26 goncalog140 kernel: connection1:0: dropping R2T itt 55 in recovery.
Aug 3 17:55:26 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399
Aug 3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK
Aug 3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533751
Aug 3 17:55:27 goncalog140 kernel: connection1:0: detected conn error (1011)
Aug 3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)

And as a side effect, the guest filesystem goes read-only. Googling around, I've found the following thread on this list which covers the same error I'm seeing in the logs: http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gstq=conn+error#8e95febb6cf79f64

conn error 1011 is generic.
If this is occurring when the eql box is rebalancing luns, it is a little different than above. With the above problem we did not know why we got the error. With your situation we sort of expect this. We should not be getting disk IO errors though.

When we get the logout request from the target, we send the logout request, then basically handle the cleanup like if we got a connection error. That is why you would see the conn error msg in this path. This also means if this happened to the same IO 5 times, then you would see the disk IO errors (scsi layer only lets us retry disk IO 5 times). But if it just happened once, then the IO should be retried when we log into the new portal and execute like normal.

Or are you using dm-multipath over iscsi? In that case you do not get any retries, so we would expect to see that end_request: I/O error message, but dm-multipath should just be retrying a new path or internally queueing for whatever timeout value you had it use in multipath.conf.

I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike Christie taken from that thread which can be found in the link below: http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2view=1

Could you send me the libiscsi.c file you patched? Could you also send more of the log for either case? I want to see the iscsid log info and any more of the kernel iscsi log info that you have. I am looking for session recovery timed out messages and/or target requested logout messages.
Antw: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 28.07.2010 um 16:46 in Nachricht 4c505ef502a1e...@gwsmtp1.uni-regensburg.de: Sean S sstra...@gmail.com schrieb am 28.07.2010 um 16:34 in Nachricht f711789b-c411-4459-afbe-fe1a50fe2...@w12g2000yqj.googlegroups.com: How did you get those other kernel messages? If you can just get the iscsid log info that is sent after lines like this

I'm able to issue the dmesg command after the drive is lost and still retrieve some logging info. Unfortunately, what I sent was all that I can get. If the drive ever successfully reconnects then I can get to /var/log/messages and see the info you are looking for. I've only ever had a successful reconnect when intentionally causing a disconnect (i.e. pulling the ethernet cable and then reconnecting it). I don't know much about unix logging, but maybe there is a way to send more of the logging messages to dmesg, as that doesn't appear to need disk access to be read.

dmesg just prints the kernel message buffer (/proc/kmsg), while syslog can capture messages from applications as well. I have a sample for a syslog-ng configuration file (samples for sources are missing, sorry):

source s_intern { internal(); };
source s_dev_log { unix-stream("/dev/log"); };
source s_kernel { file("/proc/kmsg"); };
destination d_tty_root { usertty("root"); };
destination d_console { file("/dev/ttyS0"); };
destination d_messages { file("/var/log/messages"); };
filter f_error { level(alert .. err) and not match('S15.modem: initchat failed.'); };
filter f_kernel { level(alert .. err); };
filter f_auth { facility(auth, authpriv) and level(alert .. info); };
filter f_debug { level(alert .. debug); };
# send critical messages to the logged-in root user and /var/log/messages
log { source(s_intern); source(s_dev_log); source(s_kernel); filter(f_error); destination(d_tty_root); destination(d_messages); };
# save auth-related messages
log { source(s_dev_log); source(s_kernel); filter(f_auth); destination(d_messages); };

Just to get you started. The older syslog is less powerful, but easier to configure. Maybe this is interesting for you:

# 6) To send messages to a remote syslogd server:
# destination d_udp { udp("remote IP address" port(514)); };
# Example to send syslogs to a syslogd located at 10.0.0.1:
# destination d_udp1 { udp("10.0.0.1" port(514)); };

Maybe this helps a bit. Regards, Ulrich
Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
On 07/28/2010 09:34 AM, Sean S wrote: What version of open-iscsi-871 are you using? Is it 871.1, .2, or .3? I downloaded the current semi-stable release: http://www.open-iscsi.org/bits/open-iscsi-2.0-871.tar.gz It doesn't appear to have a minor version number. Should I be using something else?

Yeah, try: http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi-2.0-871.3.tar.gz It has a fix for recovery. There was a problem where recovery hung for several minutes.
Antw: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Sean S sstra...@gmail.com schrieb am 27.07.2010 um 00:37 in Nachricht 109cc690-901c-4e53-9fa9-1a9903380...@l14g2000yql.googlegroups.com: [...] I'm unable to view /var/log/messages after the failure due to running as iscsi root. Ulrich mentioned writing the log to a serial port, but I haven't been able to set this up yet. Would there be an easier way

When using GRUB, use something like:

serial --unit=0 --speed=19200
terminal serial console

and add the options vga=normal console=tty0 console=ttyS0,19200 to the kernel command line, just like:

###Don't change this comment - YaST2 identifier: Original name: linux###
title SUSE Linux Enterprise Server 10 - 2.6.16.60-0.54.5 (smp)
    root (hd0,0)
    kernel /vmlinuz-2.6.16.60-0.54.5-smp root=/dev/system/root vga=normal console=tty0 console=ttyS0,19200 splash=silent showopts
    initrd /initrd-2.6.16.60-0.54.5-smp

Ulrich
Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Thanks for the patch Mike. Below is the output from a failure when running with the patch. Any thoughts?

[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]
[f9bf202d] iscsi_eh_abort+0x2f1/0x406 [libiscsi]
[f885d378] __scsi_try_to_abort_cmd+0x19/0x1a [scsi_mod]
[f885e85d] scsi_error_handler+0x24d/0x422 [scsi_mod]
[c041f7ea] complete+0x2b/0x3d
[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]
[c0435f65] kthread+0xc0/0xeb
[c0435ea5] kthread+0x0/0xeb
[c0405c3b] kernel_thread_helper+0x7/0x10
===
connection1:0 detected conn error (1011)
[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]
[f8bf22fc] iscsi_eh_target_reset+0xbb/0x218 [libiscsi]
[c0605967] _spin_lock_bh+0x8/0x18
[f8bf0f78] iscsi_eh_device_reset+0x1c5/0x1cf [libiscsi]
[c054a6dd] get_device+0xe/0x14
[f885d764] scsi_try_host_reset+0x3a/0x99 [scsi_mod]
[f885e0e3] scsi_eh_ready_devs+0x302/0x3e2 [scsi_mod]
[f885e8dd] scsi_error_handler+0x2cd/0x422 [scsi_mod]
[c041f7ea] complete+0x2b/0x3d
[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]
[c0435f65] kthread+0xc0/0xeb
[c0435ea5] kthread+0x0/0xeb
[c0405c3b] kernel_thread_helper+0x7/0x10
===
session1: session recovery timed out after 400 secs
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: scsi: Device offlined - not ready after error recovery
sd 0:0:0:0: SCSI error: return code = 0x0002
end_request: I/O error, dev sda, sector 14283149

On Jul 13, 10:34 pm,
Mike Christie micha...@cs.wisc.edu wrote: Could you run with the attached patch? It just prints out a little more info. When we get the conn error, it will print out a message if it is due to the target dropping the connection and it will print out a stack trace so we can see exactly which piece of code is throwing the error.

On 07/13/2010 09:33 PM, Sean S wrote: Nothing else in the log from iscsid. No mention of a failed reconnect, although the only log I'm really able to access post failure is dmesg. Since I'm running a root iscsi, I couldn't get to /var/log/messages which maybe was a little more verbose? What sort of network problems

Yeah, by default the iscsid messages go there. iscsid should be spitting out a cannot connect $some_error_value_or_string that would help tell us why we cannot reach the target anymore.

might cause this? The network in this situation is a simple gigE switch with about 3 or 4 systems on it. The target and initiator are on the same subnet, nothing fancy. Is there some additional debug you'd recommend turning on? Any tips or tricks when running with a root iscsi drive?

Not that I can think of at the iscsi layer.

Curiously, if I physically disconnect the ethernet from the initiator while running, all I/O access is correctly paused without returning I/O errors. If I then reconnect before the 400s is up, things go back to normal. I don't see the detected conn error (1011) message in this situation, however. Not sure if that really means anything.

You should see the conn error 1011 message if:
1. you have nops on and they time out, and that causes us to log that error.
2. the network layer figures out there is a problem and notifies us. It is possible that you pull a cable and plug it back in before the network throws an error.
3. iscsi driver or protocol error. In this case we should relogin quickly.

[attachment: trace-conn-error.patch]
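The nop behavior Mike refers to in case 1 is controlled from iscsid.conf. These two parameters are real open-iscsi settings; the values shown are a common starting point, not a recommendation from this thread:

```
# Send a NOP-Out ping every 5 seconds; declare the connection failed if
# the target does not answer within 5 seconds. Setting both to 0
# disables nops, which also removes this source of conn error (1011).
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
```

With nops disabled, the initiator only learns about a dead path from the network layer or from a SCSI command timeout, which is why the earlier posts in this thread saw long stalls before any error was logged.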
Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
On 07/26/2010 04:36 PM, Sean S wrote: Thanks for the patch Mike. Below is the output from a failure when running with the patch. Any thoughts?

[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]
[f9bf202d] iscsi_eh_abort+0x2f1/0x406 [libiscsi]
[f885d378] __scsi_try_to_abort_cmd+0x19/0x1a [scsi_mod]
[f885e85d] scsi_error_handler+0x24d/0x422 [scsi_mod]
[c041f7ea] complete+0x2b/0x3d
[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]
[c0435f65] kthread+0xc0/0xeb
[c0435ea5] kthread+0x0/0xeb
[c0405c3b] kernel_thread_helper+0x7/0x10

Each scsi command has a timeout (see /sys/block/sdX/device/timeout). The above dump shows that a scsi command is timing out. This causes the scsi layer to have the driver, iscsi_tcp in this case, try to abort the command. It looks like the abort timed out too, and so the iscsi layer decided to escalate the eh and failed the iscsi session/connection.

session1: session recovery timed out after 400 secs

The iscsi layer tried to log back in for recovery/replacement timeout seconds, but could not. Did you see anything from iscsid about why it could not log in? iscsid writes to /var/log/messages by default.

sd 0:0:0:0: scsi: Device offlined - not ready after error recovery

Because the replacement/recovery timeout fired, the iscsi layer decided it was time to give up and tells the scsi layer the disks are not recoverable, and so we see these messages:

sd 0:0:0:0: scsi: Device offlined - not ready after error recovery

Does the session/connection ever re-login (you would see some message in /var/log/messages about connection X:Y is operational after recovery (Z attempts))? On the target box check out /var/log/messages. Is the target even up still? Did it segfault?
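The per-command timeout Mike points at lives in sysfs, one file per disk. A small sketch that prints it for every SCSI disk; `BLOCK_ROOT` is parameterized so the logic can be exercised against a scratch directory instead of a live /sys:

```shell
#!/bin/sh
# Print the per-command SCSI timeout (in seconds) for each sd disk.
# This is the timer that, on expiry, starts the error-handler escalation
# (abort -> device reset -> host reset) seen in the traces above.
BLOCK_ROOT="${BLOCK_ROOT:-/sys/block}"

show_scsi_timeouts() {
    for f in "$BLOCK_ROOT"/sd*/device/timeout; do
        [ -f "$f" ] || continue
        dev=${f#"$BLOCK_ROOT"/}   # strip root prefix
        dev=${dev%%/*}            # keep only the device name, e.g. sda
        echo "$dev: $(cat "$f")s"
    done
}

show_scsi_timeouts
```

Raising the value (e.g. `echo 60 > /sys/block/sdb/device/timeout`) gives a flaky path more time before the error handler fires; it does not fix the underlying connection drops.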
Antw: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Sean S sstra...@gmail.com schrieb am 13.07.2010 um 20:41 in Nachricht 1f2389e7-9717-4f82-a05c-671f36a4c...@x21g2000yqa.googlegroups.com: I'm running an iscsi root partition for a CentOS machine running a 2.6.18-53 kernel. Every couple of days I get the error:

connection1:0 detected conn error (1011)
session1: session recovery timed out after 400 sec

Hi! I cannot answer your question, but that brings up something I wanted to talk about. Apologies if something already exists, but I don't know: In HP-UX 11.31 you can print scan times per device (i.e. LUN). Here's an example for a true FC-SAN:

Class    I   H/W Path                           ms_scan_time
============================================================
lunpath  3   0/3/1/0.0x50001fe1500c1f28.0x0     0 min 0 sec 13 ms
lunpath  24  0/3/1/0.0x50001fe1500c1f28.0x4001  0 min 0 sec 88 ms
lunpath  73  0/3/1/0.0x50001fe1500c1f28.0x4002  0 min 0 sec 88 ms
lunpath  25  0/3/1/0.0x50001fe1500c1f28.0x4003  0 min 0 sec 88 ms
lunpath  74  0/3/1/0.0x50001fe1500c1f28.0x4009  0 min 0 sec 88 ms
lunpath  26  0/3/1/0.0x50001fe1500c1f28.0x4033  0 min 0 sec 88 ms
lunpath  88  0/3/1/0.0x50001fe1500c1f28.0x4037  0 min 0 sec 88 ms
lunpath  79  0/3/1/0.0x50001fe1500c1f28.0x403d  0 min 0 sec 91 ms
lunpath  27  0/3/1/0.0x50001fe1500c1f28.0x4047  0 min 0 sec 91 ms
[...]
lunpath  63  0/7/1/0.0x500308c001d83803.0x4001  0 min 0 sec 11 ms
lunpath  64  0/7/1/0.0x500308c001d83803.0x4002  0 min 0 sec 11 ms
lunpath  65  0/7/1/0.0x500308c001d83803.0x4003  0 min 0 sec 11 ms
lunpath  66  0/7/1/0.0x500308c001d83803.0x4004  0 min 0 sec 536 ms

If Linux/open-iscsi had something similar, one could periodically watch the times to find bottlenecks. AFAIK, the scan time in HP-UX is the round-trip delay for querying a LUN or a controller (a target?). Ulrich

I compiled the open-iscsi 2.0-871 user tools and kernel modules from source obtained from open-iscsi.org. I custom packaged the initrd to contain the iscsistart binary and the kernel modules from v871. I've zeroed out the noop timeout setting and the noop interval.
The disconnect is not reproducible, but does occur at random about every other day. I'm assuming that the target (IET 1.4.19) is not the issue, as a second system that is using the target as an iscsi-root drive continues to work correctly. What things should I be looking at? I'm really struggling to understand why this happens; any suggestions would be greatly appreciated.
Antw: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Sean S sstra...@gmail.com schrieb am 14.07.2010 um 04:33 in Nachricht 83cd8c40-2e84-4c52-a864-36643dd0a...@d8g2000yqf.googlegroups.com: Nothing else in the log from iscsid. No mention of a failed reconnect, although the only log I'm really able to access post failure is dmesg. Since I'm running a root iscsi, I couldn't get to /var/log/messages which maybe was a little more verbose? What sort of network problems might cause this? The network in this situation is a simple gigE

Remember that syslogd can also write the log to a terminal or serial line. For SUSE Linux it's on tty10 (Ctrl+Alt+F10), but not very verbose. You could try to set it up similarly with more verbosity. [...] Ulrich
Re: Antw: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
On 07/14/2010 03:59 AM, Ulrich Windl wrote:

> Sean S sstra...@gmail.com wrote on 13.07.2010 at 20:41:
>> I'm running an iscsi root partition for a CentOS machine running a 2.6.18-53 kernel. Every couple of days I get the error:
>> connection1:0 detected conn error (1011)
>> session1: session recovery timed out after 400 sec
>
> Hi! I cannot answer your question, but it brings up something I wanted to talk about. In HP-UX 11.31 you can print scan times per device (i.e. per LUN). Here's an example for a true FC SAN:
> [...]
> If Linux/open-iscsi had something similar, one could periodically watch the times to find bottlenecks. AFAIK, the scan time in HP-UX is the round-trip delay for querying a LUN or a controller (a target?).

Did you want to find bottlenecks in the network, between the initiator and the actual device, or between the initiator and the target? Erez was adding some code that exports the iscsi nop/ping times. The nop/ping we send has a header of 48 bytes and no data payload.
It does not have to do any disk/device IO, so it is nice for testing the network.
connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
I'm running an iscsi root partition for a CentOS machine running a 2.6.18-53 kernel. Every couple of days I get the error:

connection1:0 detected conn error (1011)
session1: session recovery timed out after 400 sec

I compiled the open-iscsi 2.0-871 user tools and kernel modules from source obtained from open-iscsi.org. I custom-packaged the initrd to contain the iscsistart binary and the kernel modules from v871. I've zeroed out the noop timeout setting and the noop interval. The disconnect is not reproducible, but does occur at random about every other day. I'm assuming that the target (IET 1.4.19) is not the issue, as a second system that is using the target as an iscsi-root drive continues to work correctly. What things should I be looking at? I'm really struggling to understand why this happens; any suggestions would be greatly appreciated.
Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
On 07/13/2010 01:41 PM, Sean S wrote:

> I'm running an iscsi root partition for a CentOS machine running a 2.6.18-53 kernel. Every couple of days I get the error:
> connection1:0 detected conn error (1011)
> session1: session recovery timed out after 400 sec

Is there anything more to the log? Is there anything from iscsid? Something about not being able to connect/reconnect to the target?

If you just see that, then it means there was some connection problem. We do not know exactly what it was, but we disconnected the connection, then tried to reconnect. We tried to reconnect for 400 seconds but could not, so at that point we mark the session as bad and start to fail IO until we can log back in. It is normally due to a problem in the network if the target is ok.
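The 400-second reconnect window Mike describes corresponds to the replacement timeout — the same `node.session.timeo.replacement_timeout` knob that appears (with its 120-second default) in the node-record dumps later in this thread. As a sketch, the initiator-wide default lives in iscsid.conf:

```
# /etc/iscsi/iscsid.conf (sketch): how long recovery tries to reconnect
# before the session is marked bad and queued IO starts failing upward.
# 400 matches the "session recovery timed out after 400 sec" message above.
node.session.timeo.replacement_timeout = 400
```

A larger value rides out longer outages at the cost of IO hanging longer; a smaller one fails IO sooner, which is usually what dm-multipath setups want.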
Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53
Nothing else in the log from iscsid. No mention of a failed reconnect, although the only log I'm really able to access post-failure is dmesg. Since I'm running a root iscsi, I couldn't get to /var/log/messages, which maybe was a little more verbose? What sort of network problems might cause this? The network in this situation is a simple gigE switch with about 3 or 4 systems on it. The target and initiator are on the same subnet, nothing fancy. Is there some additional debug you'd recommend turning on? Any tips or tricks when running with a root iscsi drive?

Curiously, if I physically disconnect the ethernet from the initiator while running, all I/O access is correctly paused without returning I/O errors. If I then reconnect before the 400 s is up, things go back to normal. I don't, however, see the "detected conn error (1011)" message in this situation. Not sure if that really means anything.

Thanks for the help

On Jul 13, 9:22 pm, Mike Christie micha...@cs.wisc.edu wrote:
> [...]
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Hi Hannes, I am seeing similar problems. Which kernel do you mean that has the fixes?

On Jan 13, 7:24 am, Hannes Reinecke h...@suse.de wrote:
> avora wrote:
>> With SLES10 SP3 x86_64, as soon as I start the second iscsi session, I am very frequently getting connection errors. I do not see this with SLES10 SP2 x86_64 on the same setup.
>>
>> Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
>> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
>> Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>> Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
>>
>> I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0, but it did not help. The iscsiadm settings are the same on both SP2 and SP3. Is there anything else that can be tried?
>>
>> # iscsiadm --mode node --targetname target ...
>> # rpm -qa | grep iscsi
>> iscsitarget-0.4.17-3.4.25
>> open-iscsi-2.0.868-0.6.11
>> yast2-iscsi-client-2.14.47-0.4.9
>> yast2-iscsi-server-2.13.26-0.3
>
> Please try with the latest update kernel. I made quite some fixes which should help here.
>
> cheers, Hannes
> --
> Dr. Hannes Reinecke, zSeries Storage, h...@suse.de, +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
avora wrote:
> With SLES10 SP3 x86_64, as soon as I start the second iscsi session, I am very frequently getting connection errors. I do not see this with SLES10 SP2 x86_64 on the same setup.
>
> Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
> Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
>
> I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0, but it did not help. The iscsiadm settings are the same on both SP2 and SP3. Is there anything else that can be tried?
>
> # iscsiadm --mode node --targetname target ...
> # rpm -qa | grep iscsi
> iscsitarget-0.4.17-3.4.25
> open-iscsi-2.0.868-0.6.11
> yast2-iscsi-client-2.14.47-0.4.9
> yast2-iscsi-server-2.13.26-0.3

Please try with the latest update kernel. I made quite some fixes which should help here.

cheers, Hannes
--
Dr. Hannes Reinecke, zSeries Storage, h...@suse.de, +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Just email the trace to me in private.

Anuarg Vora wrote:
> I have got a reproducible test case for this. It seems that the SCSI layer returns DID_BUS_BUSY many times when 'conn error (1011)' is seen.

DID_BUS_BUSY when getting a 1011 is sort of expected. If you are not using dm-multipath, then the SCSI layer will retry the error value up to 5 times. If you are using dm-multipath, then the SCSI layer will fail the IO to the multipath layer, where it will retry a new path right away.

> for p in `ls /dev/sd*`
> do
> dd if=$p of=/dev/zero count=1
> done
> wait
>
> # ./io-script
> 1+0 records in
> 1+0 records out
> 512 bytes (5.1 MB) copied, 0.177076 seconds, 28.9 MB/s
> dd: reading `/dev/sdaa8': Input/output error
> 2976+0 records in
> 2976+0 records out
>
> Dec 14 11:15:12 cdc-r710s3 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
> Dec 14 11:15:13 cdc-r710s3 kernel: connection2:0: detected conn error (1011)
> Dec 14 11:15:13 cdc-r710s3 iscsid: connection2:0 is operational after recovery (1 attempts)
> Dec 14 11:15:13 cdc-r710s3 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> Dec 14 11:15:14 cdc-r710s3 kernel: connection1:0: detected conn error (1011)
> ...
> Dec 14 11:15:14 cdc-r710s3 kernel: sd 9:0:0:13: SCSI error: return code = 0x0002 == DID_BUS_BUSY
> Dec 14 11:15:14 cdc-r710s3 kernel: end_request: I/O error, dev sdaa, sector 2976
>
> I am unable to upload the ethereal trace on http://groups-beta.google.com/group/open-iscsi/files
>
> Regards, Anurag
>
> --- On Fri, 12/11/09, Anuarg Vora anurag_vo...@yahoo.com wrote:
>> Sorry, I do not see an upload option for me even after signing in. How do I upload?
>> --- On Thu, 12/10/09, Mike Christie micha...@cs.wisc.edu wrote:
>>> Anuarg Vora wrote:
>>>> I did send the ethereal trace yesterday. I am not sure why it didn't reach; is there any place I can upload it?
>>> http://groups-beta.google.com/group/open-iscsi/files
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I have got a reproducible test case for this. It seems that the SCSI layer returns DID_BUS_BUSY many times when 'conn error (1011)' is seen.

for p in `ls /dev/sd*`
do
dd if=$p of=/dev/zero count=1
done
wait

# ./io-script
1+0 records in
1+0 records out
512 bytes (5.1 MB) copied, 0.177076 seconds, 28.9 MB/s
dd: reading `/dev/sdaa8': Input/output error
2976+0 records in
2976+0 records out

Dec 14 11:15:12 cdc-r710s3 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 14 11:15:13 cdc-r710s3 kernel: connection2:0: detected conn error (1011)
Dec 14 11:15:13 cdc-r710s3 iscsid: connection2:0 is operational after recovery (1 attempts)
Dec 14 11:15:13 cdc-r710s3 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Dec 14 11:15:14 cdc-r710s3 kernel: connection1:0: detected conn error (1011)
...
Dec 14 11:15:14 cdc-r710s3 kernel: sd 9:0:0:13: SCSI error: return code = 0x0002 == DID_BUS_BUSY
Dec 14 11:15:14 cdc-r710s3 kernel: end_request: I/O error, dev sdaa, sector 2976

I am unable to upload the ethereal trace on http://groups-beta.google.com/group/open-iscsi/files

Regards, Anurag

--- On Fri, 12/11/09, Anuarg Vora anurag_vo...@yahoo.com wrote:
> Sorry, I do not see an upload option for me even after signing in. How do I upload?
>
> --- On Thu, 12/10/09, Mike Christie micha...@cs.wisc.edu wrote:
>> Anuarg Vora wrote:
>>> I did send the ethereal trace yesterday. I am not sure why it didn't reach; is there any place I can upload it?
>> http://groups-beta.google.com/group/open-iscsi/files
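The io-script above arrives with its line breaks flattened; a runnable sketch of the same idea follows. Two details are assumptions: the `&` (the trailing `wait` only makes sense if the dd jobs run in the background) and `of=/dev/null` as the sink (the posted script wrote to /dev/zero, which works but is unusual):

```shell
# Read one block from each given device in parallel, roughly what the
# posted io-script does. The backgrounding (&) and the /dev/null sink are
# assumptions, not taken verbatim from the original.
probe_paths() {
    for p in "$@"; do
        dd if="$p" of=/dev/null bs=512 count=1 2>/dev/null &
    done
    wait    # returns non-zero if any backgrounded dd failed
}

# e.g. probe_paths /dev/sd*
```

Running it against every sd device while watching the iscsid log reproduces the burst of 1011 errors the report describes.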
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Sorry, I do not see an upload option for me even after signing in. How do I upload?

--- On Thu, 12/10/09, Mike Christie micha...@cs.wisc.edu wrote:
> Anuarg Vora wrote:
>> I did send the ethereal trace yesterday. I am not sure why it didn't reach; is there any place I can upload it?
> http://groups-beta.google.com/group/open-iscsi/files
RE: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Is CHAP configured on the array?

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com] On Behalf Of Mike Christie
Sent: Wednesday, December 09, 2009 9:54 PM
To: open-iscsi@googlegroups.com
Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

avora wrote:
> I do not see a ping/nop timeout message in the logs (probably that's why changing the noop timeouts did not work). Simply starting the session does not cause these errors. On starting the second session, I start a daemon that does SCSI commands like INQUIRY on all the paths. After that I see these messages, and the daemon gets stuck for a very long time waiting for the SCSI commands to finish. At the backend I have an EMC CLARiiON.
>
> # iscsiadm -m node -P 1
> Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
>     Portal: 192.168.10.1:3260,1
>         Iface Name: iface0
> Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
>     Portal: 192.168.12.1:3260,3
>         Iface Name: iface1

Does the same path always fail? If you log into one, can you use it? Then if you log out and log into the other, does that other one work? Is there any info in the CLARiiON logs?
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Yes Mike, the recovery message is seen right away.

Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)

'conn error' and 'recovery' are seen one after the other, continuously.

On Dec 10, 8:04 am, Mike Christie micha...@cs.wisc.edu wrote:
> avora wrote:
>> I got a similar issue while browsing http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37...
>> I wanted to enable logging as mentioned in the above link:
>>
>> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
>> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
>> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
>> echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
>> echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
>>
>> But on my machine I only see:
>>
>> # ls /sys/module/libiscsi/
>> refcnt sections srcversion
>> # ls /sys/module/iscsi_tcp/
>> parameters refcnt sections srcversion
>> # ls /sys/module/iscsi_tcp/parameters/max_lun
>> /sys/module/iscsi_tcp/parameters/max_lun
>
> Your open-iscsi version is older and does not have those settings.
>
>> # iscsiadm -m session -P 1
>> Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
>> iSCSI Connection State: TRANSPORT WAIT
>> iSCSI Session State: FAILED
>> Internal iscsid Session State: REPOEN
>
> You might be seeing something else. I did not get what exactly you meant:
> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
> After the conn error message, do you see one of these right away?
RE: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
There is no CHAP configured on the array.

--- On Thu, 12/10/09, berthiaume_wa...@emc.com berthiaume_wa...@emc.com wrote:
> Is CHAP configured on the array?
>
> [...]
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
avora wrote:
> Yes Mike, the recovery message is seen right away.
> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
> 'conn error' and 'recovery' are seen one after the other, continuously.

Do you have other initiators connected to the target? Can you get me a wireshark trace?
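For the trace Mike asks for, capturing with tcpdump and opening the file in wireshark is the usual route. A minimal sketch — the interface name and output file are placeholders; filtering on TCP port 3260 (the standard iSCSI portal port) keeps the capture to just the iSCSI conversation:

```shell
# Capture the iSCSI conversation on one interface into a pcap that
# wireshark can open; interface and filename are placeholders.
capture_iscsi() {
    # -s 0: capture full packets, not just headers
    tcpdump -i "$1" -s 0 -w "$2" port 3260
}

# e.g. capture_iscsi eth0 /tmp/iscsi-trace.pcap
```

Start the capture, reproduce the conn error, stop tcpdump with Ctrl+C, and send the pcap file.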
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I did send the ethereal trace yesterday. I am not sure why it didn't reach; is there any place I can upload it? There is only 1 initiator.

# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
    Current Portal: 192.168.11.1:3260,2
    Persistent Portal: 192.168.11.1:3260,2
    ** Interface: **
    Iface Name: iface0
    Iface Transport: tcp
    Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
    Iface IPaddress: 192.168.11.11
    Iface HWaddress: 00:15:17:A8:A9:1E
    Iface Netdev: eth0
    SID: 10
    iSCSI Connection State: TRANSPORT WAIT
    iSCSI Session State: FAILED
    Internal iscsid Session State: REPOEN
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b3
    Current Portal: 192.168.13.1:3260,4
    Persistent Portal: 192.168.13.1:3260,4
    ** Interface: **
    Iface Name: iface1
    Iface Transport: tcp
    Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
    Iface IPaddress: 192.168.13.11
    Iface HWaddress: 00:15:17:A8:A9:1F
    Iface Netdev: eth1
    SID: 11
    iSCSI Connection State: TRANSPORT WAIT
    iSCSI Session State: FAILED
    Internal iscsid Session State: REPOEN

--- On Thu, 12/10/09, Mike Christie micha...@cs.wisc.edu wrote:
> avora wrote:
>> Yes Mike, the recovery message is seen right away.
>> [...]
> Do you have other initiators connected to the target? Can you get me a wireshark trace?
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
Anuarg Vora wrote:
> I did send the ethereal trace yesterday. I am not sure why it didn't reach; is there any place I can upload it?

http://groups-beta.google.com/group/open-iscsi/files
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I do not see a ping/nop timeout message in the logs (probably that's why changing the noop timeouts did not work). Simply starting the session does not cause these errors. On starting the second session, I start a daemon that does SCSI commands like INQUIRY on all the paths. After that I see these messages, and the daemon gets stuck for a very long time waiting for the SCSI commands to finish. At the backend I have an EMC CLARiiON.

# iscsiadm -m node -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
    Portal: 192.168.10.1:3260,1
        Iface Name: iface0
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
    Portal: 192.168.12.1:3260,3
        Iface Name: iface1

# iscsiadm --mode node --targetname iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.name = iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.tpgt = 1
node.startup = automatic
iface.hwaddress = 00:15:17:A8:A9:0A
iface.iscsi_ifacename = iface0
iface.net_ifacename = eth4
iface.transport_name = tcp
node.discovery_address = 192.168.10.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = empty
node.session.auth.password = empty
node.session.auth.username_in = empty
node.session.auth.password_in = empty
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.10.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
> avora wrote:
>> With SLES10 SP3 x86_64, as soon as I start the second iscsi session, I am very frequently getting connection errors. I do not see this with SLES10 SP2 x86_64 on the same setup.
>> Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
>> Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
>> Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
>> Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
>> I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0, but it did not help.
>
> Did you see a ping/nop timeout message in the logs, or just what you included above with the conn error 1011? The ping/nop message would be a little before the conn error 1011. What target is this with, and are you doing any IO tests when this happens, or are you just logging into the second session and then you start to get these errors?
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I found a similar issue while browsing http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f and wanted to enable logging as mentioned in that link:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp

But on my machine I only see:

# ls /sys/module/libiscsi/
refcnt  sections  srcversion
# ls /sys/module/iscsi_tcp/
parameters  refcnt  sections  srcversion
# ls /sys/module/iscsi_tcp/parameters/max_lun
/sys/module/iscsi_tcp/parameters/max_lun

# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
    iSCSI Connection State: TRANSPORT WAIT
    iSCSI Session State: FAILED
    Internal iscsid Session State: REOPEN

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote: avora wrote: With SLES10 SP3 x86_64, as soon as I start the second iscsi session2, I am very frequently getting the connection errors. I do not see this with SLES10 SP2 x86_64 on the same setup. Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011) Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts) Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011) I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0 but it did not help. Did you see a ping/nop timeout message in the logs, or just what you included above with the conn error 1011? The ping/nop message would be a little before the conn error 1011. What target is this with, and are you doing any IO tests when this happens, or are you just logging into the second session and then you start to get these errors?
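The debug knobs listed above (with the shell redirection that the archive lost) can be applied defensively. A minimal sketch, assuming a POSIX shell: it skips any parameter file the running kernel does not expose, which is exactly the situation on the older kernel described above.

```shell
# Sketch: enable open-iscsi debug logging where the module parameter
# files exist. Older kernels do not expose most of these files, so we
# record which ones were enabled and which were skipped.
enabled=""
skipped=""
for f in /sys/module/libiscsi/parameters/debug_libiscsi_conn \
         /sys/module/libiscsi/parameters/debug_libiscsi_session \
         /sys/module/libiscsi/parameters/debug_libiscsi_eh \
         /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp \
         /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp; do
    if [ -w "$f" ]; then
        echo 1 > "$f"
        enabled="$enabled $f"
    else
        skipped="$skipped $f"
    fi
done
echo "enabled:$enabled"
echo "skipped:$skipped"
```

Run as root; debug output then appears in the kernel log (dmesg).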
SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
With SLES10 SP3 x86_64, as soon as I start the second iSCSI session (session2), I very frequently get these connection errors; I do not see this with SLES10 SP2 x86_64 on the same setup.

Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)

I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0, but it did not help. The iscsiadm settings are the same on both SP2 and SP3. Is there anything else that can be tried?

# iscsiadm --mode node --targetname target ...
# rpm -qa | grep iscsi
iscsitarget-0.4.17-3.4.25
open-iscsi-2.0.868-0.6.11
yast2-iscsi-client-2.14.47-0.4.9
yast2-iscsi-server-2.13.26-0.3
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
avora wrote: With SLES10 SP3 x86_64, as soon as I start the second iscsi session2, I am very frequently getting the connection errors. I do not see this with SLES10 SP2 x86_64 on the same setup. Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011) Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts) Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011) I have tried changing noop_out_interval and noop_out_timeout to 120/120 and 0/0 but it did not help. Did you see a ping/nop timeout message in the logs, or just what you included above with the conn error 1011? The ping/nop message would be a little before the conn error 1011. What target is this with, and are you doing any IO tests when this happens, or are you just logging into the second session and then you start to get these errors?
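One quick way to answer the question above — whether a nop/ping timeout precedes each conn error (1011) — is to grep both patterns out of the log and read them in order. A sketch against an inline sample log; on a real host, point the same pipeline at /var/log/messages:

```shell
# Sketch: pull nop/ping timeouts and 1011 conn errors out of a log,
# preserving order. The here-doc sample stands in for /var/log/messages.
sample=$(cat <<'EOF'
Dec  7 18:42:00 host kernel: connection2:0: ping timeout of 5 secs expired
Dec  7 18:42:05 host kernel: connection2:0: detected conn error (1011)
Dec  7 18:42:06 host iscsid: connection2:0 is operational after recovery (1 attempts)
EOF
)
hits=$(printf '%s\n' "$sample" | grep -E "ping timeout|conn error \(1011\)")
count=$(printf '%s\n' "$hits" | grep -c .)
printf '%s\n' "$hits"
echo "matching lines: $count"
```

If a "ping timeout" line sits just before each 1011, the errors are NOP-Out timeouts rather than target-side disconnects.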
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
On 11/10/09 11:39 AM, Mike Christie micha...@cs.wisc.edu wrote: What version of open-iscsi were you using and what kernel, and were you using the iscsi kernel modules with open-iscsi.org tarball or from the kernel? iscsi-initiator-utils-6.2.0.871-0.10.el5 kernel-2.6.18-164.2.1.el5 RedHat RPMs It looks like we are sending more IO than the target can handle. In one of those cases it took more than 30 or 60 seconds (depending on your timeout value). What is the value of cat /sys/block/sdXYZ/device/timeout? If it is 30 or 60, could you increase it to 360? After you login to the target do echo 360 > /sys/block/sdXYZ/device/timeout I've tried setting this, but it appears to have no effect - it was 60, and I increased it to 360. And what is the value of: iscsiadm -m node -T your_target | grep node.session.cmds_max If that is 128, then could you decrease that to 32 or 16? Run iscsiadm -m node -T your_target -u iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32 iscsiadm -m node -T your_target -l I've tried setting it to both 16 and 32, but it behaves about the same. And if those prevent the io errors then could you do echo noop > /sys/block/sdXYZ/queue/scheduler to see if performance increases with a different scheduler. I really think I'm back to the duplicate ACK problem - see the attached packet dump - at one point there's 30 duplicate ACKs... Interestingly, the storage has worked for the past week - I'm using it as D2D backup. This morning (about 7 days later), it's giving all these duplicate ACKs.
I'm currently running into messages such as:

Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 19 09:47:00 backup kernel: session2: target reset succeeded
Nov 19 09:47:01 backup iscsid: connection2:0 is operational after recovery (1 attempts)
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code = 0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 8856
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:80.
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code = 0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 74424
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code = 0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector 8845240
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:192.
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code = 0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector 62915456
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: timing out command, waited 300s
Nov 19 09:47:10 backup multipathd: /sbin/mpath_prio_alua exitted with 1
Nov 19 09:47:10 backup multipathd: error calling out /sbin/mpath_prio_alua /dev/sdm
Nov 19 09:47:10 backup multipathd: 3600d0230061d4479bfb83902: switch to path group #2

This is also interesting:

Nov 18 01:48:30 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 8
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 7
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 6
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 5
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 4
Nov 18 20:16:34 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 5
Nov 18 20:32:09 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 6
Nov 18 20:43:05 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 7
Nov 18 20:48:08 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 8
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 7
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 6
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 5
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 4
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 3
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 2
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 1
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 0
Nov 18 20:53:41 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 1
Nov 18 20:59:09 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 2
Nov 18 21:04:37 backup multipathd: 3600d0230061d4479bfb83902: remaining active paths: 3
Nov 18 21:10:05 backup multipathd:
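The /sys/block timeout bump suggested earlier in this thread can be scripted with a guard so it fails loudly instead of silently when the device name is wrong. A sketch; sdXYZ is a placeholder device name, not one of the devices from the logs above:

```shell
# Sketch: raise the SCSI command timeout for one block device.
# DEV is a hypothetical placeholder; substitute the real sdX name.
DEV=sdXYZ
timeout_file="/sys/block/$DEV/device/timeout"
if [ -w "$timeout_file" ]; then
    echo 360 > "$timeout_file"
    msg="timeout for $DEV is now $(cat "$timeout_file")"
else
    msg="cannot write $timeout_file (device missing or not running as root)"
fi
echo "$msg"
```

Note this setting does not persist across reboots or device re-login; a udev rule is the usual way to make it stick.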
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Matthew Dickinson wrote: On 11/10/09 11:39 AM, Mike Christie micha...@cs.wisc.edu wrote: What version of open-iscsi were you using and what kernel, and were you using the iscsi kernel modules with open-iscsi.org tarball or from the kernel? iscsi-initiator-utils-6.2.0.871-0.10.el5 kernel-2.6.18-164.2.1.el5 RedHat RPMs It looks like we are sending more IO than the target can handle. In one of those cases it took more than 30 or 60 seconds (depending on your timeout value). What is the value of cat /sys/block/sdXYZ/device/timeout? If it is 30 or 60, could you increase it to 360? After you login to the target do echo 360 > /sys/block/sdXYZ/device/timeout I've tried setting this, but it appears to have no effect - it was 60, and I increased it to 360. And what is the value of: iscsiadm -m node -T your_target | grep node.session.cmds_max If that is 128, then could you decrease that to 32 or 16? Run iscsiadm -m node -T your_target -u iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32 iscsiadm -m node -T your_target -l I've tried setting it to both 16 and 32, but it behaves about the same. And if those prevent the io errors then could you do echo noop > /sys/block/sdXYZ/queue/scheduler to see if performance increases with a different scheduler. I really think I'm back to the duplicate ACK problem - see the attached packet dump - at one point there's 30 duplicate ACKs... Interestingly, the I did not get the attachment. storage has worked for the past week - I'm using it as D2D backup. This morning (about 7 days later), it's giving all these duplicate ACKs. I'm currently running into messages such as: Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3) Nov 19 09:47:00 backup kernel: session2: target reset succeeded If you are using Red Hat RPMs, make a red hat bugzilla https://bugzilla.redhat.com/. CC mchri...@redhat.com on the bugzilla or email me at that address when you have made the bugzilla.
I will then add some network people to it. Attach your trace to the bugzilla.
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
sorry... wrong information. Here is the correct information. I was doing some testing in VMware Fusion VMs for a presentation that I'm giving. The storage server is CentOS 5.3, which dishes out IETD targets for my OVM servers. The OVM 2.2 environment is as follows: [r...@ovm1 ~]# uname -r 2.6.18-128.2.1.4.9.el5xen [r...@ovm1 ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.871-0.7.el5 [r...@ovm1 ~]# On Nov 10, 2009, at 2:30 PM, Mike Christie wrote: Hoot, Joseph wrote: [r...@storage ~]# uname -r 2.6.18-164.el5 [r...@storage ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1 [r...@storage ~]# Weird. Is 2.6.18-164.el5 the kernel being used in the virtual machine/DomU? Is that where you are using iscsi? It looks like the Oracle enterprise linux kernel is 2.6.18-164.el5, which looks like it is based on RHEL 5.4. The iscsi code in there is the same as RHEL/upstream. No sendwait patch. However, it looks like there is a 2.6.18-128.2.1.4.9 kernel (comes with the Oracle VM rpms). In here we have a different iscsi version. It looks a little older than what is in 2.6.18-164.el5, but it has the sendwait patch I sent to Dell. Do you use this kernel in the Dom0? Are you using this kernel with iscsi? On Nov 10, 2009, at 12:17 PM, Mike Christie wrote: Hoot, Joseph wrote: I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate volumes for over a week and haven't had a single disconnect yet. I am currently using whatever rpm is distributed with Oracle VM v2.2. I know for sure that they have included the 871 base, plus I believe at least a one-off patch. I can get more details if you'd like. But so far so good for now. I think I have the source they are using. Could you do a uname -r, so I can see what kernel they are using. === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 === === Joseph R.
Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hoot, Joseph wrote: I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate volumes for over a week and haven't had a single disconnect yet. I am currently using whatever rpm is distributed with Oracle VM v2.2. I know for sure that they have included the 871 base, plus I believe at least a one-off patch. I can get more details if you'd like. But so far so good for now. I think I have the source they are using. Could you do a uname -r, so I can see what kernel they are using.
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
[r...@storage ~]# uname -r 2.6.18-164.el5 [r...@storage ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1 [r...@storage ~]# On Nov 10, 2009, at 12:17 PM, Mike Christie wrote: Hoot, Joseph wrote: I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate volumes for over a week and haven't had a single disconnect yet. I am currently using whatever rpm is distributed with Oracle VM v2.2. I know for sure that they have included the 871 base, plus I believe at least a one-off patch. I can get more details if you'd like. But so far so good for now. I think I have the source they are using. Could you do a uname -r, so I can see what kernel they are using. === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Matthew Dickinson wrote: On 11/6/09 3:39 PM, Matthew Dickinson matt-openis...@alpha345.com wrote: Try disabling nops by setting node.conn[0].timeo.noop_out_interval = 0 node.conn[0].timeo.noop_out_timeout = 0 I'm still getting errors:

Nov 10 09:08:04 backup kernel: connection12:0: detected conn error (1011)
Nov 10 09:08:05 backup iscsid: Kernel reported iSCSI connection 12:0 error (1011) state (3)
Nov 10 09:08:08 backup iscsid: connection12:0 is operational after recovery (1 attempts)
Nov 10 09:09:43 backup kernel: connection11:0: detected conn error (1011)
Nov 10 09:09:43 backup kernel: connection12:0: detected conn error (1011)
Nov 10 09:09:44 backup kernel: connection11:0: detected conn error (1011)
Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 11:0 error (1011) state (3)
Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 12:0 error (1011) state (3)
Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 11:0 error (1011) state (1)
Nov 10 09:09:46 backup kernel: session11: target reset succeeded
Nov 10 09:09:47 backup iscsid: connection11:0 is operational after recovery (1 attempts)
Nov 10 09:09:47 backup iscsid: connection12:0 is operational after recovery (1 attempts)
Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code = 0x000e
Nov 10 09:09:56 backup kernel: end_request: I/O error, dev sdv, sector 60721248
Nov 10 09:09:56 backup kernel: device-mapper: multipath: Failing path 65:80.
Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code = 0x000e
Nov 10 09:09:56 backup kernel: end_request: I/O error, dev sdv, sector 60727648
Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code = 0x000e
Nov 10 09:10:31 backup kernel: device-mapper: multipath: Failing path 65:112.
Interestingly, I tried a Windows 2008 R2 server talking over a single connection to the storage unit, configured to access it via just one interface, and was able to sustain 20MB/s - so it would "appear" to be a Linux-related issue; I'm only able to get 9MB/s out of Linux even when using 8 interfaces on both controllers. What version of open-iscsi were you using and what kernel, and were you using the iscsi kernel modules with open-iscsi.org tarball or from the kernel? It looks like we are sending more IO than the target can handle. In one of those cases it took more than 30 or 60 seconds (depending on your timeout value). What is the value of cat /sys/block/sdXYZ/device/timeout? If it is 30 or 60, could you increase it to 360? After you login to the target do echo 360 > /sys/block/sdXYZ/device/timeout And what is the value of: iscsiadm -m node -T your_target | grep node.session.cmds_max If that is 128, then could you decrease that to 32 or 16? Run iscsiadm -m node -T your_target -u iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32 iscsiadm -m node -T your_target -l And if those prevent the io errors then could you do echo noop > /sys/block/sdXYZ/queue/scheduler to see if performance increases with a different scheduler.
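The logout/update/login sequence for lowering cmds_max is easy to run against the wrong target, so a dry-run sketch that only prints the commands can help; the IQN below is a made-up placeholder, not one from this thread:

```shell
# Sketch: print, without executing, the sequence for lowering the
# per-session queue depth (cmds_max). TARGET is a hypothetical IQN.
TARGET="iqn.2001-04.com.example:storage.lun1"
cmds="iscsiadm -m node -T $TARGET -u
iscsiadm -m node -T $TARGET -o update -n node.session.cmds_max -v 32
iscsiadm -m node -T $TARGET -l"
printf '%s\n' "$cmds"
```

Once the printed commands look right, paste them in one at a time; the logout (-u) must happen before the update for the new value to take effect at the next login.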
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hoot, Joseph wrote: [r...@storage ~]# uname -r 2.6.18-164.el5 [r...@storage ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1 [r...@storage ~]# Weird. Is 2.6.18-164.el5 the kernel being used in the virtual machine/DomU? Is that where you are using iscsi? It looks like the Oracle enterprise linux kernel is 2.6.18-164.el5, which looks like it is based on RHEL 5.4. The iscsi code in there is the same as RHEL/upstream. No sendwait patch. However, it looks like there is a 2.6.18-128.2.1.4.9 kernel (comes with the Oracle VM rpms). In here we have a different iscsi version. It looks a little older than what is in 2.6.18-164.el5, but it has the sendwait patch I sent to Dell. Do you use this kernel in the Dom0? Are you using this kernel with iscsi? On Nov 10, 2009, at 12:17 PM, Mike Christie wrote: Hoot, Joseph wrote: I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate volumes for over a week and haven't had a single disconnect yet. I am currently using whatever rpm is distributed with Oracle VM v2.2. I know for sure that they have included the 871 base, plus I believe at least a one-off patch. I can get more details if you'd like. But so far so good for now. I think I have the source they are using. Could you do a uname -r, so I can see what kernel they are using. === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
On 06/11/09 14:10, mdaitc wrote: Hi mdaitc, I'm seeing similar TCP "weirdness" as the other posts mention, as well as the errors below. (..) Nov 2 08:15:14 backup kernel: connection33:0: detected conn error The performance isn't what I'd expect: (..) What happens if you disable the TCP window scaling option on the RHEL servers? # echo 0 > /proc/sys/net/ipv4/tcp_window_scaling In our case, the iSCSI conn errors stopped after disabling it, but we still have a lot of TCP "weirdness" in the network, mainly dup ACK packets. Regards, -- Santi Saez http://woop.es
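Before applying the tcp_window_scaling change above it is worth recording the current value; this sketch only reads and prints, leaving the actual `echo 0 > /proc/sys/net/ipv4/tcp_window_scaling` as a deliberate manual step:

```shell
# Sketch: report the current TCP window-scaling setting without
# changing it, so the original value can be restored later.
f=/proc/sys/net/ipv4/tcp_window_scaling
if [ -r "$f" ]; then
    cur=$(cat "$f")
else
    cur=unknown
fi
msg="tcp_window_scaling=$cur (disable with: echo 0 > $f)"
echo "$msg"
```

Note that disabling window scaling caps the TCP receive window at 64KB, which can itself reduce throughput on fast links; it is a diagnostic workaround, not a fix.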
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
What version of OiS are you using? I had lots of weirdness and the same types of disconnects to our Dell EqualLogic when we were (actually still are in production) using the 868 code. I'm now using the open-iscsi-871 code plus a sendwait patch and haven't had the issue. I've now been slamming my storage for a week and a half with multiple threads of dt. On Nov 9, 2009, at 4:33 AM, Santi Saez wrote: On 06/11/09 14:10, mdaitc wrote: Hi mdaitc, I'm seeing similar TCP "weirdness" as the other posts mention, as well as the errors below. (..) Nov 2 08:15:14 backup kernel: connection33:0: detected conn error The performance isn't what I'd expect: (..) What happens if you disable the TCP window scaling option on the RHEL servers? # echo 0 > /proc/sys/net/ipv4/tcp_window_scaling In our case, the iSCSI conn errors stopped after disabling it, but we still have a lot of TCP "weirdness" in the network, mainly dup ACK packets. Regards, -- Santi Saez http://woop.es === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hi all, I am working on the iSCSI Enterprise Target. Could someone please explain about the performance of IET? If so, how was the performance calculated, and what was the throughput for the same? Thanks, Gopala krishnan Varatharajan On Sat, Nov 7, 2009 at 3:09 AM, Matthew Dickinson matt-openis...@alpha345.com wrote: On 11/6/09 3:08 PM, Mike Christie micha...@cs.wisc.edu wrote: Could you send more of the log? Do you see a message like connection1:0 is operational after recovery (1 attempts) after you see the conn errors (how many attempts)? Here's one particular connection:

Nov 4 05:12:14 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321648393, last ping 4321653393, now 4321658393
Nov 4 05:12:14 backup kernel: connection22:0: detected conn error (1011)
Nov 4 05:12:21 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov 4 05:12:46 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321680691, last ping 4321685691, now 4321690691
Nov 4 05:12:46 backup kernel: connection22:0: detected conn error (1011)
Nov 4 05:12:58 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov 4 07:46:03 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4330877890, last ping 4330882890, now 4330887890
Nov 4 07:46:03 backup kernel: connection22:0: detected conn error (1011)
Nov 4 07:46:10 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov 4 07:46:27 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4330901733, last ping 4330906733, now 4330911733
Nov 4 07:46:27 backup kernel: connection22:0: detected conn error (1011)
Nov 4 07:46:32 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov 4 07:47:21 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4330955414, last ping 4330960414, now 4330965414
Nov 4 07:47:21 backup kernel: connection22:0: detected conn error (1011)
Nov 4 07:47:28 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov 4 07:48:28 backup kernel: connection22:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4331023213, last ping 4331028213, now 4331033213
Nov 4 07:48:28 backup kernel: connection22:0: detected conn error (1011)
Nov 4 07:48:35 backup iscsid: connection22:0 is operational after recovery (1 attempts)

FWIW:

[r...@backup ~]# cat /var/log/messages | grep "after recovery" | awk '{print $11 " " $12}' | sort | uniq
(113 attempts)
(1 attempts)
(24 attempts)
(2 attempts)
(3 attempts)
(4 attempts)
(5 attempts)
(66 attempts)
(68 attempts)
(6 attempts)
(7 attempts)
(8 attempts)
(9 attempts)

Try disabling nops by setting node.conn[0].timeo.noop_out_interval = 0 node.conn[0].timeo.noop_out_timeout = 0 Ok, I'll let you know how it pans out. Thanks, Matthew --
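The sort | uniq pipeline above lists which attempt counts occurred but not how often each one did; adding `-c` and extracting just the number gives a frequency table. A sketch run against an inline sample; swap the here-doc for /var/log/messages in practice:

```shell
# Sketch: tally how many recoveries took each number of attempts.
# The here-doc sample stands in for /var/log/messages.
sample=$(cat <<'EOF'
Nov  4 05:12:21 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov  4 07:46:10 backup iscsid: connection22:0 is operational after recovery (1 attempts)
Nov  4 07:48:35 backup iscsid: connection22:0 is operational after recovery (3 attempts)
EOF
)
counts=$(printf '%s\n' "$sample" | grep "after recovery" \
    | sed 's/.*(\([0-9]*\) attempts).*/\1/' | sort -n | uniq -c)
printf '%s\n' "$counts"
```

Each output row is "<occurrences> <attempt count>", so a long tail of high attempt counts jumps out immediately.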
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hoot, Joseph wrote: What version of OiS are you using? I had lots of weirdness and the same types of disconnects to our Dell EqualLogic when we were (actually still are in production) using the 868 code. I'm now using the open-iscsi-871 code plus a sendwait patch and haven't had the issue. I've What is the sendwait patch? Is it a patch for open-iscsi or to the kernel network code?
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
it was for the OiS 871 code prior to the RHEL 5.4 release (not sure if the release includes it or not). I'm not sure who came up with it. I was working with Don Williams from Dell EqualLogic. He got ahold of it somehow. I applied it and it seemed to improve things. On Nov 9, 2009, at 2:31 PM, Mike Christie wrote: Hoot, Joseph wrote: What version of OiS are you using? I had lots of weirdness and the same types of disconnects to our Dell EqualLogic when we were (actually still are in production) using the 868 code. I'm now using the open-iscsi-871 code plus a sendwait patch and haven't had the issue. I've What is the sendwait patch? Is it a patch for open-iscsi or to the kernel network code? === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hoot, Joseph wrote: it was for the OiS 871 code prior to the RHEL 5.4 release (not sure if the release includes it or not). I'm not sure who came up with it. I was working with Don Williams from Dell EqualLogic. He got ahold of it somehow. I applied it and it seemed to improve things. Ah ok. I think it was the patch I sent to Don. If you just used 871 without the patch (or what is in the stock RHEL 5.4 kernel) does it work ok? There were a couple changes from 868 to 871 that I thought would also fix the problem, so I was waiting for Don and them to retest just 871 and get back to me. On Nov 9, 2009, at 2:31 PM, Mike Christie wrote: Hoot, Joseph wrote: What version of OiS are you using? I had lots of weirdness and the same types of disconnects to our Dell EqualLogic when we were (actually still are in production) using the 868 code. I'm now using the open-iscsi-871 code plus a sendwait patch and haven't had the issue. I've What is the sendwait patch? Is it a patch for open-iscsi or to the kernel network code? === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 ===
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate volumes for over a week and haven't had a single disconnect yet. I am currently using whatever rpm is distributed with Oracle VM v2.2. I know for sure that they have included the 871 base, plus I believe at least a one off patch. I can get more details if you'd like. But so far so good for now On Nov 9, 2009, at 6:18 PM, Mike Christie wrote: Hoot, Joseph wrote: it was for OiS 871 code prior to RHEL 5.4 release (not sure if the release include it or not). I'm not sure who came up with it. I was working with Don Williams from Dell EqualLogic. He got ahold of it somehow. I applied it and it seemed to improve things. Ah ok. I think it was the patch I sent to Don. If you just used 871 without the patch (or what is in the stock RHEL 5.4 kernel) does it work ok? There were a couple changes from 868 to 871 that I thought would also fix the problem, so I was waiting for Don and them to retest just 871 and get back to me. On Nov 9, 2009, at 2:31 PM, Mike Christie wrote: Hoot, Joseph wrote: What version of OiS are you using? I had lots of weirdness and the same types of disconnects to our Dell EqualLogic when we were (actually still are in production) using 868 code. I'm now using open- iscsi-871 code plus a sendwait patch and haven' had the issue. I've What is the sendwait patch? Is it a patch for open-iscsi or to the kernel network code? === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 === === Joseph R. Hoot Lead System Programmer/Analyst (w) 716-878-4832 (c) 716-759-HOOT joe.h...@itec.suny.edu GPG KEY: 7145F633 === --~--~-~--~~~---~--~~ You received this message because you are subscribed to the Google Groups open-iscsi group. 
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
On 03/11/09 at 0:52, Mike Christie wrote: You can turn off ping/nops by setting node.conn[0].timeo.noop_out_interval = 0 node.conn[0].timeo.noop_out_timeout = 0 (set that in iscsid.conf and then rediscover the target, or run iscsiadm -m node -T your_target -o update -n name_of_param_above -v 0) Dear Mike, Thanks!! As I said to James in the previous email, disabling TCP window scaling *partially solves* this problem; we still keep nop pings in the configuration. But we still have too many TCP Dup ACKs in the network :-S This might just be a workaround. What might happen is that you will not see the nop/ping and conn errors and would instead just see a slowdown in the workloads being run. I have sent your contact to the Infortrend developers; an engineer will contact you, thanks! Regards, -- Santi Saez http://woop.es
Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Hi, Randomly we get Open-iSCSI conn errors when connecting to an Infortrend A16E-G2130-4 storage array. We discussed this earlier on the list, see: http://tr.im/DVQm http://tr.im/DVQp Open-iSCSI logs this: === Nov 2 18:34:02 vz-17 kernel: ping timeout of 5 secs expired, last rx 408250499, last ping 408249467, now 408254467 Nov 2 18:34:02 vz-17 kernel: connection1:0: iscsi: detected conn error (1011) Nov 2 18:34:03 vz-17 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3) Nov 2 18:34:07 vz-17 iscsid: connection1:0 is operational after recovery (1 attempts) Nov 2 18:34:52 vz-17 kernel: ping timeout of 5 secs expired, last rx 408294833, last ping 408299833, now 408304833 Nov 2 18:34:52 vz-17 kernel: connection1:0: iscsi: detected conn error (1011) Nov 2 18:34:53 vz-17 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3) Nov 2 18:34:57 vz-17 iscsid: connection1:0 is operational after recovery (1 attempts) === Running on CentOS 5.4 with iscsi-initiator-utils-6.2.0.871-0.10.el5; I think it's not an Open-iSCSI bug, as Mike suggested at: http://groups.google.com/group/open-iscsi/msg/fe37156096b2955f I only get this error when connecting to the Infortrend storage, and not with NetApp, Nexsan, etc. *connected to the same SAN*. Using Wireshark I see a lot of TCP Dup ACK, TCP ACKed lost segment, etc., and the iSCSI session finally ends in a timeout; see a screenshot here: http://tinyurl.com/ykpvckn Using Wireshark IO graphs I get this strange report about TCP/IP errors: http://tinyurl.com/ybm4m8x And this is another report on the same SAN connecting to a NetApp: http://tinyurl.com/ycgc8ul Those TCP/IP errors only occur when connecting to the Infortrend storage, and not with other targets in the same SAN (using the same switch infrastructure); is there any way to deal with this using Open-iSCSI? From what I see on the Internet, a lot of Infortrend users are suffering this behavior. Thanks!
P.S.: speed and duplex configuration is correct at every point, and there are no CRC errors on the switch. -- Santi Saez http://woop.es
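For anyone reproducing this, a narrow packet capture keeps the trace manageable. A sketch along these lines (the interface name and portal IP below are placeholders, not taken from this setup):

```shell
# Capture only iSCSI traffic to/from the Infortrend portal; headers alone
# (-s 96) are enough for Wireshark's TCP analysis (Dup ACK, ACKed lost
# segment, retransmissions). eth0 and 192.168.0.10 are placeholders.
tcpdump -i eth0 -s 96 -w infortrend.pcap 'host 192.168.0.10 and tcp port 3260'
```

Afterwards, open infortrend.pcap in Wireshark and apply the display filter tcp.analysis.flags to jump straight to the segments it flags as suspicious.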
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
Santi Saez wrote: Randomly we get Open-iSCSI conn errors when connecting to an Infortrend A16E-G2130-4 storage array. [...] Those TCP/IP errors only occurs when connecting to Infortrend storage.. and no with other targets in the same SAN (using same switch infrastructure); is there anyway to deal with this using Open-iSCSI?
You can turn off ping/nops by setting node.conn[0].timeo.noop_out_interval = 0 node.conn[0].timeo.noop_out_timeout = 0 (set that in iscsid.conf and then rediscover the target, or run iscsiadm -m node -T your_target -o update -n name_of_param_above -v 0). Or you might want to set them higher. This might just be a workaround, though. What might happen is that you will not see the nop/ping and conn errors and would instead just see a slowdown in the workloads being run. If you guys can get hold of any Infortrend people, let me know, because I would be happy to work with them on this.
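Spelled out as a rough sketch (the target IQN below is a placeholder; the iscsid.conf path varies between the open-iscsi.org build and distro packages):

```shell
# In iscsid.conf, before discovery; 0 disables NOP-Out pings entirely,
# while larger non-zero values just make the ping timeout less sensitive:
#   node.conn[0].timeo.noop_out_interval = 0
#   node.conn[0].timeo.noop_out_timeout = 0

# Or update an already-discovered node record without re-running discovery
# (iqn.2009-11.example:disk1 is a placeholder target name):
iscsiadm -m node -T iqn.2009-11.example:disk1 -o update \
    -n node.conn[0].timeo.noop_out_interval -v 0
iscsiadm -m node -T iqn.2009-11.example:disk1 -o update \
    -n node.conn[0].timeo.noop_out_timeout -v 0
# Log the session out and back in so the new values take effect.
```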
Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK
MAC Rcvd Multicast Frames: 178236
MAC Rcvd Broadcast Frames: 34
iSCSI Shared Statistics
---
PDUs Xmited: 324033312
Data Bytes Xmited: 29836097572
PDUs Rcvd: 198783508
Data Bytes Rcvd: 35739418624
I/O Completed: 165710975
Unexpected I/O Rcvd: 0
iSCSI Format Errors: 0
Header Digest Errors: 0
Data Digest Errors: 0
Sequence Errors: 0
IP Xmit Packets: 242949995
IP Xmit Byte Count: 47161789220
IP Xmit Fragments: 0
IP Rcvd Packets: 312354406
IP Rcvd Byte Count: 371426357904
IP Rcvd Fragments: 0
IP Datagram Reassembly Count: 0
IP Error Packets: 0
IP Fragment Rcvd Overlap: 0
IP Fragment Rcvd Out of Order: 0
IP Datagram Reassembly Timeouts: 0
TCP Xmit Segment Count: 242949995
TCP Xmit Byte Count: 38654705673
TCP Rcvd Segment Count: 312354406
TCP Rcvd Byte Count: 361430272728
TCP Persist Timer Expirations: 0
TCP Rxmit Timer Expired: 0
TCP Rcvd Duplicate Acks: 644
TCP Rcvd Pure Acks: 4091830
TCP Xmit Delayed Acks: 13648891
TCP Xmit Pure Acks: 31445514
TCP Rcvd Segment Errors: 101
TCP Rcvd Segment Out of Order: 306
TCP Rcvd Window Probes: 0
TCP Rcvd Window Updates: 0
TCP ECC Error Corrections: 0
Regards, Ulrich On 2 Nov 2009 at 19:16, Santi Saez wrote: Hi, Randomly we get Open-iSCSI conn errors when connecting to an Infortrend A16E-G2130-4 storage array.
Help debugging connection1:0: detected conn error (1011)
Hi, I am running an initiator on FC9 64-bit (Linux 2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a Nexsan SATABoy over 1 Gb Ethernet with iSCSI. I am running open-iscsi 2.0-870. When I try to connect I get a conn error (1011), and I am struggling to know where to go next. The Nexsan shows no errors. Can someone give me a pointer on where to go next to try and get this working? I have another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and that works just fine. Any help please? #iscsiadm -m session tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy: 028a2347 Loading iSCSI transport class v2.0-870. iscsi: registered transport (tcp) iscsi: registered transport (iser) scsi3 : iSCSI Initiator over TCP/IP connection1:0: detected conn error (1011) session1: host reset succeeded connection1:0: detected conn error (1011) session1: host reset succeeded scsi 3:0:0:0: Device offlined - not ready after error recovery #iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 -p 10.52.145.121 node.name = iqn.1999-02.com.nexsan:p1:sataboy:028a2347 node.tpgt = 2 node.startup = automatic iface.hwaddress = default iface.iscsi_ifacename = default iface.net_ifacename = default iface.transport_name = tcp iface.initiatorname = empty node.discovery_address = 10.52.145.121 node.discovery_port = 3260 node.discovery_type = send_targets node.session.initial_cmdsn = 0 node.session.initial_login_retry_max = 4 node.session.cmds_max = 128 node.session.queue_depth = 32 node.session.auth.authmethod = None node.session.auth.username = empty node.session.auth.password = empty node.session.auth.username_in = empty node.session.auth.password_in = empty node.session.timeo.replacement_timeout = 120 node.session.err_timeo.abort_timeout = 15 node.session.err_timeo.lu_reset_timeout = 30 node.session.err_timeo.host_reset_timeout = 60 node.session.iscsi.FastAbort = Yes node.session.iscsi.InitialR2T = No node.session.iscsi.ImmediateData = Yes node.session.iscsi.FirstBurstLength
= 262144 node.session.iscsi.MaxBurstLength = 16776192 node.session.iscsi.DefaultTime2Retain = 0 node.session.iscsi.DefaultTime2Wait = 2 node.session.iscsi.MaxConnections = 1 node.session.iscsi.MaxOutstandingR2T = 1 node.session.iscsi.ERL = 0 node.conn[0].address = 10.52.145.121 node.conn[0].port = 3260 node.conn[0].startup = manual node.conn[0].tcp.window_size = 524288 node.conn[0].tcp.type_of_service = 0 node.conn[0].timeo.logout_timeout = 15 node.conn[0].timeo.login_timeout = 15 node.conn[0].timeo.auth_timeout = 45 node.conn[0].timeo.noop_out_interval = 10 node.conn[0].timeo.noop_out_timeout = 15 node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072 node.conn[0].iscsi.HeaderDigest = None node.conn[0].iscsi.DataDigest = None node.conn[0].iscsi.IFMarker = No node.conn[0].iscsi.OFMarker = No
Re: Help debugging connection1:0: detected conn error (1011)
hissing_sid wrote: Hi, I am running an initiator on FC9 64-bit (Linux 2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a NexSan SATA boy over 1GB ethernet with iSCSI. I am running open-iscsi 2.0-870. Are you using an open-iscsi.org release or a fedora iscsi-initiator-utils one? When I try to connect I get a conn error (1011) and I am struggling to know where to go next. The NexSan has no errors. Can someone give me a pointer on where to go next to try and get this working? I have another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and that works just fine. Any help please? #iscsiadm -m session tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy: 028a2347 Loading iSCSI transport class v2.0-870. iscsi: registered transport (tcp) iscsi: registered transport (iser) scsi3 : iSCSI Initiator over TCP/IP connection1:0: detected conn error (1011) session1: host reset succeeded Looks like maybe the initial inquiry or report luns command that the scsi layer sends is timing out. The iscsi layer probably tries to abort the command, and that fails, so we try to drop the session (conn error 1011) and then re-login. It looks like we log in at the iscsi level ok. I am not sure why this would happen. Let me do some digging. I think this might have come up before, but I did not see it.
Re: Help debugging connection1:0: detected conn error (1011)
I downloaded and compiled the open-iscsi release, so I think I am using that. However, just to be sure, I uninstalled iscsi-initiator-utils (yum remove) so I only have open-iscsi. Now, when I try to start iscsid ( ./iscsid -f ) I get the following: iscsid: Missing or Invalid version from /sys/module/scsi_transport_iscsi/version. Make sure a up to date scsi_transport_iscsi module is loaded and a up todate version of iscsid is running. Exiting... I will investigate that. Any hints? On May 8, 5:08 pm, Mike Christie micha...@cs.wisc.edu wrote: Are you using an open-iscsi.org release or a fedora iscsi-initiator-utils one?
Re: Help debugging connection1:0: detected conn error (1011)
All sorted. Definitely running only open-iscsi now. Still broken, though: Loading iSCSI transport class v2.0-870. iscsi: registered transport (tcp) iscsi: registered transport (iser) scsi6 : iSCSI Initiator over TCP/IP connection1:0: detected conn error (1011) session1: host reset succeeded connection1:0: detected conn error (1011) session1: host reset succeeded scsi 6:0:0:0: Device offlined - not ready after error recovery
Re: Help debugging connection1:0: detected conn error (1011)
hissing_sid wrote: I downloaded and compiled the open-iscsi release, so I think I am using that. However, just to be sure I uninstalled iscsi-initiator-utils (yum remove) so I only have open-iscsi. Now, when I try to start iscsid ( ./iscsid -f ) I get the following: iscsid: Missing or Invalid version from /sys/module/scsi_transport_iscsi/version. Make sure a up to date scsi_transport_iscsi module is loaded and a up todate version of iscsid is running. Exiting... You need to load the iscsi modules: modprobe iscsi_tcp You were probably using the iscsi-initiator-utils tools before. With them you would have run: service iscsi start With the open-iscsi.org tools you run: service open-iscsi start, if you are using the init scripts. The init scripts do the modprobe and start iscsid, so if you start iscsid by hand you have to do the modprobe too. You might have a weird mix. Do a whereis iscsid, whereis iscsiadm and whereis iscsistart, and remove them. Then just do yum install iscsi-initiator-utils. That should give you the current F9 tools, which should work.
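The manual startup order Mike describes, collected into one rough sequence (the portal and IQN come from the earlier session output; an init script normally does the module loading and daemon start for you):

```shell
# Load the kernel side first, then start the daemon, then log in.
modprobe iscsi_tcp        # pulls in scsi_transport_iscsi as a dependency
iscsid                    # or ./iscsid -f to stay in the foreground for debugging
iscsiadm -m discovery -t sendtargets -p 10.52.145.121:3260
iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 \
    -p 10.52.145.121 --login
```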
Re: Help debugging connection1:0: detected conn error (1011)
hissing_sid wrote:
> All sorted. Definitely running only open-iscsi now.

Ok, ignore the request to use iscsi-initiator-utils.

> Still broken though:
>
> Loading iSCSI transport class v2.0-870.
> iscsi: registered transport (tcp)
> iscsi: registered transport (iser)
> scsi6 : iSCSI Initiator over TCP/IP
> connection1:0: detected conn error (1011)
> session1: host reset succeeded
> connection1:0: detected conn error (1011)
> session1: host reset succeeded
> scsi 6:0:0:0: Device offlined - not ready after error recovery

Can you get an ethereal/wireshark trace?

On May 8, 5:08 pm, Mike Christie micha...@cs.wisc.edu wrote:
> hissing_sid wrote:
>> Hi, I am running an initiator on FC9 64-bit (Linux 2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a NexSan SATA boy over 1GB ethernet with iSCSI. I am running open-iscsi 2.0-870.
>
> Are you using an open-iscsi.org release or the Fedora iscsi-initiator-utils one?
>
>> When I try to connect I get a conn error (1011) and I am struggling to know where to go next. The NexSan has no errors. Can someone give me a pointer on where to go next to try and get this working? I have another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and that works just fine. Any help please?
>>
>> # iscsiadm -m session
>> tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy:028a2347
>>
>> Loading iSCSI transport class v2.0-870.
>> iscsi: registered transport (tcp)
>> iscsi: registered transport (iser)
>> scsi3 : iSCSI Initiator over TCP/IP
>> connection1:0: detected conn error (1011)
>> session1: host reset succeeded
>
> Looks like maybe the initial inquiry or report luns command that the scsi layer sends is timing out. The iscsi layer probably tries to abort the command and that fails, so we try to drop the session (conn error 1011) and then re-login. It looks like we log in at the iscsi level ok. I am not sure why this would happen. Let me do some digging. I think this might have come up before, but I did not see it.
>
>> connection1:0: detected conn error (1011)
>> session1: host reset succeeded
>> scsi 3:0:0:0: Device offlined - not ready after error recovery
>>
>> # iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 -p 10.52.145.121
>> node.name = iqn.1999-02.com.nexsan:p1:sataboy:028a2347
>> node.tpgt = 2
>> node.startup = automatic
>> iface.hwaddress = default
>> iface.iscsi_ifacename = default
>> iface.net_ifacename = default
>> iface.transport_name = tcp
>> iface.initiatorname = empty
>> node.discovery_address = 10.52.145.121
>> node.discovery_port = 3260
>> node.discovery_type = send_targets
>> node.session.initial_cmdsn = 0
>> node.session.initial_login_retry_max = 4
>> node.session.cmds_max = 128
>> node.session.queue_depth = 32
>> node.session.auth.authmethod = None
>> node.session.auth.username = empty
>> node.session.auth.password = empty
>> node.session.auth.username_in = empty
>> node.session.auth.password_in = empty
>> node.session.timeo.replacement_timeout = 120
>> node.session.err_timeo.abort_timeout = 15
>> node.session.err_timeo.lu_reset_timeout = 30
>> node.session.err_timeo.host_reset_timeout = 60
>> node.session.iscsi.FastAbort = Yes
>> node.session.iscsi.InitialR2T = No
>> node.session.iscsi.ImmediateData = Yes
>> node.session.iscsi.FirstBurstLength = 262144
>> node.session.iscsi.MaxBurstLength = 16776192
>> node.session.iscsi.DefaultTime2Retain = 0
>> node.session.iscsi.DefaultTime2Wait = 2
>> node.session.iscsi.MaxConnections = 1
>> node.session.iscsi.MaxOutstandingR2T = 1
>> node.session.iscsi.ERL = 0
>> node.conn[0].address = 10.52.145.121
>> node.conn[0].port = 3260
>> node.conn[0].startup = manual
>> node.conn[0].tcp.window_size = 524288
>> node.conn[0].tcp.type_of_service = 0
>> node.conn[0].timeo.logout_timeout = 15
>> node.conn[0].timeo.login_timeout = 15
>> node.conn[0].timeo.auth_timeout = 45
>> node.conn[0].timeo.noop_out_interval = 10
>> node.conn[0].timeo.noop_out_timeout = 15
>> node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
>> node.conn[0].iscsi.HeaderDigest = None
>> node.conn[0].iscsi.DataDigest = None
>> node.conn[0].iscsi.IFMarker = No
>> node.conn[0].iscsi.OFMarker = No
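Capturing the trace Mike asks for usually means recording the iSCSI traffic (TCP port 3260) to a file that ethereal/wireshark can open. A minimal sketch follows; the interface name `eth0` and the output path are assumptions for illustration, and the command is printed rather than executed here because a live capture needs root privileges and runs until interrupted:

```shell
#!/bin/sh
# Sketch of a capture command for the read test. -s 0 grabs full
# packets, -w writes a pcap file wireshark can load, and "port 3260"
# limits the capture to iSCSI traffic. eth0 and the output path are
# placeholders; adjust them for the port facing the target.
capture="tcpdump -i eth0 -s 0 -w /tmp/iscsi-trace.pcap port 3260"
echo "$capture"
```

Run the printed command on the initiator while reproducing the error, then open the resulting file in wireshark and look at what happens around the connection drop.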
Re: Help debugging connection1:0: detected conn error (1011)
Just looked at the trace myself. I can see it trying to contact LUN 00, but I do not have that configured on my target; I am using LUN 01. That is why it is failing. When I change my target LUN to 00, it works.

So, how or where is the initiator deciding on the LUN? Is it configured, or should it find it from the target?

Thanks for your help so far, I feel I am getting somewhere now!

On May 8, 6:27 pm, hissing_sid dopey...@gmail.com wrote:
> I have a dump. How should I get it to you?
>
> On May 8, 5:52 pm, Mike Christie micha...@cs.wisc.edu wrote:
>> Can you get an ethereal/wireshark trace?

[rest of quoted thread trimmed]
Re: Help debugging connection1:0: detected conn error (1011)
hissing_sid wrote:
> Just looked at the trace myself. I can see it trying to contact LUN 00, but I do not have that configured on my target; I am using LUN 01. That is why it is failing. When I change my target LUN to 00 it works.
>
> So, how or where is the initiator deciding on the LUN? Is it configured, or should it find it from the target?

The scsi layer will send an inquiry to lun0 to get some info about the target and start off the device discovery process. It would then normally send a report luns command to discover all the devices. Your target might be operating as expected and just need some special flags in the scsi layer. Attach your trace here http://groups-beta.google.com/group/open-iscsi/files so we can see what is going on.

> Thanks for your help so far, I feel I am getting somewhere now!

[rest of quoted thread trimmed]
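The `scsi 3:0:0:0` prefix in the kernel messages quoted above encodes exactly where this discovery is probing: the tuple is host:channel:target-id:LUN, so `3:0:0:0` is LUN 0 on host 3 — the LUN the scsi layer contacts first, as described above. A minimal sketch of reading the tuple (the address string is copied from the log in this thread):

```shell
#!/bin/sh
# Decode a SCSI device address of the form host:channel:target:lun,
# the same naming the kernel uses in its log messages and under
# /sys/class/scsi_device. "3:0:0:0" is taken from the quoted log.
addr="3:0:0:0"
IFS=: read -r host channel target lun <<EOF
$addr
EOF
echo "host=$host channel=$channel target=$target lun=$lun"
```

With a LUN configured only at 01 and nothing at 00, the initial inquiry to `...:0` has nothing to talk to, which matches the trace hissing_sid describes.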
Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.
Konrad Rzeszutek schrieb:
> On Thu, Apr 17, 2008 at 06:07:12AM -0400, Konrad Rzeszutek wrote:
>> It looks like the network is off but the session is still running. We eventually get to the kernel shutoff here. Is your init script getting run? If not, then run it. If you left the session on on purpose then you cannot turn the network off, because the scsi layer will want to do its shutdown when the kernel is stopped.
>
> Ah. Thanks for the explanation. The init script was run, but it didn't log off all of the sessions (it would selectively log off instead of doing all of them). After I made sure that 'iscsiadm -m session -U all' was called during shutdown, a QA engineer here was able to make 'iscsiadm' hang during this sequence. The result was that some of the iSCSI sessions did log out while some others did not, and the machine hung during the "Synchronizing SCSI cache for disk .." step.

I didn't follow the thread very closely, but a hang during "Synchronizing SCSI cache for disk" happens because:

- iSCSI sessions were not properly disconnected, and
- they can't be properly disconnected any more, because the network is already disabled.

Most distributions shut down all network interfaces when a halt command is started (i.e., they add the -i option to the halt command):

    -i: shut down all network interfaces.

Without this flag, everything should shut down properly, even when it's not possible to log out of all sessions earlier (i.e., a diskless machine started off iSCSI).

--
Tomasz Chmielewski
http://wpkg.org
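The fix Konrad describes — logging out of every session, not a selective subset, before the network goes down — can be sketched as a shutdown hook like the following. This is a hedged sketch, not a distribution init script: it assumes iscsiadm is installed, and the guard keeps it harmless on machines without it.

```shell
#!/bin/sh
# Sketch of a pre-network-shutdown step: log out of *all* iSCSI
# sessions in one shot, so the SCSI cache sync at reboot has nothing
# left to wait on. Must run before interfaces are brought down.
if command -v iscsiadm >/dev/null 2>&1; then
    # "-m session -U all" logs out of every session at once, avoiding
    # the selective logout that left some sessions behind above.
    iscsiadm -m session -U all && status=ok || status=failed
else
    status=skipped   # no iscsiadm on this machine
fi
echo "iscsi logout step: $status"
```

Ordering is the whole point: this has to run before the `-i` network teardown that halt performs, otherwise the logout itself can no longer reach the target.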
Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.
> "Synchronizing SCSI cache for disk" happens because:
> - iSCSI sessions were not properly disconnected, and

Correct.

> - they can't be properly disconnected any more, because the network is already disabled.

Kind of. There is a kernel timer that gets activated during the logout sequence that waits for up to 120 seconds (or whatever you have set in node.session.timeo.replacement_timeout) and, if the logout sequence hasn't completed, releases the kernel resources.

> Most distributions shut down all network interfaces when a halt command is started (i.e., they add the -i option to the halt command):
>
>     -i: shut down all network interfaces.

Right. And this situation will hang the kernel during reboot because the SCSI error handlers wait for a logout state condition that never happens.

> Without this flag, everything should shut down properly, even when it's not possible to log out of all sessions earlier (i.e., a diskless machine started off iSCSI).

And the patch I attached in the previous e-mail describes a solution to this.
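The 120-second wait mentioned above corresponds to this node record setting, shown at its default as in the dumps earlier in the thread:

```
node.session.timeo.replacement_timeout = 120
```

It can be changed per node with `iscsiadm -m node -T <target-iqn> -p <portal> -o update -n node.session.timeo.replacement_timeout -v <seconds>`; lowering it shortens how long the kernel waits for a logout that will never complete, at the cost of failing I/O sooner during transient outages.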
Re: detected conn error (1011) on login with open-iscsi-2.0-869
On 23 Apr, 20:29, Mike Christie [EMAIL PROTECTED] wrote:
> DaMn wrote:
>> Hi, despite the subject, I've not found a solution to my problem in other threads. I'm having trouble connecting to a target with two LUNs:
>>
>> # iscsiadm -m discovery -t st -p 192.168.29.13
>>
>> puts out:
>>
>> 192.168.29.13:3260,1 iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm2-target
>> 192.168.29.13:3260,1 iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target
>>
>> # iscsiadm -m node -T iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target
>>
>> prints out:
>>
>> node.name = iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target
>> node.tpgt = 1
>> node.startup = manual
>> iface.hwaddress = default
>> iface.iscsi_ifacename = default
>> iface.net_ifacename = default
>> iface.transport_name = tcp
>> node.discovery_address = 192.168.29.13
>> node.discovery_port = 3260
>> node.discovery_type = send_targets
>> node.session.initial_cmdsn = 0
>> node.session.initial_login_retry_max = 4
>> node.session.cmds_max = 128
>> node.session.queue_depth = 32
>> node.session.auth.authmethod = None
>> node.session.auth.username = empty
>> node.session.auth.password = empty
>> node.session.auth.username_in = empty
>> node.session.auth.password_in = empty
>> node.session.timeo.replacement_timeout = 120
>> node.session.err_timeo.abort_timeout = 15
>> node.session.err_timeo.lu_reset_timeout = 20
>> node.session.err_timeo.host_reset_timeout = 60
>> node.session.iscsi.FastAbort = Yes
>> node.session.iscsi.InitialR2T = No
>> node.session.iscsi.ImmediateData = Yes
>> node.session.iscsi.FirstBurstLength = 262144
>> node.session.iscsi.MaxBurstLength = 16776192
>> node.session.iscsi.DefaultTime2Retain = 0
>> node.session.iscsi.DefaultTime2Wait = 2
>> node.session.iscsi.MaxConnections = 1
>> node.session.iscsi.MaxOutstandingR2T = 1
>> node.session.iscsi.ERL = 0
>> node.conn[0].address = 192.168.29.13
>> node.conn[0].port = 3260
>> node.conn[0].startup = manual
>> node.conn[0].tcp.window_size = 524288
>> node.conn[0].tcp.type_of_service = 0
>> node.conn[0].timeo.logout_timeout = 15
>> node.conn[0].timeo.login_timeout = 15
>> node.conn[0].timeo.auth_timeout = 45
>> node.conn[0].timeo.noop_out_interval = 5
>> node.conn[0].timeo.noop_out_timeout = 5
>> node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
>> node.conn[0].iscsi.HeaderDigest = None,CRC32C
>> node.conn[0].iscsi.DataDigest = None
>> node.conn[0].iscsi.IFMarker = No
>> node.conn[0].iscsi.OFMarker = No
>>
>> Then, when I try to connect:
>>
>> # iscsiadm -m node -T iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target -l
>>
>> prints out:
>>
>> Logging in to [iface: default, target: iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target, portal: 192.168.29.13,3260]
>> Login to [iface: default, target: iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target, portal: 192.168.29.13,3260]: successful
>>
>> but /var/log/messages prints out:
>>
>> Apr 23 12:32:42 virtualserv1 kernel: scsi69 : iSCSI Initiator over TCP/IP
>> Apr 23 12:32:42 virtualserv1 kernel: connection4:0: detected conn error (1011)
>> Apr 23 12:32:42 virtualserv1 iscsid: connection4:0 is operational now
>> Apr 23 12:32:43 virtualserv1 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
>> Apr 23 12:32:46 virtualserv1 kernel: session4: host reset succeeded
>> Apr 23 12:32:46 virtualserv1 iscsid: connection4:0 is operational after recovery (1 attempts)
>> Apr 23 12:32:56 virtualserv1 kernel: 69:0:0:0: scsi: Device offlined - not ready after error recovery
>
> It looks like the target is not liking something. Are there any target logs?

You are damn right, Mike, and the matter is quite embarrassing... Actually, whoever set up the target forgot to add an available virtual disk, so open-iscsi logged in to the target but no device was detected. I've requested the necessary changes in the SAN/NAS configuration, and now everything works fine.

Best regards,
DaMn.
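When chasing a diagnosis like the one above, it helps to filter just the connection-error lines out of the system log. A minimal sketch, using sample lines copied from the /var/log/messages excerpt quoted above (on a live system you would point grep at the real log file):

```shell
#!/bin/sh
# Build a small sample log from the lines quoted in this thread, then
# filter out only the "detected conn error (1011)" events.
cat > /tmp/messages.sample <<'EOF'
Apr 23 12:32:42 virtualserv1 kernel: scsi69 : iSCSI Initiator over TCP/IP
Apr 23 12:32:42 virtualserv1 kernel: connection4:0: detected conn error (1011)
Apr 23 12:32:42 virtualserv1 iscsid: connection4:0 is operational now
Apr 23 12:32:43 virtualserv1 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
EOF
# Prints only the kernel line reporting the dropped connection.
grep -E 'detected conn error \(1011\)' /tmp/messages.sample
```

Correlating the timestamps of these lines with the target's own logs (as Mike asks for above) is usually what pins down which side dropped the connection.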
Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.
Konrad Rzeszutek wrote:
> Firstly, I haven't dug into this yet, but this is more of a call: have-you-seen-this-too?

This is probably on the list 20 times :)

> When I reboot the machine without logging off from the iSCSI targets, I can hang the reboot sequence. This is with 869-rc4 userspace, a SLES 10 SP2 Beta kernel, with the 869-rc4 kernel code compiled out of tree. (With the SLES 10 SP2 Beta kernel, which has a back-port of 868-rc1, I get the same bug.) I enabled the debugging in the kernel (DEBUG_SCSI) and added a dump_stack() in iscsi_check_transport_timeouts, and this is what I get:

The timer is still running because the session is.

>     iscsi: Sending nopout as ping on conn 88007a0b8a50
>     iscsi: Setting next tmo 4294974247
>     iscsi: mtask deq [cid 0 itt 0xa06]
>     iscsi: mgmtpdu [op 0x0 hdr-itt 0xa06 datalen 0]
>     Sending SIGKILL to all processes.
>     Please stand by while rebooting the system.
>     md: stopping all md devices.
>     Synchronizing SCSI cache for disk sdl:

It looks like the network is off but the session is still running. We eventually get to the kernel shutoff here. Is your init script getting run? If not, then run it. If you left the session on on purpose then you cannot turn the network off, because the scsi layer will want to do its shutdown when the kernel is stopped.
Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.
> It looks like the network is off but the session is still running. We eventually get to the kernel shutoff here. Is your init script getting run? If not, then run it. If you left the session on on purpose then you cannot turn the network off, because the scsi layer will want to do its shutdown when the kernel is stopped.

Ah. Thanks for the explanation. The init script was run, but it didn't log off all of the sessions (it would selectively log off instead of doing all of them).