Re: connection1:0: detected conn error (1011)

2014-07-24 Thread 木木夕
Thank you.
I have 6 Ethernet ports on the initiator and 6 on the target.
The IP addresses of the ports are:
192.168.1.x, 192.168.2.x, 192.168.3.x ... 192.168.6.x
Each port on the target is linked to one corresponding port on the initiator.
Does this matter?

On Thursday, July 24, 2014 at 4:02:11 AM UTC+8, Mike Christie wrote:

 Is this the same setup where you had multiple initiator nic ports and 
 iscsi target portals on the same subnet? If so, then check the 
 networking. Can you ping -I ethX to the iscsi target portal? If you run 
 tcpdump/wireshark while doing the read test, do you see IO going through 
 the correct ports? 

 What target is this with? 

 On 07/22/2014 09:42 PM, 木木夕 wrote: 
  Hello everyone,
  the iSCSI initiator can log in to the iSCSI target successfully and
  everything looks fine,
  but when I start a read test (dd if=/dev/sdb of=/dev/null bs=512k), it
  prints
  connection1:0: detected conn error (1011)
  This has happened many times.
  Any reply would be welcome.
  



-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: connection1:0: detected conn error (1011)

2014-07-24 Thread Mike Christie
If they are on different subnets, then it should be OK. It is common
to hit network/iSCSI setup issues when all ports are on the same subnet.

Please send the /var/log/messages from the initiator system. Again, what
target are you using? A tcpdump/wireshark trace would probably be helpful
too.
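
The checks suggested in this thread (ping through a specific NIC, then a packet capture during the read test) could be scripted roughly as below. This is only a sketch: the interface names, portal addresses, capture file path, and the helper names run, check_paths and capture_path are made up for illustration, and 3260 is assumed to be the iSCSI port in use. Commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical interface=portal pairs; replace with your own NICs and portal IPs.
PAIRS="${PAIRS:-eth1=192.168.1.10 eth2=192.168.2.10}"

# Print each command; execute it only when RUN=1 (tcpdump usually needs root).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Ping every portal through the NIC that is supposed to reach it.
check_paths() {
    for pair in $PAIRS; do
        run ping -c 3 -I "${pair%%=*}" "${pair#*=}"
    done
}

# Capture iSCSI traffic on one NIC while the dd read test is running.
capture_path() {   # usage: capture_path IFACE PORTAL_IP
    run tcpdump -i "$1" -w "/tmp/iscsi-$1.pcap" host "$2" and port 3260
}
```

Running check_paths first and then capture_path eth1 192.168.1.10 in one terminal while dd runs in another should show whether the traffic really leaves through the expected port.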

On 07/24/2014 03:46 AM, 木木夕 wrote:
 Thank you.
 I have 6 Ethernet ports on the initiator and 6 on the target.
 The IP addresses of the ports are:
 192.168.1.x, 192.168.2.x, 192.168.3.x ... 192.168.6.x
 Each port on the target is linked to one corresponding port on the initiator.
 Does this matter?
 
 On Thursday, July 24, 2014 at 4:02:11 AM UTC+8, Mike Christie wrote:
 
 Is this the same setup where you had multiple initiator nic ports and
 iscsi target portals on the same subnet? If so, then check the
 networking. Can you ping -I ethX to the iscsi target portal? If you run
 tcpdump/wireshark while doing the read test, do you see IO going
 through the correct ports?
 
 What target is this with?
 
 On 07/22/2014 09:42 PM, 木木夕 wrote:
  Hello everyone,
  the iSCSI initiator can log in to the iSCSI target successfully and
  everything looks fine,
  but when I start a read test (dd if=/dev/sdb of=/dev/null bs=512k), it
  prints
  connection1:0: detected conn error (1011)
  This has happened many times.
  Any reply would be welcome.
 


connection1:0: detected conn error (1011)

2014-07-22 Thread 木木夕
Hello everyone,
the iSCSI initiator can log in to the iSCSI target successfully and
everything looks fine,
but when I start a read test (dd if=/dev/sdb of=/dev/null bs=512k), it
prints
connection1:0: detected conn error (1011)
This has happened many times.
Any reply would be welcome.



Re: connection1:0: ping timeout of 5 secs expired, recv timeout 5 / connection1:0: detected conn error (1011)

2010-12-14 Thread Mike Christie

On 12/14/2010 03:12 PM, p...@fhri.org wrote:

Hi all...

I have four CentOS 5.4 (2.6.18-164.11.1.el5) servers with iscsid version
2.0-871.  Two are misbehaving despite identical configuration.  They all
connect to Enhance Tech RS8-IP4 array the same way, directly NIC-to-NIC
without a switch, physically separate from LAN.  I created four targets,
one per port, and four separate volumes/LUNs.

Pasted below is the config and error log.  About a minute after a
successful login, the timeouts/errors begin and keep coming constantly
pretty much every minute whenever the session is logged in, regardless
of mount state.  The problematic units are also often very slow logging
in, mounting, even directory listing at times.  Also, they sometimes
time out and remount the fs read-only in the middle of a large backup
run.



There were some fixes to that code in rhel/centos 5.5 kernel, but I do 
not think that is what you are hitting.


Do you see those ping/nop timeout messages even when you are not doing 
any IO intensive workload?


Did you set up your initiator names (/etc/iscsi/initiatorname.iscsi) or 
did you let the tools do this? Does each server have a unique initiator 
name or do some servers have the same value in that file?


On the target are there any log messages?


If you set

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

(either set that in iscsid.conf, then rerun the discovery command and 
relogin, or run

iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_interval -v 0

iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_timeout -v 0

then relogin)

this will turn off the iscsi nops/pings. Then if you run mkfs and do 
backups, you should not see the ping timeout messages, but do you still 
see low throughput? Do you still see conn error 1011 messages, just 
without the ping timeout messages?
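
For a single node record, the update-and-relogin sequence described above might look like the sketch below. The target IQN, portal address, and the helper names run and disable_nops are placeholders invented for this example; only the iscsiadm options themselves (-m node, -T, -p, -o update, -n, -v, --logout, --login) are standard. Commands are only printed unless RUN=1 is set, and logging out should only be done while the devices are not in use.

```shell
#!/bin/sh
# Hypothetical target IQN and portal; take the real values from "iscsiadm -m node".
TARGET="${TARGET:-iqn.2010-12.example:array0}"
PORTAL="${PORTAL:-192.168.10.1:3260}"

# Print each command; execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

disable_nops() {
    # Set both nop-out timers to 0 in the stored node record.
    for name in noop_out_interval noop_out_timeout; do
        run iscsiadm -m node -T "$TARGET" -p "$PORTAL" -o update \
            -n "node.conn[0].timeo.$name" -v 0
    done
    # The record is applied at login, so log out and back in.
    run iscsiadm -m node -T "$TARGET" -p "$PORTAL" --logout
    run iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login
}
```

Scoping the update with -T and -p limits the change to one record, which makes it easier to compare a modified server against an untouched one.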





Antw: Re: detected conn error (1011)

2010-09-08 Thread Ulrich Windl
Hello,

yesterday a controller on our SAN storage failed, and open-iscsi on SLES10
SP3 (open-iscsi-2.0.868-0.6.11) generated a _very_ large number of error
messages in syslog (419869 entries within a few hours). Does Novell have any
plans to throttle these? Example:
Sep  7 16:08:23 hostname kernel:  connection19:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection31:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection20:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection32:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection23:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection7:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection24:0: iscsi: detected conn error (1011)
Sep  7 16:08:23 hostname kernel:  connection8:0: iscsi: detected conn error (1011)
[...]
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:24 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:25 hostname iscsid: conn 0 login rejected: target error (03/01)
[...]
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname iscsid: conn 0 login rejected: target error (03/01)
Sep  7 16:08:53 hostname kernel:  session19: iscsi: session recovery timed out after 30 secs
Sep  7 16:08:53 hostname kernel:  session31: iscsi: session recovery timed out after 30 secs
Sep  7 16:08:53 hostname kernel:  session20: iscsi: session recovery timed out after 30 secs
[...]
Sep  7 16:08:54 hostname kernel: device-mapper: multipath: Failing path 8:192.
Sep  7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 14
Sep  7 16:08:54 hostname multipathd: sdac: tur checker reports path is down
Sep  7 16:08:54 hostname multipathd: checker failed path 65:192 in map L116_hostas03
Sep  7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 13
Sep  7 16:08:54 hostname multipathd: sdv: tur checker reports path is down
Sep  7 16:08:54 hostname multipathd: checker failed path 65:80 in map L116_hostas03
Sep  7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 12
Sep  7 16:08:54 hostname multipathd: sdx: tur checker reports path is down
Sep  7 16:08:54 hostname multipathd: checker failed path 65:112 in map L116_hostas03
Sep  7 16:08:54 hostname multipathd: L116_hostas03: remaining active paths: 11
Sep  7 16:08:54 hostname multipathd: sdab: tur checker reports path is down
Sep  7 16:08:54 hostname multipathd: checker failed path 65:176 in map L116_hostas03
[...]
Sep  7 16:08:54 hostname multipathd
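
To get a feel for how a flood like this is distributed, the conn error lines can be tallied per connection. A minimal sketch, assuming unwrapped syslog text on stdin in the format shown in this message; the function name summarize_conn_errors is made up for illustration.

```shell
#!/bin/sh
# Reads syslog text on stdin and prints "<count> <connection>" per connection,
# most frequent first.
summarize_conn_errors() {
    awk '/detected conn error \(1011\)/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^connection[0-9]+:[0-9]+:$/) {
                     c = $i
                     sub(/:$/, "", c)      # drop the trailing colon
                     count[c]++
                 }
         }
         END { for (c in count) print count[c], c }' | sort -k1,1nr -k2,2
}
```

Typical use would be something like summarize_conn_errors < /var/log/messages, with the log path depending on the distribution.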

Antw: Re: detected conn error (1011)

2010-09-08 Thread Ulrich Windl
Sorry,

this message was intended for someone else. My e-mail program tricked me. -- Ulrich

 Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote on 08.09.2010 at
13:51 in message 4c8794da02a1f...@gwsmtp1.uni-regensburg.de:





Re: detected conn error (1011)

2010-09-02 Thread Hannes Reinecke
On Thu, Sep 02, 2010 at 03:15:31PM -0700, Shantanu Mehendale wrote:
 Hi Hannes/Mike,
 
 I am also dealing with another issue on the iSCSI transport where I am
 seeing DID_TRANSPORT_FAILFAST hostbyte errors reaching the application
 which is sending I/O on a device-mapper node. Reading the code a little,
 I thought that after the iscsi replacement_timeout timer fires, the IO
 stuck in the IO queues will be sent up to the device-mapper, which
 would send the IO to the new path. Is there a possibility that
 dm-multipath is not able to handle all the errors, so some of them end
 up going to the application? Basically this is a cable pull kind of
 experiment where we would expect the path failover to work and IO to
 continue properly.
Yes, in general it should. And yes, multipath should handle these cases.
But I have made quite a few patches to iSCSI in SLES11, so you should make
sure you're using the latest maintenance release.

 Since we already saw one problem with DID_TRANSPORT_DISRUPTED, I was
 wondering if DID_TRANSPORT_FAILFAST also has some similar issues with
 limited retries and such.
 
No, that's actually okay. The I/O error will be reported in either case,
it's just that it never reaches the upper layers.

In your case it looks as if the 'tapdisk' thing runs on the raw disks,
not the multipathed device. So of course it'll register the error.
Maybe it's an idea to have the 'tapdisk' run on the multipath device-mapper
device ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




RE: detected conn error (1011)

2010-08-31 Thread Goncalo Gomes
Hi Hannes,

Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 
tree of SLES 11. We add a few extra patches specific to Xen, dom0 integration, 
and some backports from upstream. To the best of my knowledge these additions 
don't touch the iSCSI layer, so from the iSCSI drivers' point of view, I believe 
they are as pristine as the ones in the SuSE kernel. That is why we need the 
patch rather than the test kernels: the binaries will probably mismatch the gcc 
version and/or the versioning that we use, e.g. 2.6.27.42-0.1.1.xs5.6.0.44.58xen. 
I do definitely appreciate your 'forward thinking' with regards to the issue, though!

Thanks,
 -Goncalo.






Re: detected conn error (1011)

2010-08-31 Thread Hannes Reinecke
Goncalo Gomes wrote:
 Hi Hannes,
 
 Thanks. The Citrix XenServer 5.6 distribution kernel is based on the 2.6.27 
 tree of SLES 11.
 We add a few extra patches specific to Xen,  dom0 integration and some 
 backports from upstream.
 To the best of my knowledge these additions don't touch the iscsi layer, so 
 from the iscsi
 drivers point of view, I believe they are as pristine as the ones in the SuSE 
 kernel and that's
 why we need the patch as the binaries probably will mismatch gcc version 
 and/or the versioning
 that we use e.g 2.6.27.42-0.1.1.xs5.6.0.44.58xen. I do definitely 
 appreciate your
 'forward thinking' with regards to the issue, though!
 
I just checked, and the resulting patch is indeed like you proposed:

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 32b30f1..441ca8b 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1336,9 +1336,6 @@ int iscsi_queuecommand(struct scsi_cmnd *sc, void (*done)(struct scsi_cmnd *))
          */
         switch (session->state) {
         case ISCSI_STATE_FAILED:
-                reason = FAILURE_SESSION_FAILED;
-                sc->result = DID_TRANSPORT_DISRUPTED << 16;
-                break;
         case ISCSI_STATE_IN_RECOVERY:
                 reason = FAILURE_SESSION_IN_RECOVERY;
                 sc->result = DID_IMM_RETRY << 16;
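
As background for the patch: the host code is stored shifted left by 16 bits in sc->result, so it occupies bits 16..23 of the result word. The sketch below packs and decodes that byte in shell arithmetic. The numeric values are the ones recalled from include/scsi/scsi.h in kernels of this era and should be treated as an assumption to verify against your own tree; the helper names are invented for this example.

```shell
#!/bin/sh
# Host byte values as recalled from include/scsi/scsi.h (assumption: verify
# against your kernel source).
DID_IMM_RETRY=$((0x0c))
DID_TRANSPORT_DISRUPTED=$((0x0e))
DID_TRANSPORT_FAILFAST=$((0x0f))

# pack_host_byte CODE -> result word with CODE in bits 16..23
pack_host_byte() {
    echo $(( $1 << 16 ))
}

# host_byte RESULT -> host code extracted from a result word
host_byte() {
    echo $(( ($1 >> 16) & 0xff ))
}

# host_byte_name RESULT -> symbolic name for the three codes discussed here
host_byte_name() {
    case $(host_byte "$1") in
        12) echo DID_IMM_RETRY ;;
        14) echo DID_TRANSPORT_DISRUPTED ;;
        15) echo DID_TRANSPORT_FAILFAST ;;
        *)  echo other ;;
    esac
}
```

With the patch applied, a command queued while the session is in ISCSI_STATE_FAILED falls through to the ISCSI_STATE_IN_RECOVERY case and gets the DID_IMM_RETRY host byte instead of DID_TRANSPORT_DISRUPTED.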

HTH,

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




RE: detected conn error (1011)

2010-08-31 Thread Goncalo Gomes
Thanks Hannes and Mike,

Your help has been highly appreciated!

Cheers,
 -Goncalo.




Re: detected conn error (1011)

2010-08-30 Thread Hannes Reinecke
Goncalo Gomes wrote:
 Hi,
 
 On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: 
 Mike Christie wrote:
 ccing Hannes from suse, because this looks like a SLES only bug.

 Hey Hannes,

 The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
 running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
 is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.


 On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
 I've copied both the messages file from the host goncalog140 and the
 patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
 files in the link below:

 http://promisc.org/iscsi/

 It looks like this chunk from libiscsi.c:iscsi_queuecommand:

 case ISCSI_STATE_FAILED:
     reason = FAILURE_SESSION_FAILED;
     sc->result = DID_TRANSPORT_DISRUPTED << 16;
     break;

 is causing IO errors.

 You want to use something like DID_IMM_RETRY because it can be a long
 time between the time the kernel marks the state as ISCSI_STATE_FAILED
 until we start recovery and properly get all the device queues blocked,
 so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
 Yeah, I noticed.
 But the problem is that multipathing will stall during this time,
 ie no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
 will circumvent this and we can failover immediately.

 Sadly I got additional bugreports about this so I think I'll have
 to revert it.
 
 I applied and tested the changes Mike Christie suggests. After the LUN
 is rebalanced within the array I no longer see the IO errors and it
 appears the setup is now resilient to the equallogic LUN failover
 process.
 
 I'm attaching the log from the dmesg merely for sanity check purposes,
 if anyone cares to take a look?
 
 I have put some test kernels at

 http://beta.suse.com/private/hare/sles11/iscsi
 
 Do the test kernels in the url above contain the change of
 DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY or is there more to it than
 simply changing the result code? If the latter, would you be able to
 upload the source rpms or a unified patch containing the changes you
 are staging? I'm looking for a more palatable way to test them, given I
 have no SLES box lying around, but will install one if need be.
 
You've got me confused. How would you test the patch if not on a SLES box?
Presumably you would have to install the new kernel on the instance
you are planning to run the test on, which for any sane setup would
have to be a SLES box. In that case you can just use the provided
kernel directly and save yourself the compilation step.

Am I missing something?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: detected conn error (1011)

2010-08-24 Thread Goncalo Gomes
Hi,

On Fri, 2010-08-06 at 15:57 +0100, Hannes Reinecke wrote: 
 Mike Christie wrote:
  ccing Hannes from suse, because this looks like a SLES only bug.
  
  Hey Hannes,
  
  The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
  running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
  is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.
  
  
  On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
  I've copied both the messages file from the host goncalog140 and the
  patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
  files in the link below:
 
  http://promisc.org/iscsi/
 
  
  It looks like this chunk from libiscsi.c:iscsi_queuecommand:
  
  case ISCSI_STATE_FAILED:
      reason = FAILURE_SESSION_FAILED;
      sc->result = DID_TRANSPORT_DISRUPTED << 16;
      break;
  
  is causing IO errors.
  
  You want to use something like DID_IMM_RETRY because it can be a long
  time between the time the kernel marks the state as ISCSI_STATE_FAILED
  until we start recovery and properly get all the device queues blocked,
  so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
 Yeah, I noticed.
 But the problem is that multipathing will stall during this time,
 ie no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
 will circumvent this and we can failover immediately.
 
 Sadly I got additional bugreports about this so I think I'll have
 to revert it.

I applied and tested the changes Mike Christie suggests. After the LUN
is rebalanced within the array I no longer see the IO errors and it
appears the setup is now resilient to the equallogic LUN failover
process.

I'm attaching the log from the dmesg merely for sanity check purposes,
if anyone cares to take a look?

 I have put some test kernels at
 
 http://beta.suse.com/private/hare/sles11/iscsi

Do the test kernels in the url above contain the change of
DID_TRANSPORT_DISRUPTED to DID_IMM_RETRY or is there more to it than
simply changing the result code? If the latter, would you be able to
upload the source rpms or a unified patch containing the changes you
are staging? I'm looking for a more palatable way to test them, given I
have no SLES box lying around, but will install one if need be.

Thanks,
-Goncalo.


device-mapper: multipath: version 1.0.5 loaded
device-mapper: multipath round-robin: version 1.0.0 loaded
device-mapper: table: 251:1: multipath: error getting device
device-mapper: ioctl: error adding target to table
device-mapper: table: 251:1: multipath: error getting device
device-mapper: ioctl: error adding target to table
Citrix Systems, Inc. -- Private Release Kernel
Private File Disclaimer The private files provided to you contain a preliminary 
code fix. These private files have been created and distributed to you to 
address your specific issue and provide Citrix with the feedback that your 
issue has been resolved or to provide further debugging information. These 
private files have had minimal in-house testing with no regression testing and 
may contain defects.  These private file(s) will only be supported until an 
official Hotfix has been provided or one is publicly available from the Citrix 
web site. Any private files that are provided to you are intended only for the 
use of the individual or entity to which this is addressed and distribution of 
these files or utilities is prohibited. CITRIX MAKES NO REPRESENTATIONS OR 
WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR 
PURPOSE WITH RESPECT TO THE PRIVATE FILES.  THE PRIVATE FILES ARE DELIVERED ON 
AN AS IS BASIS. YOU SHALL HAVE THE SOLE RESPONSIBILITY FOR ADEQUATE 
PROTECTION AND BACK-UP OF AN
Loading iSCSI transport class v2.0-870.
iscsi: registered transport (tcp)
scsi6 : iSCSI Initiator over TCP/IP
 connection1:0: detected conn error (1011)
scsi 6:0:0:0: Direct-Access EQLOGIC  100E-00  4.3  PQ: 0 ANSI: 5
sd 6:0:0:0: [sdb] 209725440 512-byte hardware sectors: (107 GB/100 GiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: ad 00 00 00
sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
sd 6:0:0:0: [sdb] 209725440 512-byte hardware sectors: (107 GB/100 GiB)
sd 6:0:0:0: [sdb] Write Protect is off
sd 6:0:0:0: [sdb] Mode Sense: ad 00 00 00
sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
 sdb: sdb1
sd 6:0:0:0: [sdb] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg1 type 0
tap_backend_changed: backend/tap/1/51712: created thread 9531
tap_blkif_schedule[9531]: starting
device

RE: detected conn error (1011)

2010-08-06 Thread Goncalo Gomes
Hi Hannes,

Would you be able to send me a unified patch containing the changes included in 
the test kernels so I can rebuild the drivers with them and update you today?

For completeness, we are not running SLES, but rather the Citrix XenServer 5.6 
release which is based off of the Linux 2.6.27 tree of SLES. Also, for this 
specific controller we don't enable MPIO, but in most other arrays we do.

Thanks,
 -Goncalo.

-Original Message-
From: Hannes Reinecke [mailto:h...@suse.de] 
Sent: 06 August 2010 15:58
To: Mike Christie
Cc: open-iscsi@googlegroups.com; Goncalo Gomes
Subject: Re: detected conn error (1011)

Mike Christie wrote:
 ccing Hannes from suse, because this looks like a SLES only bug.
 
 Hey Hannes,
 
 The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
 running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
 is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.
 
 
 On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
 I've copied both the messages file from the host goncalog140 and the
 patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
 files in the link below:

 http://promisc.org/iscsi/

 
 It looks like this chunk from libiscsi.c:iscsi_queuecommand:
 
 case ISCSI_STATE_FAILED:
         reason = FAILURE_SESSION_FAILED;
         sc->result = DID_TRANSPORT_DISRUPTED << 16;
         break;
 
 is causing IO errors.
 
 You want to use something like DID_IMM_RETRY because it can be a long
 time between the time the kernel marks the state as ISCSI_STATE_FAILED
 until we start recovery and properly get all the device queues blocked,
 so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
Yeah, I noticed.
But the problem is that multipathing will stall during this time,
ie no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
will circumvent this and we can failover immediately.

Sadly I got additional bugreports about this so I think I'll have
to revert it.

I have put some test kernels at

http://beta.suse.com/private/hare/sles11/iscsi

Can you test with them and check if this issue is solved?

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: detected conn error (1011)

2010-08-06 Thread Mike Christie

On 08/06/2010 09:57 AM, Hannes Reinecke wrote:

Mike Christie wrote:

ccing Hannes from suse, because this looks like a SLES only bug.

Hey Hannes,

The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.


On 08/05/2010 02:21 PM, Goncalo Gomes wrote:

I've copied both the messages file from the host goncalog140 and the
patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
files in the link below:

http://promisc.org/iscsi/



It looks like this chunk from libiscsi.c:iscsi_queuecommand:

 case ISCSI_STATE_FAILED:
         reason = FAILURE_SESSION_FAILED;
         sc->result = DID_TRANSPORT_DISRUPTED << 16;
         break;

is causing IO errors.

You want to use something like DID_IMM_RETRY because it can be a long
time between the time the kernel marks the state as ISCSI_STATE_FAILED
until we start recovery and properly get all the device queues blocked,
so we can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
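
For readers following the result code in the chunk above: the SCSI layer keeps the host byte in bits 16-23 of the command result, which is why the driver shifts the DID_* value left by 16. Here is a minimal Python sketch of that packing. The numeric values are the ones from mainline include/scsi/scsi.h as far as I recall them, and are an assumption for the SLES 2.6.27 tree; the helper names are hypothetical and only for illustration.

```python
# Subset of SCSI host-byte codes (assumed to match mainline include/scsi/scsi.h).
HOST_BYTES = {
    0x00: "DID_OK",
    0x03: "DID_TIME_OUT",
    0x0C: "DID_IMM_RETRY",
    0x0D: "DID_REQUEUE",
    0x0E: "DID_TRANSPORT_DISRUPTED",
    0x0F: "DID_TRANSPORT_FAILFAST",
}


def make_result(host_code: int) -> int:
    """Pack a host byte the way 'sc->result = DID_xxx << 16' does."""
    return host_code << 16


def host_byte(result: int) -> str:
    """Extract bits 16-23 of a SCSI result and name the host byte if known."""
    code = (result >> 16) & 0xFF
    return HOST_BYTES.get(code, f"unknown (0x{code:02x})")


if __name__ == "__main__":
    for code in (0x0C, 0x0E):
        result = make_result(code)
        print(f"result=0x{result:08x} hostbyte={host_byte(result)}")
```

The "hostbyte=DID_TRANSPORT_DISRUPTED" text in the kernel log is the same field, printed by name.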

Yeah, I noticed.
But the problem is that multipathing will stall during this time,
ie no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
will circumvent this and we can failover immediately.



It should stall. It works like FC and the fast io fail tmo. Users need 
to set the iscsi replacement/recovery timeout like they would FC's fast 
io fail tmo. They should set it to 3 or 5 secs, or lower, if they want 
really fast failovers.
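
To check what recovery timeout a running session actually uses, one option is to read it back from sysfs. The sketch below assumes the kernel exposes it as /sys/class/iscsi_session/session<N>/recovery_tmo and that it corresponds to node.session.timeo.replacement_timeout in iscsid.conf; the function name is hypothetical, and the sketch only reads the value, it does not change it.

```python
from pathlib import Path


def read_recovery_tmo(sysfs_root="/sys/class/iscsi_session"):
    """Return {session name: recovery_tmo in seconds} for every session found.

    The sysfs location is an assumption about the running kernel; sessions
    whose attribute is missing or unreadable are skipped.
    """
    timeouts = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return timeouts
    for session in sorted(root.glob("session*")):
        try:
            timeouts[session.name] = int((session / "recovery_tmo").read_text().strip())
        except (OSError, ValueError):
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in read_recovery_tmo().items():
        print(f"{name}: recovery_tmo = {seconds}s")
```

A value of a few seconds matches the fast-failover advice above; a large value means I/O stays queued that long before being failed.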





Re: detected conn error (1011)

2010-08-06 Thread Mike Christie

On 08/06/2010 11:38 AM, Mike Christie wrote:


It should stall. It works like FC and the fast io fail tmo. Users need
to set the iscsi replacement/recovery timeout like they would FC's fast
io fail tmo. They should set it to 3 or 5 secs, or lower, if they want
really fast failovers.



Oh yeah, Qlogic recently did this patch:

http://git.kernel.org/?p=linux/kernel/git/jejb/scsi-misc-2.6.git;a=commitdiff;h=fe4f0bdeea788a8ac049c097895cb2e4044f18b1;hp=caf19d38607108304cd8cc67ed21378017f69e8a

so we can have multipath tools set the recovery_tmo like it does for 
fast io fail tmo.





Antw: detected conn error (1011)

2010-08-05 Thread Ulrich Windl
 Goncalo Gomes goncalo.go...@eu.citrix.com wrote on 04.08.2010 at 23:12 in message
ffdb98dc9661d3418b9eb0ff5202e46b7a82e8b...@lonpmailbox01.citrite.net:
 I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as 
 dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs 
 is iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.

I guess that's SLES11 already. I just read an announcement that there is an 
open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not give 
any details in the announcements:

4. Recommended update for open-iscsi

   SUSE Linux Enterprise Desktop 11 SP1 for x86-64
   http://download.novell.com/Download?buildid=MAugs_l2FJY~ 

   SUSE Linux Enterprise Desktop 11 SP1 for x86
   http://download.novell.com/Download?buildid=U2OyI_9oJ5g~ 

   SUSE Linux Enterprise Server 11 SP1 for x86-64
   http://download.novell.com/Download?buildid=1Z1WASv0lfE~ 

   SUSE Linux Enterprise Server 11 SP1 for x86
   http://download.novell.com/Download?buildid=EzU17PIvOTc~ 

   SUSE Linux Enterprise Server 11 SP1 for s390x
   http://download.novell.com/Download?buildid=xqwCozVDBjM~ 

   SUSE Linux Enterprise Server 11 SP1 for ppc
   http://download.novell.com/Download?buildid=fMD_W5XKEtI~ 

   SUSE Linux Enterprise Server 11 SP1 for ia64
   http://download.novell.com/Download?buildid=_QVtGS0824o~ 


Maybe you should try the latest (and greatest?) version... ;-)

Regards,
Ulrich





Re: Antw: detected conn error (1011)

2010-08-05 Thread TSFH
On Thu, 2010-08-05 at 08:50 +0200, Ulrich Windl wrote:
 I guess that's SLES11 already. I just read an announcement that there is an 
 open-iscsi update for SLES11 SP1 available. Unfortunately Novell does not 
 give any details in the announcements:
 
 4. Recommended update for open-iscsi
 
SUSE Linux Enterprise Desktop 11 SP1 for x86-64
http://download.novell.com/Download?buildid=MAugs_l2FJY~ 
 
SUSE Linux Enterprise Desktop 11 SP1 for x86
http://download.novell.com/Download?buildid=U2OyI_9oJ5g~ 
 
SUSE Linux Enterprise Server 11 SP1 for x86-64
http://download.novell.com/Download?buildid=1Z1WASv0lfE~ 
 
SUSE Linux Enterprise Server 11 SP1 for x86
http://download.novell.com/Download?buildid=EzU17PIvOTc~ 
 
SUSE Linux Enterprise Server 11 SP1 for s390x
http://download.novell.com/Download?buildid=xqwCozVDBjM~ 
 
SUSE Linux Enterprise Server 11 SP1 for ppc
http://download.novell.com/Download?buildid=fMD_W5XKEtI~ 
 
SUSE Linux Enterprise Server 11 SP1 for ia64
http://download.novell.com/Download?buildid=_QVtGS0824o~ 
 
 
 Maybe you should try the latest (and greatest?) version... ;-)

From the description:

  * Occasionally, not all iSCSI multipath mapping are being created
after boot up
  * Stopping of the open-iscsi service fails even if no iSCSI device
is mounted.
  * When configuring iBFT, the iscsiadm program does not display the
details of a session.

Do you think any of the fixes above may help in the issue I described
before? I'm presently not making use of multipath nor booting from SAN. 

Although, these fixes are worth having, I'm mostly concerned about
understanding the nature/reason of the issue I described before at this
stage.

Thanks,
-Goncalo.


 Regards,
 Ulrich
 
 




Re: detected conn error (1011)

2010-08-05 Thread Goncalo Gomes
On Wed, 2010-08-04 at 21:51 -0500, Mike Christie wrote:
 conn error 1011 is generic. If this is occurring when the eql box is 
 rebalancing luns, it is a little different than above. With the above 
 problem we did not know why we got the error. With your situation we 
 sort of expect this. We should not be getting disk IO errors though.
 
 When we get the logout request from the target, we send the logout 
 request, then basically handle the cleanup like if we got a connection 
 error. That is why you would see the conn error msg in this path. This 
 also means if this happened to the same IO 5 times, then you would see 
 the disk IO errors (scsi layer only lets us retry disk IO 5 times). But 
 if it just happened once, then the IO should be retried when we log into 
 the new portal and execute like normal.

What would be the best way to identify how many retries have elapsed?

 Or are you using dm-multipath over iscsi? In that case you do not get 
 any retries, so we would expect to see that end_request: I/O error 
 message, but dm-multipath should just be retrying a new path or 
 internally queueing for whatever timeout value you had it use in 
 multipath.conf.

Multipath is not enabled at all. The equallogic array is active/passive
and we only have a view into one controller at any time, so we don't
make use of multipath at present.

 Could you send me the libiscsi.c file you patched?
 
 Could you also send more of the log for either case? I want to see the 
 iscsid log info and any more of the kernel iscsi log info that you have. 
 I am looking for session recovery timed out messages and/or target 
 requested logout messages.

I've copied both the messages file from the host goncalog140 and the
patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
files in the link below:

http://promisc.org/iscsi/

N.B: the messages file contains spew from other instrumentation tests
(e.g a dump_stack() call in scsi_transport_iscsi.c::iscsi_conn_error()).
The last set of tests which I've made available yesterday have only the
libiscsi.c and IIRC the iscsi_tcp.c, and this output can be found around
the timeframe of 17:50.

If required I can spin a new set of tests with different instrumentation
and/or collect different information, logs or tcpdumps, if that helps in
any way.

Thanks,
 -Goncalo.




detected conn error (1011)

2010-08-04 Thread Goncalo Gomes
I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as 
dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is 
iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.



Whenever the equallogic rebalances the LUNs between the controllers/ports, it 
requests the initiator to logout and login again to the new port/ip. If the 
guests are idle, the following messages show up in the logs:



Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:09 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)



However, if one of the RHEL guests is busy performing IO, we end up having a 
few failed requests as well:



Aug  3 17:55:26 goncalog140 kernel:  connection1:0: dropping R2T itt 55 in recovery.

Aug  3 17:55:26 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 5337 51

Aug  3 17:55:27 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)



And as a side effect, the guest filesystem goes read-only. Googling around, 
I've found the following thread on this list which covers the same error I'm 
seeing in the logs:



http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gstq=conn+error#8e95febb6cf79f64



I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike 
Christie taken from that thread which can be found in the link below:



http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2view=1



Is this a known issue? Is there anything else from a troubleshooting 
perspective that I could do?



I've uploaded the following files, in case someone would like to take a look:



Tcpdumps collected a couple of days ago in another reproduction/analysis of 
the same bug (apologies, but I didn't get around to collecting new tcpdumps with 
today's reproduction):



0tcpdump0947.pcap   162K  - 09:47 (GMT+1) nothing occurred.

1tcpdump0952.pcap   4.8M  - 09:52 (GMT+2) problem occurred



Logs from today's reproduction of the issue with the patched drivers for 
additional backtracing:



vm-boot.txt                        2.7K  After VM creation

vm-lun-rebalance-no-effect.txt     3.1K  VM is idling, FS does not become read-only.

vm-lun-rebalance-fs-readonly.txt   3.3K  VM is dd'ing /dev/zero to iscsi based disk, FS becomes read-only.

guest-dmesg.txt                    14K   RHEL 5.3 with 2.6.18-194.8.1.el5xen (RHEL 5.5 kernel)



All these files can be found in the following link:



http://promisc.org/iscsi/



Any help would be greatly appreciated!



Cheers,

 -Goncalo.







Re: detected conn error (1011)

2010-08-04 Thread Mike Christie

On 08/04/2010 04:12 PM, Goncalo Gomes wrote:

I'm running a setup composed of: Linux 2.6.27 x86 based on SLES + Xen 3.4 (as 
dom0) running a couple of RHEL 5.5 VMs. The underlying storage for these VMs is 
iSCSI based via open-iscsi 2.0.870-26.6.1 and a DELL equallogic array.



Whenever the equallogic rebalances the LUNs between the controllers/ports, it 
requests the initiator to logout and login again to the new port/ip. If the 
guests are idle, the following messages show up in the logs:



Aug  3 17:55:08 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:09 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:10 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)



However, if one of the RHEL guests is busy performing IO, we end up having a 
few failed requests as well:



Aug  3 17:55:26 goncalog140 kernel:  connection1:0: dropping R2T itt 55 in recovery.

Aug  3 17:55:26 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 533399

Aug  3 17:55:26 goncalog140 kernel: sd 6:0:0:0: [sdb] Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK,SUGGEST_OK

Aug  3 17:55:26 goncalog140 kernel: end_request: I/O error, dev sdb, sector 5337 51

Aug  3 17:55:27 goncalog140 kernel:  connection1:0: detected conn error (1011)

Aug  3 17:55:29 goncalog140 iscsid: connection1:0 is operational after recovery (1 attempts)



And as a side effect, the guest filesystem goes read-only. Googling around, 
I've found the following thread on this list which covers the same error I'm 
seeing in the logs:



http://groups.google.com/group/open-iscsi/browse_thread/thread/3a7a5db6e5020423/8e95febb6cf79f64?lnk=gstq=conn+error#8e95febb6cf79f64



conn error 1011 is generic. If this is occurring when the eql box is 
rebalancing luns, it is a little different than above. With the above 
problem we did not know why we got the error. With your situation we 
sort of expect this. We should not be getting disk IO errors though.


When we get the logout request from the target, we send the logout 
request, then basically handle the cleanup like if we got a connection 
error. That is why you would see the conn error msg in this path. This 
also means if this happened to the same IO 5 times, then you would see 
the disk IO errors (scsi layer only lets us retry disk IO 5 times). But 
if it just happened once, then the IO should be retried when we log into 
the new portal and execute like normal.


Or are you using dm-multipath over iscsi? In that case you do not get 
any retries, so we would expect to see that end_request: I/O error 
message, but dm-multipath should just be retrying a new path or 
internally queueing for whatever timeout value you had it use in 
multipath.conf.







I've also compiled the drivers iscsi_tcp/libiscsi with the patch from Mike 
Christie taken from that thread which can be found in the link below:



http://groups.google.com/group/open-iscsi/attach/db552832995daaa7/trace-conn-error.patch?part=2view=1




Could you send me the libiscsi.c file you patched?

Could you also send more of the log for either case? I want to see the 
iscsid log info and any more of the kernel iscsi log info that you have. 
I am looking for session recovery timed out messages and/or target 
requested logout messages.
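
When collecting those logs, a small filter can help pull out the relevant lines. This is only a sketch: the patterns are taken from messages that appear elsewhere in this thread ("detected conn error", "session recovery timed out", "is operational after recovery", "end_request: I/O error"), the names are hypothetical, and the exact wording of the target-requested-logout message is not shown here, so it is left out rather than guessed.

```python
import re

# Patterns copied from log lines quoted in this thread.
PATTERNS = {
    "conn_error": re.compile(r"detected conn error \((\d+)\)"),
    "recovery_timed_out": re.compile(r"session recovery timed out after (\d+) secs?"),
    "recovered": re.compile(r"is operational after recovery"),
    "io_error": re.compile(r"end_request: I/O error, dev (\w+)"),
}


def summarize(lines):
    """Count how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_iscsi_log.py /var/log/messages
    with open(sys.argv[1], errors="replace") as log:
        for name, count in summarize(log).items():
            print(f"{name}: {count}")
```

Comparing the counts for an idle run and a busy run shows quickly whether I/O errors accompany every conn error or only some of them.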





Antw: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-28 Thread Ulrich Windl
  Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote on 28.07.2010 at 16:46
  in message 4c505ef502a1e...@gwsmtp1.uni-regensburg.de:
  Sean S sstra...@gmail.com wrote on 28.07.2010 at 16:34 in message
  f711789b-c411-4459-afbe-fe1a50fe2...@w12g2000yqj.googlegroups.com:
  How did you get those other kernel messages? If you can just get the
   iscsid log info that is sent after lines like this
  
  I'm able to issue the dmesg command after the drive is lost and
  still retrieve some logging info. Unfortunately, what I sent was all
  that I can get. If the drive ever successfully reconnects then I can
  get to /var/log/messages and see the info you are looking for. I've
  only ever had a successful reconnect when intentionally causing a
  disconnect (i.e. pulling the ethernet cable and then reconnecting
  it).
  
  I don't know much about unix logging, but maybe there is a way to send
  more of the logging messages to dmesg as that doesn't appear to need
  disk access to be read.
 
 dmesg just prints the kernel message buffer (/proc/kmsg), while syslog can 
 capture messages from applications as well.
 
 I have a sample for a syslog-ng configuration file:

Samples for sources are missing, sorry:
source s_intern { internal(); };
source s_dev_log { unix-stream("/dev/log"); };
source s_kernel { file("/proc/kmsg"); };


 
 destination d_tty_root { usertty("root"); };
 destination d_console { file("/dev/ttyS0"); };
 destination d_messages { file("/var/log/messages"); };
 
 filter f_error {
 level(alert .. err) and not match('S15.modem: initchat failed.');
 };
 filter f_kernel { level(alert .. err); };
 filter f_auth { facility(auth, authpriv) and level(alert .. info); };
 filter f_debug { level(alert .. debug); };
 
 # send critical messages to logged root user and /var/log/messages
 log {
 source(s_intern);
 source(s_dev_log);
 source(s_kernel);
 filter(f_error);
 destination(d_tty_root);
 destination(d_messages);
 };
 
 # save auth-related messages
 log {
 source(s_dev_log);
 source(s_kernel);
 filter(f_auth);
 destination(d_messages);
 };
 
 ### Just to get you started. The older syslog is less powerful, but easier 
 to configure.
 
 Maybe this is interesting for you:
 
 # 6) To send messages to a remote syslogd server:
 #destination d_udp { udp("remote IP address" port(514)); };
 #Example to send syslogs to syslogd located at 10.0.0.1 :
 #   destination d_udp1 { udp("10.0.0.1" port(514)); };
 
 Maybe this helps a bit.
 
 Regards,
 Ulrich
 



 




Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-28 Thread Mike Christie

On 07/28/2010 09:34 AM, Sean S wrote:



What version of open-iscsi-871 are you using? Is it 871.1, .2, or .3?

I downloaded the current semi-stable release:
http://www.open-iscsi.org/bits/open-iscsi-2.0-871.tar.gz
It doesn't appear to have a minor version number. Should I be using
something else?


Yeah, try:
http://kernel.org/pub/linux/kernel/people/mnc/open-iscsi/releases/open-iscsi-2.0-871.3.tar.gz

It has a fix for recovery. There was a problem where recovery hung for 
several minutes.





Antw: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-27 Thread Ulrich Windl
  Sean S sstra...@gmail.com wrote on 27.07.2010 at 00:37 in message
109cc690-901c-4e53-9fa9-1a9903380...@l14g2000yql.googlegroups.com:
[...]
 I'm unable to view /var/log/messages after the failure due to running
 as iscsi root. Ulrich mentioned writing the log to a serial port, but
 I haven't been able to set this up yet. Would there be an easier way

When using GRUB, use something like
[...]
#serial --unit=0 --speed=19200
terminal serial console
[...]

and add options vga=normal console=tty0 console=ttyS0,19200 to the kernel 
command line, just like:
###Don't change this comment - YaST2 identifier: Original name: linux###
title SUSE Linux Enterprise Server 10 - 2.6.16.60-0.54.5 (smp)
root (hd0,0)
kernel /vmlinuz-2.6.16.60-0.54.5-smp root=/dev/system/root vga=normal console=tty0 console=ttyS0,19200 splash=silent showopts
initrd /initrd-2.6.16.60-0.54.5-smp

Ulrich





Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-26 Thread Sean S
Thanks for the patch Mike. Below is the output from a failure when
running with the patch. Any thoughts?

[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]

[f9bf202d] iscsi_eh_abort+0x2f1/0x406 [libiscsi]

[f885d378] __scsi_try_to_abort_cmd+0x19/0x1a [scsi_mod]

[f885e85d] scsi_error_handler+0x24d/0x422 [scsi_mod]

[c041f7ea] complete+0x2b/0x3d

[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]

[c0435f65] kthread+0xc0/0xeb

[c0435ea5] kthread+0x0/0xeb

[c0405c3b] kernel_thread_helper+0x7/0x10

===

connection1:0 detected conn error (1011)

[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]

[f8bf22fc] iscsi_eh_target_reset+0xbb/0x218 [libiscsi]

[c0605967] _spin_lock_bh+0x8/0x18

[f8bf0f78] iscsi_eh_device_reset+0x1c5/0x1cf [libiscsi]

[c054a6dd] get_device+0xe/0x14

[f885d764] scsi_try_host_reset+0x3a/0x99 [scsi_mod]

[f885e0e3] scsi_eh_ready_devs+0x302/0x3e2 [scsi_mod]

[f885e8dd] scsi_error_handler+0x2cd/0x422 [scsi_mod]

[c041f7ea] complete+0x2b/0x3d

[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]

[c0435f65] kthread+0xc0/0xeb

[c0435ea5] kthread+0x0/0xeb

[c0405c3b] kernel_thread_helper+0x7/0x10

===

session1: session recovery timed out after 400 secs

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: scsi: Device offlined - no ready after error recovery

sd 0:0:0:0: SCSI error: return code = 0x0002

end_request: I/O error, dev sda, sector 14283149

On Jul 13, 10:34 pm, Mike Christie micha...@cs.wisc.edu wrote:
 Could you run with the attached patch? It just prints out a little more
 info. When we get the conn error, it will print out a message if it is
 due to the target dropping the connection and it will print out stack
 trace so we can see exactly what piece of code is throwing the error.

 On 07/13/2010 09:33 PM, Sean S wrote:

  Nothing else in the log from iscsid. No mention of a failed reconnect,
  although the only log I'm really able to access post failure is dmesg.
  Since I'm running a root iscsi, I couldn't get to /var/log/messages
  which maybe was a little more verbose? What sort of network problems

 Yeah, by default the iscsid messages go there. iscsid should be spitting
 out a cannot connect $some_error_value_or_string that would help tell us
 why we cannot reach the target anymore.

  might cause this? The network in this situation is a simple gigE
  switch with about 3 or 4 systems on it. The target and initiator are
  on the same subnet, nothing fancy. Is there some additional debug
  you'd recommend turning on? Any tips or tricks when running with a
  root iscsi drive?

 Not that I can think of at the iscsi layer.



  Curiously, if I physically disconnect the ethernet from the initiator
  while running, all I/O access is correctly paused without returning I/
  O errors. If I then reconnect before the 400s is up things go back to
  normal. I don't however see the detected conn error (1011) message
  in this situation however. Not sure if that really means anything.

 You should see the conn error 1011 message if

 1. you have nops on and they timeout and that causes us to log that error.

 2. the network layer figures out there is a problem and notifies us. It
 is possible that you pull a cable and plug it back in before the network
 throws an error.

 3. iscsi driver or protocol error. In this case we should relogin quickly.
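
Since case 1 in the list above depends on the nop settings, it can help to confirm what is actually configured. The sketch below parses iscsid.conf-style "key = value" text and extracts the three timeouts relevant here. It assumes the setting names node.session.timeo.replacement_timeout, node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout; the function names are hypothetical. With iscsi root, the values passed to iscsistart in the initrd may differ from this file.

```python
TIMEOUT_KEYS = (
    "node.session.timeo.replacement_timeout",
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def parse_iscsid_conf(text):
    """Parse 'key = value' lines, ignoring blank lines and '#' comments.

    Simplification: anything after '#' is dropped, so values that contain
    '#' (for example some CHAP secrets) would be truncated.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def timeout_settings(text):
    """Return the timeout-related settings as integers (seconds)."""
    conf = parse_iscsid_conf(text)
    return {key: int(conf[key]) for key in TIMEOUT_KEYS if key in conf}


if __name__ == "__main__":
    with open("/etc/iscsi/iscsid.conf") as conf_file:
        for key, seconds in timeout_settings(conf_file.read()).items():
            print(f"{key} = {seconds}")
```

If both nop values are 0, nops are off and case 1 cannot be the source of the conn error.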

  trace-conn-error.patch (attachment, 1K)




Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-26 Thread Mike Christie

On 07/26/2010 04:36 PM, Sean S wrote:

Thanks for the patch Mike. Below is the output from a failure when
running with the patch. Any thoughts?

[f8bf0876] iscsi_conn_failure+0x10/0x69 [libiscsi]

[f9bf202d] iscsi_eh_abort+0x2f1/0x406 [libiscsi]

[f885d378] __scsi_try_to_abort_cmd+0x19/0x1a [scsi_mod]

[f885e85d] scsi_error_handler+0x24d/0x422 [scsi_mod]

[c041f7ea] complete+0x2b/0x3d

[f885e610] scsi_error_handler+0x0/0x422 [scsi_mod]

[c0435f65] kthread+0xc0/0xeb

[c0435ea5] kthread+0x0/0xeb

[c0405c3b] kernel_thread_helper+0x7/0x10



Each scsi command has a timeout (see /sys/block/sdX/device/timeout). The 
above dump shows that a scsi command is timing out. This causes the scsi 
layer to have the driver, iscsi_tcp in this case, try to abort the 
command. It looks like the abort timed out too, and so the iscsi layer 
decided to escalate the eh and failed the iscsi session/connection.
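
For reference, the per-device command timeout mentioned above can be read back with a few lines of Python. This is a sketch only: it reads the /sys/block/sdX/device/timeout file named above, the function name is hypothetical, and the default device name "sda" is just the disk from this log.

```python
from pathlib import Path


def read_scsi_timeout(device, sys_block="/sys/block"):
    """Return the SCSI command timeout in seconds for e.g. 'sda', or None.

    None means the attribute is missing or unreadable (not a SCSI disk,
    or the device has gone away).
    """
    path = Path(sys_block) / device / "device" / "timeout"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    import sys

    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    print(f"{dev}: command timeout = {read_scsi_timeout(dev)}")
```

A command that takes longer than this value is what starts the abort and escalation path shown in the trace.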




session1: session recovery timed out after 400 secs


The iscsi layer tried to log back in for recovery/replacement timeout 
seconds, but could not.


Did you see anything from iscsid about why it could not log in? iscsid 
writes to /var/log/messages by default.





sd 0:0:0:0: scsi: Device offlined - no ready after error recovery



Because the replacement/recovery timeout fired, the iscsi layer decided 
it was time to give up and told the scsi layer the disks are not 
recoverable, and so we see these messages:




sd 0:0:0:0: scsi: Device offlined - no ready after error recovery



Does the session/connection ever log back in? You would see a message in 
/var/log/messages like "connection X:Y is operational after recovery (Z 
attempts)".


On the target box check out /var/log/messages. Is the target even up 
still? Did it segfault?





Antw: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-14 Thread Ulrich Windl
 Sean S sstra...@gmail.com wrote on 13.07.2010 at 20:41 in message
1f2389e7-9717-4f82-a05c-671f36a4c...@x21g2000yqa.googlegroups.com:
 I'm running an iscsi root partition for a CentOS machine running a
 2.6.18-53 kernel. Every couple of days I get the error:
 
 connection1:0 detected conn error (1011)
 session1: session recovery timed out after 400 sec

Hi!

I cannot answer your question, but it brings up something I wanted to talk 
about. My apologies if something like this already exists; I am not aware of it.

In HP-UX 11.31 you can print scan times per device (i.e. LUN). Here's an 
example from a real FC SAN:
Class     I  H/W Path                           ms_scan_time
==================================================================
lunpath   3  0/3/1/0.0x50001fe1500c1f28.0x0     0 min 0 sec 13 ms
lunpath  24  0/3/1/0.0x50001fe1500c1f28.0x4001  0 min 0 sec 88 ms
lunpath  73  0/3/1/0.0x50001fe1500c1f28.0x4002  0 min 0 sec 88 ms
lunpath  25  0/3/1/0.0x50001fe1500c1f28.0x4003  0 min 0 sec 88 ms
lunpath  74  0/3/1/0.0x50001fe1500c1f28.0x4009  0 min 0 sec 88 ms
lunpath  26  0/3/1/0.0x50001fe1500c1f28.0x4033  0 min 0 sec 88 ms
lunpath  88  0/3/1/0.0x50001fe1500c1f28.0x4037  0 min 0 sec 88 ms
lunpath  79  0/3/1/0.0x50001fe1500c1f28.0x403d  0 min 0 sec 91 ms
lunpath  27  0/3/1/0.0x50001fe1500c1f28.0x4047  0 min 0 sec 91 ms
[...]
lunpath  63  0/7/1/0.0x500308c001d83803.0x4001  0 min 0 sec 11 ms
lunpath  64  0/7/1/0.0x500308c001d83803.0x4002  0 min 0 sec 11 ms
lunpath  65  0/7/1/0.0x500308c001d83803.0x4003  0 min 0 sec 11 ms
lunpath  66  0/7/1/0.0x500308c001d83803.0x4004  0 min 0 sec 536 ms

If Linux/open-iscsi had something similar, one could periodically watch the 
times to find bottlenecks. AFAIK, the scan time in HP-UX is the round-trip 
delay for querying a LUN or a controller (a target?).
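
To illustrate the kind of periodic check meant here, the listing above 
could be filtered for slow paths. This is only a sketch: it reads a saved 
copy of such a listing from stdin, and the 100 ms default threshold is an 
arbitrary value chosen for the example.

```shell
#!/bin/sh
# Print "<hw-path> <milliseconds>" for every lunpath line whose scan time
# exceeds a threshold (default 100 ms, an illustrative value only).
slow_lunpaths() {
    threshold_ms="${1:-100}"
    awk -v limit="$threshold_ms" '
        $1 == "lunpath" && $NF == "ms" {
            # Fields from the right: <min> min <sec> sec <ms> ms
            total = $(NF-5) * 60000 + $(NF-3) * 1000 + $(NF-1)
            if (total > limit) print $3, total
        }
    '
}
```

Fed the table above, only the 536 ms path on 0/7/1/0 would be reported.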

Ulrich


 
 I compiled the open-iscsi 2.0-871 user tools and kernel modules from
 source obtained from open-iscsi.org. I custom packaged the initrd to
 contain the iscsistart binary and the kernel modules from v871. I've
 zeroed out the noop timeout setting and the noop interval.
 
 The disconnect is not reproducible, but does occur at random about
 every other day. I'm assuming that the target (IET 1.4.19) is not the
 issue as a second system that is using the target as an iscsi-root
 drive continues to work correctly. What things should I be looking at?
 I'm really struggling to understand why this happens, any suggestions
 would be greatly appreciated.



 




Re: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-14 Thread Ulrich Windl
 Sean S sstra...@gmail.com wrote on 14.07.2010 at 04:33 in message
83cd8c40-2e84-4c52-a864-36643dd0a...@d8g2000yqf.googlegroups.com:
 Nothing else in the log from iscsid. No mention of a failed reconnect,
 although the only log I'm really able to access post failure is dmesg.
 Since I'm running a root iscsi, I couldn't get to /var/log/messages
 which maybe was a little more verbose? What sort of network problems
 might cause this? The network in this situation is a simple gigE

Remember that syslogd can also write the log to a terminal or a serial line. On 
SUSE Linux it goes to tty10 (Ctrl+Alt+F10), although not very verbosely. You could 
set up something similar with more verbosity.

[...]

Ulrich





Re: Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-14 Thread Mike Christie

On 07/14/2010 03:59 AM, Ulrich Windl wrote:


If Linux/open-iscsi had something similar, one could periodically watch the times to find 
bottlenecks. AFAIK, the scan time in HP-UX is the round-trip delay for 
querying a LUN or a controller (a target?).



Did you want to find bottlenecks in the network, between the initiator 
and the actual device, or between the initiator and the target?


Erez was adding some code that exports the iSCSI nop/ping times. 
The nop/ping we send has a 48-byte header and no data payload, and it 
does not have to do any disk/device IO, so it is nice for testing the 
network.





connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-13 Thread Sean S
I'm running an iscsi root partition for a CentOS machine running a
2.6.18-53 kernel. Every couple of days I get the error:

connection1:0 detected conn error (1011)
session1: session recovery timed out after 400 sec

I compiled the open-iscsi 2.0-871 user tools and kernel modules from
source obtained from open-iscsi.org. I custom packaged the initrd to
contain the iscsistart binary and the kernel modules from v871. I've
zeroed out the noop timeout setting and the noop interval.

The disconnect is not reproducible, but does occur at random about
every other day. I'm assuming that the target (IET 1.4.19) is not the
issue as a second system that is using the target as an iscsi-root
drive continues to work correctly. What things should I be looking at?
I'm really struggling to understand why this happens, any suggestions
would be greatly appreciated.




Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-13 Thread Mike Christie

On 07/13/2010 01:41 PM, Sean S wrote:

I'm running an iscsi root partition for a CentOS machine running a
2.6.18-53 kernel. Every couple of days I get the error:

connection1:0 detected conn error (1011)
session1: session recovery timed out after 400 sec



Is there anything more to the log? Is there anything from iscsid? 
Something about not being able to connect/reconnect to the target?


If you just see that, then it means there was some connection problem. 
We do not know exactly what it was, but we disconnected the connection, 
then tried to reconnect. We tried to reconnect for 400 seconds but could 
not, so at that point we mark the session as bad and start to fail IO 
until we can log back in.


It is normally due to a problem in the network if the target is ok.
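
As a first look at the network side, the error and drop counters the 
kernel keeps per interface can be printed. A minimal sketch: NET_SYSFS is 
a hypothetical override used only for dry runs, and eth0 in the usage note 
is just an example name.

```shell
#!/bin/sh
# Print the error and drop counters the kernel keeps for one interface.
# NET_SYSFS is a hypothetical override for dry runs; by default the
# counters are read from /sys/class/net.
nic_errors() {
    ifname="$1"
    dir="${NET_SYSFS:-/sys/class/net}/$ifname/statistics"
    for counter in rx_errors tx_errors rx_dropped tx_dropped; do
        [ -r "$dir/$counter" ] || continue
        printf '%s %s=%s\n' "$ifname" "$counter" "$(cat "$dir/$counter")"
    done
}
```

Running nic_errors eth0 before and after a failure shows whether any 
counter moved; counters that keep growing point at cabling, the switch 
port, or the NIC.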




Re: connection1:0 detected conn error (1011) open-iscsi 2.0-871 w/ CentOS 2.6.18-53

2010-07-13 Thread Sean S
Nothing else in the log from iscsid. No mention of a failed reconnect,
although the only log I can really access after the failure is dmesg.
Since the root filesystem is on iSCSI, I couldn't get to
/var/log/messages, which might have been a little more verbose. What
sort of network problems might cause this? The network in this
situation is a simple gigE switch with about 3 or 4 systems on it. The
target and initiator are on the same subnet, nothing fancy. Is there
some additional debugging you'd recommend turning on? Any tips or tricks
when running with a root iSCSI drive?

Curiously, if I physically disconnect the ethernet cable from the
initiator while it is running, all I/O is correctly paused without
returning I/O errors. If I then reconnect before the 400 s is up, things
go back to normal. However, I don't see the "detected conn error (1011)"
message in this situation. Not sure if that really means anything.

Thanks for the help




Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2010-01-25 Thread sam.rawlins
Hi Hannes,

I am seeing similar problems.
Which kernel version contains the fixes you mention?




Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2010-01-13 Thread Hannes Reinecke
avora wrote:
 With SLES10 SP3 x86_64,
 as soon as I start the second iSCSI session (session2), I very frequently
 get connection errors.
 I do not see this with SLES10 SP2 x86_64 on the same setup.
 
 Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
 (1011)
 Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
 recovery (1 attempts)
 Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
 2:0 error (1011) state (3)
 Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
 (1011)
 
 I have tried changing noop_out_interval and noop_out_timeout to
 120/120 and 0/0, but it did not help.
 The iscsiadm settings are the same on both SP2 and SP3.
 Is there anything else that can be tried?
 
 # iscsiadm --mode node --targetname target
 ...
 
 # rpm -qa | grep iscsi
 iscsitarget-0.4.17-3.4.25
 open-iscsi-2.0.868-0.6.11
 yast2-iscsi-client-2.14.47-0.4.9
 yast2-iscsi-server-2.13.26-0.3
 
Please try the latest update kernel. I made quite a few
fixes which should help here.

cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)




Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-15 Thread Mike Christie
Just email the trace to me in private.

Anuarg Vora wrote:
 I have got a reproducible test case for this.
 It seems that SCSI layer returns DID_BUS_BUSY many times when 'conn error 
 (1011)' is seen.
 

DID_BUS_BUSY when getting a 1011 is more or less expected. If you are not 
using dm-multipath, the SCSI layer will retry a command that failed with 
this error up to 5 times.

If you are using dm-multipath, the SCSI layer will fail the IO up to 
the multipath layer, which will retry it on a new path right away.
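
For reading such "return code" values in the log: the host status sits in 
bits 16-23 of the SCSI result. The sketch below decodes it. It assumes a 
full 32-bit code such as 0x00020000 (the value quoted in the log here 
appears shortened), and it lists only a handful of the DID_* values from 
the kernel's SCSI headers.

```shell
#!/bin/sh
# Decode the host byte (bits 16-23) of a SCSI result code.
# Only a few DID_* values are listed; anything else is printed as hex.
scsi_host_byte() {
    result=$(( $1 ))                  # accepts decimal or 0x-prefixed hex
    host=$(( (result >> 16) & 0xff ))
    case "$host" in
        0) echo DID_OK ;;
        1) echo DID_NO_CONNECT ;;
        2) echo DID_BUS_BUSY ;;
        3) echo DID_TIME_OUT ;;
        7) echo DID_ERROR ;;
        *) printf 'host byte 0x%02x\n' "$host" ;;
    esac
}
```

For example, scsi_host_byte 0x00020000 prints DID_BUS_BUSY.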






Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-14 Thread Anuarg Vora
I have a reproducible test case for this.
It seems that the SCSI layer returns DID_BUS_BUSY many times when 'conn error 
(1011)' is seen.

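# Read the first block from every sd* device, all in parallel.
# The '&' is assumed here because the script ends with 'wait'.
for p in /dev/sd*
do
    dd if="$p" of=/dev/null count=1 &
done
wait    # block until every background dd has finished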

# ./io-script
1+0 records in
1+0 records out
512 bytes (5.1 MB) copied, 0.177076 seconds, 28.9 MB/s

dd: reading `/dev/sdaa8': Input/output error
2976+0 records in
2976+0 records out

Dec 14 11:15:12 cdc-r710s3 iscsid: Kernel reported iSCSI connection 1:0 error 
(1011) state (3)
Dec 14 11:15:13 cdc-r710s3 kernel:  connection2:0: detected conn error (1011)
Dec 14 11:15:13 cdc-r710s3 iscsid: connection2:0 is operational after recovery 
(1 attempts)
Dec 14 11:15:13 cdc-r710s3 iscsid: Kernel reported iSCSI connection 2:0 error 
(1011) state (3)
Dec 14 11:15:14 cdc-r710s3 kernel:  connection1:0: detected conn error (1011)
...
Dec 14 11:15:14 cdc-r710s3 kernel: sd 9:0:0:13: SCSI error: return code = 
0x0002  == DID_BUS_BUSY
Dec 14 11:15:14 cdc-r710s3 kernel: end_request: I/O error, dev sdaa, sector 2976

I am unable to upload the Ethereal trace to 
http://groups-beta.google.com/group/open-iscsi/files

Regards,
Anurag





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-11 Thread Anuarg Vora
Sorry, I do not see an upload option, even after signing in.
How do I upload it?





RE: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread berthiaume_wayne
Is CHAP configured on the array? 

-----Original Message-----
From: open-iscsi@googlegroups.com [mailto:open-is...@googlegroups.com]
On Behalf Of Mike Christie
Sent: Wednesday, December 09, 2009 9:54 PM
To: open-iscsi@googlegroups.com
Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error
(1011)

avora wrote:
 I do not see ping/nop timeout message in the logs
 (probably that's why changing the noop timeouts did not work).
 Simply starting the session does not cause these errors.
 On starting the second session, I start a daemon
 that does SCSI commands like INQUIRY on all the paths.
 After that I see these messages, and the daemon gets stuck
 for a very long time waiting for SCSI commands to finish.
 
 At the backend I have EMC CLARiiON.
 
 # iscsiadm -m node -P 1
 Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
 Portal: 192.168.10.1:3260,1
 Iface Name: iface0
 Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
 Portal: 192.168.12.1:3260,3
 Iface Name: iface1


Does the same path always fail?

If you log into one, can you use it? If you then log out and log into 
the other, does that other one work?

Is there any info in the CLARiiON logs?





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread avora
Yes Mike, the recovery message is seen right away.

Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
recovery (1 attempts)

'conn error' and 'recovery' are seen one after the other, continuously.

On Dec 10, 8:04 am, Mike Christie micha...@cs.wisc.edu wrote:
 avora wrote:
  I got a similar issue while browsing
 http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37...

  I wanted to enable logging as mentioned in above link.
  
  echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
  echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
  echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
  echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
  echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
  ---

  But on my machine I only see.

  #  ls /sys/module/libiscsi/
  refcnt  sections  srcversion

  # ls /sys/module/iscsi_tcp/
  parameters  refcnt  sections  srcversion

  # ls /sys/module/iscsi_tcp/parameters/max_lun
  /sys/module/iscsi_tcp/parameters/max_lun

 Your open-iscsi version is older and does not have those settings.
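
 Since the parameters exist only in newer versions, a script can check for 
 each one before writing to it. A minimal sketch using the five paths quoted 
 above; MODULE_SYSFS is a hypothetical override added for dry runs, and on a 
 real system the writes need root.

```shell
#!/bin/sh
# Enable libiscsi/iscsi_tcp debug logging where the module parameter exists.
# MODULE_SYSFS is a hypothetical override for dry runs; the default is
# /sys/module.
enable_iscsi_debug() {
    root="${MODULE_SYSFS:-/sys/module}"
    for p in libiscsi/parameters/debug_libiscsi_conn \
             libiscsi/parameters/debug_libiscsi_session \
             libiscsi/parameters/debug_libiscsi_eh \
             iscsi_tcp/parameters/debug_iscsi_tcp \
             libiscsi_tcp/parameters/debug_libiscsi_tcp; do
        if [ -w "$root/$p" ]; then
            echo 1 > "$root/$p" && echo "enabled $p"
        else
            echo "missing $p"       # older module without this setting
        fi
    done
}
```

 On an older open-iscsi, such as the one shown above, every line would 
 read "missing".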



  # iscsiadm -m session -P 1
  Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
  
                  iSCSI Connection State: TRANSPORT WAIT
                  iSCSI Session State: FAILED
                  Internal iscsid Session State: REPOEN

 You might be seeing something else.
I did not understand what exactly you meant.

  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
  recovery (1 attempts)

 After the conn error message, do you see one of these right away?





RE: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread Anuarg Vora
There is no CHAP configured on the array.






Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread Mike Christie
avora wrote:
 Yes Mike, the recovery message is seen right away.
 
 Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
 recovery (1 attempts)
 
 'conn error' and 'recovery' are seen one after the other, continuously.
 

Do you have other initiators connected to the target?

Can you get me a wireshark trace?
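
If wireshark is not available on the initiator, a capture file can also be 
taken with tcpdump and opened in wireshark later. The helper below only 
prints the command so it can be reviewed first. The output file name is a 
made-up default, and 3260 is the port of the portals shown in this thread.

```shell
#!/bin/sh
# Print a tcpdump command that captures full iSCSI packets on one interface.
# Nothing is executed; the command is only echoed for review.
iscsi_capture_cmd() {
    ifname="$1"
    outfile="${2:-iscsi-trace.pcap}"    # hypothetical default file name
    printf 'tcpdump -i %s -s 0 -w %s port 3260\n' "$ifname" "$outfile"
}
```

For the setup in this thread that would be one capture per interface, for 
example iscsi_capture_cmd eth0 and iscsi_capture_cmd eth1.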





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread Anuarg Vora
I did send the Ethereal trace yesterday.
I am not sure why it didn't arrive. Is there any place I can upload it?

There is only 1 initiator.

# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
        Current Portal: 192.168.11.1:3260,2
        Persistent Portal: 192.168.11.1:3260,2
                **
                Interface:
                **
                Iface Name: iface0
                Iface Transport: tcp
                Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
                Iface IPaddress: 192.168.11.11
                Iface HWaddress: 00:15:17:A8:A9:1E
                Iface Netdev: eth0
                SID: 10
                iSCSI Connection State: TRANSPORT WAIT
                iSCSI Session State: FAILED
                Internal iscsid Session State: REPOEN
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b3
        Current Portal: 192.168.13.1:3260,4
        Persistent Portal: 192.168.13.1:3260,4
                **
                Interface:
                **
                Iface Name: iface1
                Iface Transport: tcp
                Iface Initiatorname: iqn.1996-04.de.suse:02:9914ca52960
                Iface IPaddress: 192.168.13.11
                Iface HWaddress: 00:15:17:A8:A9:1F
                Iface Netdev: eth1
                SID: 11
                iSCSI Connection State: TRANSPORT WAIT
                iSCSI Session State: FAILED
                Internal iscsid Session State: REPOEN
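
With many sessions, output like the above can be reduced to the targets 
that are not healthy. A minimal sketch that reads the listing from stdin; 
it assumes a healthy session reports LOGGED_IN as its "iSCSI Session 
State", which may differ between open-iscsi versions.

```shell
#!/bin/sh
# Print "<target> <state>" for every session whose "iSCSI Session State"
# is not LOGGED_IN, reading `iscsiadm -m session -P 1` style output on stdin.
failed_sessions() {
    awk '
        $1 == "Target:"         { target = $2 }
        /iSCSI Session State:/  { if ($NF != "LOGGED_IN") print target, $NF }
    '
}
```

Usage would be along the lines of: iscsiadm -m session -P 1 | failed_sessions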


--- On Thu, 12/10/09, Mike Christie micha...@cs.wisc.edu wrote:

 From: Mike Christie micha...@cs.wisc.edu
 Subject: Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
 To: open-iscsi@googlegroups.com
 Date: Thursday, December 10, 2009, 11:22 PM
 avora wrote:
  Yes Mike, the recovery message is seen right away.
  
  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0
 is operational after
  recovery (1 attempts)
  
  'conn error' and 'recovery' are seen one after the
 other, continuously.
  
 
 Do you have other initiators connected to the target?
 
 Can you get me a wireshark trace?
 




Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-10 Thread Mike Christie
Anuarg Vora wrote:
 I did send the Ethereal trace yesterday.
 I am not sure why it didn't arrive; is there anywhere I can upload it?
 

http://groups-beta.google.com/group/open-iscsi/files





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-08 Thread avora
I do not see ping/nop timeout message in the logs
(probably that's why changing the noop timeouts did not work).
Simply starting the session does not cause these errors.
On starting the second session, I start a daemon
that does SCSI commands like INQUIRY on all the paths.
After that I see these messages, and the daemon gets stuck
for a very long time waiting for SCSI commands to finish.

At the backend I have EMC CLARiiON.

# iscsiadm -m node -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
Portal: 192.168.10.1:3260,1
Iface Name: iface0
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
Portal: 192.168.12.1:3260,3
Iface Name: iface1

# iscsiadm --mode node --targetname iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.name = iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.tpgt = 1
node.startup = automatic
iface.hwaddress = 00:15:17:A8:A9:0A
iface.iscsi_ifacename = iface0
iface.net_ifacename = eth4
iface.transport_name = tcp
node.discovery_address = 192.168.10.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = empty
node.session.auth.password = empty
node.session.auth.username_in = empty
node.session.auth.password_in = empty
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.10.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
 avora wrote:
  With SLES10 SP3 x86_64,
  as soon as I start the second iscsi session2, I am very frequently
  getting the connection errors/
  I do not see this with SLES10 SP2 x86_64 on the same setup.

  Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)
  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
  recovery (1 attempts)
  Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
  2:0 error (1011) state (3)
  Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)

  I have tried changing noop_out_interval and noop_out_timeout to
  120/120 and 0/0 but did not help.

 Did you see a ping/nop timeout message in the logs or just what you
 included above with the conn error 1011? The ping/nop message would be a
 little before the conn error 1011.

 What target is this with and are you doing any IO tests when this
 happens or are you just logging into the second session and then you
 start to get these errors?





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-08 Thread avora
I found a similar issue while browsing
http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f

I wanted to enable logging as described in the link above:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
---

But on my machine I only see:

#  ls /sys/module/libiscsi/
refcnt  sections  srcversion

# ls /sys/module/iscsi_tcp/
parameters  refcnt  sections  srcversion

# ls /sys/module/iscsi_tcp/parameters/max_lun
/sys/module/iscsi_tcp/parameters/max_lun
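
Since the debug parameters differ between kernel builds (here only max_lun exists), it may be safer to probe for each file before writing to it. A minimal sketch under that assumption; the function name enable_iscsi_debug and its optional root argument are hypothetical, and the parameter list is simply the one quoted above:

```shell
# Write 1 to each debug parameter that exists and is writable under the
# given module root (default /sys/module); report the ones that are missing.
enable_iscsi_debug() {
    root=${1:-/sys/module}
    for p in \
        libiscsi/parameters/debug_libiscsi_conn \
        libiscsi/parameters/debug_libiscsi_session \
        libiscsi/parameters/debug_libiscsi_eh \
        iscsi_tcp/parameters/debug_iscsi_tcp \
        libiscsi_tcp/parameters/debug_libiscsi_tcp
    do
        if [ -w "$root/$p" ]; then
            echo 1 > "$root/$p" && echo "enabled $p"
        else
            echo "missing $p"
        fi
    done
}
```

Run as root with no argument to act on /sys/module. Every "missing" line means the running modules were built without that parameter, which matches what the ls output above suggests for this SLES10 kernel.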


# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3

iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
Internal iscsid Session State: REPOEN





On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
 avora wrote:
  With SLES10 SP3 x86_64,
  as soon as I start the second iscsi session2, I am very frequently
  getting the connection errors/
  I do not see this with SLES10 SP2 x86_64 on the same setup.

  Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)
  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
  recovery (1 attempts)
  Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
  2:0 error (1011) state (3)
  Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)

  I have tried changing noop_out_interval and noop_out_timeout to
  120/120 and 0/0 but did not help.

 Did you see a ping/nop timeout message in the logs or just what you
 included above with the conn error 1011? The ping/nop message would be a
 little before the conn error 1011.

 What target is this with and are you doing any IO tests when this
 happens or are you just logging into the second session and then you
 start to get these errors?





SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-07 Thread avora
With SLES10 SP3 x86_64,
as soon as I start the second iSCSI session (session2), I very frequently
get connection errors.
I do not see this with SLES10 SP2 x86_64 on the same setup.

Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
(1011)
Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
recovery (1 attempts)
Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
2:0 error (1011) state (3)
Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
(1011)

I have tried changing noop_out_interval and noop_out_timeout to
120/120 and 0/0, but it did not help.
The iscsiadm settings are the same on both SP2 and SP3.
Is there anything else that can be tried?
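
For reference, one way the NOP-Out timers are usually changed is to update the node record and then log out and back in so the new values take effect. The sketch below only prints the commands rather than running them; the function name noop_update_cmds is hypothetical, and the target name is whatever `iscsiadm -m node` lists on your system.

```shell
# Print (do not run) the commands that log out of TARGET, set both
# NOP-Out timers in the node record, and log back in.
# Usage: noop_update_cmds TARGET INTERVAL TIMEOUT
noop_update_cmds() {
    target=$1
    printf 'iscsiadm -m node -T %s -u\n' "$target"
    printf "iscsiadm -m node -T %s -o update -n 'node.conn[0].timeo.noop_out_interval' -v %s\n" "$target" "$2"
    printf "iscsiadm -m node -T %s -o update -n 'node.conn[0].timeo.noop_out_timeout' -v %s\n" "$target" "$3"
    printf 'iscsiadm -m node -T %s -l\n' "$target"
}
```

For example, noop_update_cmds iqn.1992-04.com.emc:cx.ckm00091100683.a2 0 0 prints the four commands for the 0/0 case; review them and run them as root if they look right.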

# iscsiadm --mode node --targetname target
...

# rpm -qa | grep iscsi
iscsitarget-0.4.17-3.4.25
open-iscsi-2.0.868-0.6.11
yast2-iscsi-client-2.14.47-0.4.9
yast2-iscsi-server-2.13.26-0.3





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-07 Thread Mike Christie
avora wrote:
 With SLES10 SP3 x86_64,
 as soon as I start the second iscsi session2, I am very frequently
 getting the connection errors/
 I do not see this with SLES10 SP2 x86_64 on the same setup.
 
 Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
 (1011)
 Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
 recovery (1 attempts)
 Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
 2:0 error (1011) state (3)
 Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
 (1011)
 
 I have tried changing noop_out_interval and noop_out_timeout to
 120/120 and 0/0 but did not help.

Did you see a ping/nop timeout message in the logs or just what you
included above with the conn error 1011? The ping/nop message would be a
little before the conn error 1011.
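
On a large log, this check can be done mechanically. The following is a rough sketch, assuming each syslog message sits on a single line (not wrapped as in this archive) and that the kernel messages look like the ones quoted in this thread; the function name classify_conn_errors is made up for illustration.

```shell
# Read syslog lines on stdin. For every connection, count how many
# "detected conn error (1011)" messages were preceded by a ping timeout
# since the last recovery, and how many were not.
classify_conn_errors() {
    awk '
        /ping timeout of/ {
            if (match($0, /connection[0-9]+:[0-9]+/))
                pinged[substr($0, RSTART, RLENGTH)] = 1
            next
        }
        /is operational after recovery/ {
            if (match($0, /connection[0-9]+:[0-9]+/))
                delete pinged[substr($0, RSTART, RLENGTH)]
            next
        }
        /detected conn error \(1011\)/ {
            if (match($0, /connection[0-9]+:[0-9]+/)) {
                c = substr($0, RSTART, RLENGTH)
                if (c in pinged) { after_ping[c]++; delete pinged[c] }
                else no_ping[c]++
                seen[c] = 1
            }
        }
        END {
            for (c in seen)
                printf "%s after_ping_timeout=%d no_ping_timeout=%d\n", c, after_ping[c], no_ping[c]
        }
    ' | sort
}
```

Usage would be something like: classify_conn_errors < /var/log/messages. A connection that only shows no_ping_timeout counts is failing for a reason other than the NOP-Out timer, which would also explain why changing the noop settings makes no difference.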

What target is this with and are you doing any IO tests when this 
happens or are you just logging into the second session and then you 
start to get these errors?





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-19 Thread Matthew Dickinson



On 11/10/09 11:39 AM, Mike Christie micha...@cs.wisc.edu wrote:
 
 What version of open-iscsi were you using and what kernel, and were you
 using the iscsi kernel modules with open-iscsi.org tarball or from the
 kernel?

iscsi-initiator-utils-6.2.0.871-0.10.el5
kernel-2.6.18-164.2.1.el5

RedHat RPMs

 
 
 It looks like we are sending more IO than the target can handle. In one
 of those cases it took more than 30 or 60 seconds (depending on your
 timeout value).
 
 What is the value of
 
 cat /sys/block/sdXYZ/device/timeout
 
 ?
 
 If it is 30 or 60 could you increase it to 360? After you login to the
 target do
 
 echo 360 > /sys/block/sdXYZ/device/timeout

I've tried setting this, but it appears to have no effect - it was 60, and I
increased to 360.

 
 And what is the value of:
 
 iscsiadm -m node -T your_target | grep node.session.cmds_max
 
 If that is 128, then could you decrease that to 32 or 16?
 
 Run
 
 iscsiadm -m node -T your_target -u
 iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32
 iscsiadm -m node -T your_target -l

I've tried setting to both 16 and 32, but it behaves about the same.

 
 
 And if those prevent the io errors then could you do
 
 echo noop > /sys/block/sdXYZ/queue/scheduler
 
 to see if performance increases with a different scheduler.


I really think I'm back to the duplicate ACK problem - see the attached
packet dump - at one point there are 30 duplicate ACKs... Interestingly, the
storage has worked for the past week - I'm using it as a D2D backup.  This
morning (about 7 days later), it's giving all these duplicate ACKs.

I'm currently running into messages such as:

Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error
(1011) state (3)
Nov 19 09:47:00 backup kernel:  session2: target reset succeeded
Nov 19 09:47:01 backup iscsid: connection2:0 is operational after recovery
(1 attempts)
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code =
0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 8856
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:80.
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code =
0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 74424
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code =
0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector
8845240
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:192.
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code =
0x000e
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector
62915456
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: timing out command, waited 300s
Nov 19 09:47:10 backup multipathd: /sbin/mpath_prio_alua exitted with 1
Nov 19 09:47:10 backup multipathd: error calling out /sbin/mpath_prio_alua
/dev/sdm 
Nov 19 09:47:10 backup multipathd: 3600d0230061d4479bfb83902: switch
to path group #2 

This is also interesting:

Nov 18 01:48:30 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 8
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 7
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 6
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 5
Nov 18 20:16:29 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 4
Nov 18 20:16:34 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 5
Nov 18 20:32:09 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 6
Nov 18 20:43:05 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 7
Nov 18 20:48:08 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 8
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 7
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 6
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 5
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 4
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 3
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 2
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 1
Nov 18 20:53:36 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 0
Nov 18 20:53:41 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 1
Nov 18 20:59:09 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 2
Nov 18 21:04:37 backup multipathd: 3600d0230061d4479bfb83902:
remaining active paths: 3
Nov 18 21:10:05 backup multipathd: 
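
To see at a glance how far each multipath map degraded, the "remaining active paths" messages can be summarised. A minimal sketch, assuming one syslog message per line in the format "Mon DD HH:MM:SS host multipathd: WWID: remaining active paths: N"; the function name lowest_active_paths is hypothetical.

```shell
# Read syslog lines on stdin and print, for each multipath map, the lowest
# "remaining active paths" count seen and the first time it was reached.
lowest_active_paths() {
    awk '
        /remaining active paths:/ {
            n = $NF + 0
            map = $6
            sub(/:$/, "", map)
            if (!(map in low) || n < low[map]) {
                low[map] = n
                when[map] = $1 " " $2 " " $3
            }
        }
        END {
            for (m in low)
                printf "%s lowest=%d at %s\n", m, low[m], when[m]
        }
    '
}
```

For example: grep multipathd /var/log/messages | lowest_active_paths. A lowest value of 0 marks the moment when all paths were down at once, which is when I/O errors reach the upper layers unless multipath is configured to queue.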

Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-19 Thread Mike Christie
Matthew Dickinson wrote:
 
 
 On 11/10/09 11:39 AM, Mike Christie micha...@cs.wisc.edu wrote:
 What version of open-iscsi were you using and what kernel, and were you
 using the iscsi kernel modules with open-iscsi.org tarball or from the
 kernel?
 
 iscsi-initiator-utils-6.2.0.871-0.10.el5
 kernel-2.6.18-164.2.1.el5
 
 RedHat RPMs
 

 It looks like we are sending more IO than the target can handle. In one
 of those cases it took more than 30 or 60 seconds (depending on your
 timeout value).

 What is the value of

 cat /sys/block/sdXYZ/device/timeout

 ?

 If it is 30 or 60 could you increase it to 360? After you login to the
 target do

 echo 360 > /sys/block/sdXYZ/device/timeout
 
 I've tried setting this, but it appears to have no effect - it was 60, and I
 increased to 360.
 
 And what is the value of:

 iscsiadm -m node -T your_target | grep node.session.cmds_max

 If that is 128, then could you decrease that to 32 or 16?

 Run

 iscsiadm -m node -T your_target -u
 iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32
 iscsiadm -m node -T your_target -l
 
 I've tried setting to both 16 and 32, but it behaves about the same.
 

 And if those prevent the io errors then could you do

 echo noop > /sys/block/sdXYZ/queue/scheduler

 to see if performance increases with a different scheduler.
 
 
 I really think I'm back to the duplicate ACK problem - see the attached
 packet dump - at one point  there's 30 duplicate ACKs... Interestingly, the

I did not get the attachment.

 storage has worked for the past week - I'm using it as  D2D backup.  This
 morning (about 7 days later), it's giving all these duplicate ACKs.
 
 I'm currently running into messages such as:
 
 Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error
 (1011) state (3)
 Nov 19 09:47:00 backup kernel:  session2: target reset succeeded

If you are using the Red Hat RPMs, please file a Red Hat Bugzilla at
https://bugzilla.redhat.com/. CC mchri...@redhat.com on the bugzilla or
email me at that address once you have filed it. I will then
add some network people to it. Attach your trace to the bugzilla.





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-12 Thread Hoot, Joseph

Sorry, wrong information.  Here is the correct information.  I was doing some
testing in VMware Fusion VMs for a presentation that I'm giving.  The
storage server is CentOS 5.3, which dishes out IETD targets for my OVM
servers.  The OVM 2.2 environment is as follows:

[r...@ovm1 ~]# uname -r
2.6.18-128.2.1.4.9.el5xen
[r...@ovm1 ~]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.871-0.7.el5
[r...@ovm1 ~]# 



On Nov 10, 2009, at 2:30 PM, Mike Christie wrote:

 
 Hoot, Joseph wrote:
 [r...@storage ~]# uname -r
 2.6.18-164.el5
 [r...@storage ~]# rpm -qa | grep iscsi
 iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
 [r...@storage ~]#
 
 
 Weird.
 
 Is 2.6.18-164.el5 the kernel being used in the virtual machine/DomU? Is 
 that where you are using iscsi? It looks like the Oracle enterprise 
 linux kernel is 2.6.18-164.el5, which looks like it is based on RHEL 
 5.4. The iscsi code in there is the same as RHEL/upstream. No sendwait 
 patch.
 
 However, it looks like there is a 2.6.18-128.2.1.4.9 kernel (comes with 
 the Oracle VM rpms). In here we have a different iscsi version. It looks 
 a little older than what is in 2.6.18-164.el5, but it has the sendwait 
 patch I sent to Dell. Do you use this kernel in the Dom0? Are you using 
 this kernel with iscsi?
 
 
 
 On Nov 10, 2009, at 12:17 PM, Mike Christie wrote:
 
 Hoot, Joseph wrote:
 I've had about 3 threads of dt (kicking off a bit randomly) on (3) 
 separate volumes for over a week and haven't had a single disconnect yet.  
  I am currently using whatever rpm is distributed with Oracle VM v2.2.  I 
 know for sure that they have included the 871 base, plus I believe at 
 least a one off patch.  I can get more details if you'd like.
 
 But so far so good for now
 
 I think I have the source they are using. Could you do a uname -r, so I 
 can see what kernel they are using.
 
 
 ===
 Joseph R. Hoot
 Lead System Programmer/Analyst
 (w) 716-878-4832
 (c) 716-759-HOOT
 joe.h...@itec.suny.edu
 GPG KEY:   7145F633
 ===
 
 
 
 
 
  

===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-10 Thread Mike Christie

Hoot, Joseph wrote:
 I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate 
 volumes for over a week and haven't had a single disconnect yet.   I am 
 currently using whatever rpm is distributed with Oracle VM v2.2.  I know for 
 sure that they have included the 871 base, plus I believe at least a one off 
 patch.  I can get more details if you'd like.
 
 But so far so good for now
 

I think I have the source they are using. Could you do a uname -r, so I 
can see what kernel they are using.




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-10 Thread Hoot, Joseph

[r...@storage ~]# uname -r
2.6.18-164.el5
[r...@storage ~]# rpm -qa | grep iscsi
iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
[r...@storage ~]#

On Nov 10, 2009, at 12:17 PM, Mike Christie wrote:

 
 Hoot, Joseph wrote:
 I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate 
 volumes for over a week and haven't had a single disconnect yet.   I am 
 currently using whatever rpm is distributed with Oracle VM v2.2.  I know for 
 sure that they have included the 871 base, plus I believe at least a one off 
 patch.  I can get more details if you'd like.
 
 But so far so good for now
 
 
 I think I have the source they are using. Could you do a uname -r, so I 
 can see what kernel they are using.
 
  

===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-10 Thread Mike Christie

Matthew Dickinson wrote:
 On 11/6/09 3:39 PM, Matthew Dickinson matt-openis...@alpha345.com wrote:
 

 Try disabling nops by setting

 node.conn[0].timeo.noop_out_interval = 0
 node.conn[0].timeo.noop_out_timeout = 0
 
 I'm still getting errors:
 
 Nov 10 09:08:04 backup kernel:  connection12:0: detected conn error (1011)
 Nov 10 09:08:05 backup iscsid: Kernel reported iSCSI connection 12:0 error
 (1011) state (3)
 Nov 10 09:08:08 backup iscsid: connection12:0 is operational after recovery
 (1 attempts)
 Nov 10 09:09:43 backup kernel:  connection11:0: detected conn error (1011)
 Nov 10 09:09:43 backup kernel:  connection12:0: detected conn error (1011)
 Nov 10 09:09:44 backup kernel:  connection11:0: detected conn error (1011)
 Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 11:0 error
 (1011) state (3)
 Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 12:0 error
 (1011) state (3)
 Nov 10 09:09:44 backup iscsid: Kernel reported iSCSI connection 11:0 error
 (1011) state (1)
 Nov 10 09:09:46 backup kernel:  session11: target reset succeeded
 Nov 10 09:09:47 backup iscsid: connection11:0 is operational after recovery
 (1 attempts)
 Nov 10 09:09:47 backup iscsid: connection12:0 is operational after recovery
 (1 attempts)
 Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code =
 0x000e
 Nov 10 09:09:56 backup kernel: end_request: I/O error, dev sdv, sector
 60721248
 Nov 10 09:09:56 backup kernel: device-mapper: multipath: Failing path 65:80.
 Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code =
 0x000e
 Nov 10 09:09:56 backup kernel: end_request: I/O error, dev sdv, sector
 60727648
 Nov 10 09:09:56 backup kernel: sd 18:0:0:2: SCSI error: return code =
 0x000e
 Nov 10 09:10:31 backup kernel: device-mapper: multipath: Failing path
 65:112.
 
 Interestingly, I tried a Windows 2008 server R2 talking over a single
 connection to the storage unit, configured to access just via one
 interface, I was able to sustain 20MB/s - so it would "appear" to be a
 Linux-related issue - I'm only able to get 9MB/s out of Linux even when
 using 8 interfaces on both controllers.
 

What version of open-iscsi were you using and what kernel, and were you 
using the iscsi kernel modules with open-iscsi.org tarball or from the 
kernel?


It looks like we are sending more IO than the target can handle. In one 
of those cases it took more than 30 or 60 seconds (depending on your 
timeout value).

What is the value of

cat /sys/block/sdXYZ/device/timeout

?

If it is 30 or 60 could you increase it to 360? After you login to the 
target do

echo 360 > /sys/block/sdXYZ/device/timeout

And what is the value of:

iscsiadm -m node -T your_target | grep node.session.cmds_max

If that is 128, then could you decrease that to 32 or 16?

Run

iscsiadm -m node -T your_target -u
iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32
iscsiadm -m node -T your_target -l


And if those prevent the io errors then could you do

echo noop > /sys/block/sdXYZ/queue/scheduler

to see if performance increases with a different scheduler.
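
The sysfs steps above can be wrapped in two small helpers so the old value is recorded before it is changed. This is only a sketch: the function names set_disk_timeout and active_scheduler are made up, and the optional last argument (an alternative sysfs root) exists purely so the helpers can be tried against a scratch directory first.

```shell
# set_disk_timeout DEV SECONDS [SYSFS_ROOT]
# Change the SCSI command timeout of a disk and print the old and new value.
set_disk_timeout() {
    f=${3:-/sys/block}/$1/device/timeout
    [ -w "$f" ] || { echo "cannot write $f" >&2; return 1; }
    old=$(cat "$f")
    echo "$2" > "$f" && echo "$1: timeout $old -> $2"
}

# active_scheduler DEV [SYSFS_ROOT]
# Print the I/O scheduler currently in use; the kernel marks it with
# square brackets, e.g. "noop anticipatory deadline [cfq]".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${2:-/sys/block}/$1/queue/scheduler"
}
```

For example, set_disk_timeout sdf 360 followed by active_scheduler sdf, run as root after logging in to the target. Both settings live in sysfs only, so they are lost when the device is removed and rediscovered or when the host reboots.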




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-10 Thread Mike Christie

Hoot, Joseph wrote:
 [r...@storage ~]# uname -r
 2.6.18-164.el5
 [r...@storage ~]# rpm -qa | grep iscsi
 iscsi-initiator-utils-6.2.0.868-0.18.el5_3.1
 [r...@storage ~]#
 

Weird.

Is 2.6.18-164.el5 the kernel being used in the virtual machine/DomU? Is 
that where you are using iscsi? It looks like the Oracle enterprise 
linux kernel is 2.6.18-164.el5, which looks like it is based on RHEL 
5.4. The iscsi code in there is the same as RHEL/upstream. No sendwait 
patch.

However, it looks like there is a 2.6.18-128.2.1.4.9 kernel (comes with 
the Oracle VM rpms). In here we have a different iscsi version. It looks 
a little older than what is in 2.6.18-164.el5, but it has the sendwait 
patch I sent to Dell. Do you use this kernel in the Dom0? Are you using 
this kernel with iscsi?



 On Nov 10, 2009, at 12:17 PM, Mike Christie wrote:
 
 Hoot, Joseph wrote:
 I've had about 3 threads of dt (kicking off a bit randomly) on (3) separate 
 volumes for over a week and haven't had a single disconnect yet.   I am 
 currently using whatever rpm is distributed with Oracle VM v2.2.  I know 
 for sure that they have included the 871 base, plus I believe at least a 
 one off patch.  I can get more details if you'd like.

 But so far so good for now

 I think I have the source they are using. Could you do a uname -r, so I 
 can see what kernel they are using.

 
 ===
 Joseph R. Hoot
 Lead System Programmer/Analyst
 (w) 716-878-4832
 (c) 716-759-HOOT
 joe.h...@itec.suny.edu
 GPG KEY:   7145F633
 ===
 
 
  





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Santi Saez

On 06/11/09 14:10, mdaitc wrote:

Hi mdaitc,

 I’m seeing similar TCP “weirdness” as the other posts mention as  well
 as the below errors.

(..)

 Nov  2 08:15:14 backup kernel:  connection33:0: detected conn error
 The performance isn’t what I’d expect:

(..)

What happens if you disable TCP window scaling option in RHEL servers?

# echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

In our case, the iSCSI conn errors stopped after disabling it, but we still
have a lot of TCP “weirdness” on the network, mainly duplicate ACK packets.
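
If you try this, it is worth recording the previous value so the change can be undone. A minimal sketch; the function name disable_window_scaling is hypothetical, and the optional path argument is there only so it can be tried on a scratch file.

```shell
# Turn off TCP window scaling and print the previous setting together with
# the command that restores it. Needs root when used on the real /proc file.
disable_window_scaling() {
    f=${1:-/proc/sys/net/ipv4/tcp_window_scaling}
    old=$(cat "$f") || return 1
    echo "previous value: $old (restore with: echo $old > $f)"
    echo 0 > "$f"
}
```

The setting does not survive a reboot unless net.ipv4.tcp_window_scaling = 0 is also added to /etc/sysctl.conf. Note that without window scaling the TCP window is limited to 64 KB, so throughput on fast or high-latency links may drop; treat this as a diagnostic step rather than a permanent fix.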

Regards,

-- 
Santi Saez
http://woop.es




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Hoot, Joseph

What version of OiS are you using?  I had lots of weirdness and the
same types of disconnects to our Dell EqualLogic when we were
(actually still are in production) using 868 code.  I'm now using
open-iscsi-871 code plus a sendwait patch and haven't had the issue.  I've
now been slamming my storage for a week and a half with multiple
threads of dt.


On Nov 9, 2009, at 4:33 AM, Santi Saez wrote:


 On 06/11/09 14:10, mdaitc wrote:

 Hi mdaitc,

 I’m seeing similar TCP “weirdness” as the other posts mention as   
 well
 as the below errors.

 (..)

 Nov  2 08:15:14 backup kernel:  connection33:0: detected conn error
 The performance isn’t what I’d expect:

 (..)

 What happens if you disable TCP window scaling option in RHEL servers?

 # echo 0 > /proc/sys/net/ipv4/tcp_window_scaling

 In our case, the iSCSI conn errors stopped after disabling it, but we still
 have a lot of TCP “weirdness” on the network, mainly duplicate ACK packets.

 Regards,

 -- 
 Santi Saez
 http://woop.es

 

===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Gopu Krishnan
Hi all,

I am working on the iSCSI Enterprise Target (IET). Could someone please
explain the performance of IET?
If so, how was the performance measured, and what throughput was
observed?

Thanks
Gopala krishnan Varatharajan

On Sat, Nov 7, 2009 at 3:09 AM, Matthew Dickinson 
matt-openis...@alpha345.com wrote:


 On 11/6/09 3:08 PM, Mike Christie micha...@cs.wisc.edu wrote:

 
  Could you send more of the log? Do you see a message like
  "connection1:0 is operational after recovery (1 attempts)"
  after you see the conn errors (how many attempts)?

 Here's one particular connection:

 Nov  4 05:12:14 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4321648393, last ping 4321653393, now
 4321658393
 Nov  4 05:12:14 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 05:12:21 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)
 Nov  4 05:12:46 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4321680691, last ping 4321685691, now
 4321690691
 Nov  4 05:12:46 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 05:12:58 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)
 Nov  4 07:46:03 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4330877890, last ping 4330882890, now
 4330887890
 Nov  4 07:46:03 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 07:46:10 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)
 Nov  4 07:46:27 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4330901733, last ping 4330906733, now
 4330911733
 Nov  4 07:46:27 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 07:46:32 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)
 Nov  4 07:47:21 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4330955414, last ping 4330960414, now
 4330965414
 Nov  4 07:47:21 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 07:47:28 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)
 Nov  4 07:48:28 backup kernel:  connection22:0: ping timeout of 5 secs
 expired, recv timeout 5, last rx 4331023213, last ping 4331028213, now
 4331033213
 Nov  4 07:48:28 backup kernel:  connection22:0: detected conn error (1011)
 Nov  4 07:48:35 backup iscsid: connection22:0 is operational after recovery
 (1 attempts)

 FWIW:

 [r...@backup ~]# cat /var/log/messages | grep "after recovery" | awk '{print $11 " " $12}' | sort | uniq
 (113 attempts)
 (1 attempts)
 (24 attempts)
 (2 attempts)
 (3 attempts)
 (4 attempts)
 (5 attempts)
 (66 attempts)
 (68 attempts)
 (6 attempts)
 (7 attempts)
 (8 attempts)
 (9 attempts)
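
The pipeline above lists only the distinct attempt counts, not how often each one occurred. A minimal Python sketch of a per-count tally follows; it assumes the iscsid message format shown in this thread, and the function name `recovery_histogram` is made up for illustration.

```python
import re
from collections import Counter

# Matches iscsid recovery lines such as:
#   "... iscsid: connection22:0 is operational after recovery (1 attempts)"
RECOVERY_RE = re.compile(
    r"connection\d+:\d+ is operational after recovery \((\d+) attempts\)"
)

def recovery_histogram(lines):
    """Count how many recoveries needed each number of login attempts."""
    histogram = Counter()
    for line in lines:
        match = RECOVERY_RE.search(line)
        if match:
            histogram[int(match.group(1))] += 1
    return histogram

# Lines copied from the log in this thread (re-joined after mail wrapping).
sample = [
    "Nov  4 05:12:14 backup kernel:  connection22:0: detected conn error (1011)",
    "Nov  4 05:12:21 backup iscsid: connection22:0 is operational after recovery (1 attempts)",
    "Nov  4 05:12:58 backup iscsid: connection22:0 is operational after recovery (1 attempts)",
]
for attempts, count in sorted(recovery_histogram(sample).items()):
    print(f"{attempts} attempt(s): {count} recoveries")
```

To run it against a real log, pass an open file object (for example `/var/log/messages`, if that is where your syslog writes) instead of `sample`.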

 
  Try disabling nops by setting
 
  node.conn[0].timeo.noop_out_interval = 0
  node.conn[0].timeo.noop_out_timeout = 0

 Ok, I'll let you know how it pans out.

 Thanks,

 Matthew



 



--




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Mike Christie

Hoot, Joseph wrote:
 What version of OiS are you using?  I had lots of weirdness and the
 same types of disconnects to our Dell EqualLogic when we were
 (actually still are in production) using 868 code.  I'm now using
 open-iscsi-871 code plus a sendwait patch and haven't had the issue.  I've

What is the sendwait patch? Is it a patch for open-iscsi or to the 
kernel network code?




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Hoot, Joseph

It was for OiS 871 code prior to the RHEL 5.4 release (not sure if the
release includes it or not).  I'm not sure who came up with it.  I was
working with Don Williams from Dell EqualLogic.  He got ahold of it
somehow.  I applied it and it seemed to improve things.



===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Mike Christie

Hoot, Joseph wrote:
 It was for OiS 871 code prior to the RHEL 5.4 release (not sure if the
 release includes it or not).  I'm not sure who came up with it.  I was
 working with Don Williams from Dell EqualLogic.  He got ahold of it
 somehow.  I applied it and it seemed to improve things.
 

Ah ok. I think it was the patch I sent to Don.

If you just used 871 without the patch (or what is in the stock RHEL 5.4 
kernel) does it work ok? There were a couple changes from 868 to 871 
that I thought would also fix the problem, so I was waiting for Don and 
them to retest just 871 and get back to me.

 





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-09 Thread Hoot, Joseph

I've had about 3 threads of dt (kicking off a bit randomly) on 3 separate
volumes for over a week and haven't had a single disconnect yet.  I am
currently using whatever rpm is distributed with Oracle VM v2.2.  I know for
sure that they have included the 871 base, plus, I believe, at least one
one-off patch.  I can get more details if you'd like.

But so far, so good.

 


===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY:   7145F633
===





Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-04 Thread Santi Saez

On 03/11/09 0:52, Mike Christie wrote:

Dear Mike,

 You can turn off ping/nops by setting

 node.conn[0].timeo.noop_out_interval = 0
 node.conn[0].timeo.noop_out_timeout = 0

 (set that in iscsid.conf then rediscover the target, or run iscsiadm -m
 node -T your_target -o update -n name_of_param_above -v 0)

Thanks!! As I said to James in the previous email, disabling TCP window
scaling *partially solves* this problem; we still keep nop pings enabled in
the configuration. But we still have too many TCP Dup ACKs in the network :-S


 This might just be a workaround. What might happen is that you will not see
 the nop/ping and conn errors and instead would just see a slowdown in
 the workloads being run.

I have sent your contact details to the Infortrend developers; an engineer
will contact you. Thanks!

Regards,

-- 
Santi Saez
http://woop.es




Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-02 Thread Santi Saez


Hi,

Randomly we get Open-iSCSI conn errors when connecting to an
Infortrend A16E-G2130-4 storage array. We discussed this earlier on
the list, see:

  http://tr.im/DVQm
  http://tr.im/DVQp

Open-iSCSI logs this:

===
Nov  2 18:34:02 vz-17 kernel: ping timeout of 5 secs expired, last rx  
408250499, last ping 408249467, now 408254467
Nov  2 18:34:02 vz-17 kernel:  connection1:0: iscsi: detected conn  
error (1011)
Nov  2 18:34:03 vz-17 iscsid: Kernel reported iSCSI connection 1:0  
error (1011) state (3)
Nov  2 18:34:07 vz-17 iscsid: connection1:0 is operational after  
recovery (1 attempts)
Nov  2 18:34:52 vz-17 kernel: ping timeout of 5 secs expired, last rx  
408294833, last ping 408299833, now 408304833
Nov  2 18:34:52 vz-17 kernel:  connection1:0: iscsi: detected conn  
error (1011)
Nov  2 18:34:53 vz-17 iscsid: Kernel reported iSCSI connection 1:0  
error (1011) state (3)
Nov  2 18:34:57 vz-17 iscsid: connection1:0 is operational after  
recovery (1 attempts)
===
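
The three numbers in each "ping timeout" line look like kernel jiffies counters. A small Python sketch for turning them into elapsed seconds is given below. It assumes HZ=1000 (one jiffy per millisecond), which is consistent with the 5000-jiffy gap for a 5 second timeout but is not confirmed anywhere in this thread; `ping_timeout_deltas` is an illustrative name.

```python
import re

# Handles both message variants seen in this thread, with or without the
# "recv timeout N," field.
PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (\d+) secs expired,"
    r"(?:\s+recv timeout \d+,)?"
    r"\s+last rx\s+(\d+),\s+last ping\s+(\d+),\s+now\s+(\d+)"
)

def ping_timeout_deltas(line, hz=1000):
    """Return elapsed seconds since last rx / last ping, or None if no match.

    hz is the assumed kernel tick rate (jiffies per second).
    """
    match = PING_TIMEOUT_RE.search(line)
    if match is None:
        return None
    timeout, last_rx, last_ping, now = (int(g) for g in match.groups())
    return {
        "timeout_secs": timeout,
        "secs_since_last_rx": (now - last_rx) / hz,
        "secs_since_last_ping": (now - last_ping) / hz,
    }

# First ping-timeout line from the log above, re-joined after mail wrapping.
line = ("Nov  2 18:34:02 vz-17 kernel: ping timeout of 5 secs expired, "
        "last rx 408250499, last ping 408249467, now 408254467")
print(ping_timeout_deltas(line))
```

Comparing `secs_since_last_rx` with `secs_since_last_ping` shows whether anything at all arrived from the target after the nop-out was sent.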

Running on CentOS 5.4 with iscsi-initiator-utils-6.2.0.871-0.10.el5;
I think it's not an Open-iSCSI bug, as Mike suggested at:

http://groups.google.com/group/open-iscsi/msg/fe37156096b2955f

I only get this error when connecting to Infortrend storage, and not
with NetApp, Nexsan, etc. *connected in the same SAN*.

Using Wireshark I see a lot of "TCP Dup ACK", "TCP ACKed lost
segment", etc., and the iSCSI session finally ends in a timeout; see a
screenshot here:

http://tinyurl.com/ykpvckn

Using Wireshark IO graphs I get this strange report about TCP/IP errors:

http://tinyurl.com/ybm4m8x

And this is another report in the same SAN connecting to a NetApp:

http://tinyurl.com/ycgc8ul

Those TCP/IP errors only occur when connecting to Infortrend
storage, and not with other targets in the same SAN (using the same switch
infrastructure). Is there any way to deal with this using Open-iSCSI?
From what I see on the Internet, a lot of Infortrend users are suffering
from this behavior.

Thanks!

P.S.: speed and duplex configuration is correct at all points, and there
are no CRC errors on the switch.

-- 
Santi Saez
http://woop.es




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-02 Thread Mike Christie

Santi Saez wrote:
 
 Hi,
 
 Randomly we get Open-iSCSI conn errors when connecting to an  
 Infortrend A16E-G2130-4 storage array. We had discussed about this  
 earlier in the list, see:
 
   http://tr.im/DVQm
   http://tr.im/DVQp
 
 Open-iSCSI logs this:
 
 ===
 Nov  2 18:34:02 vz-17 kernel: ping timeout of 5 secs expired, last rx  
 408250499, last ping 408249467, now 408254467
 Nov  2 18:34:02 vz-17 kernel:  connection1:0: iscsi: detected conn  
 error (1011)
 Nov  2 18:34:03 vz-17 iscsid: Kernel reported iSCSI connection 1:0  
 error (1011) state (3)
 Nov  2 18:34:07 vz-17 iscsid: connection1:0 is operational after  
 recovery (1 attempts)
 Nov  2 18:34:52 vz-17 kernel: ping timeout of 5 secs expired, last rx  
 408294833, last ping 408299833, now 408304833
 Nov  2 18:34:52 vz-17 kernel:  connection1:0: iscsi: detected conn  
 error (1011)
 Nov  2 18:34:53 vz-17 iscsid: Kernel reported iSCSI connection 1:0  
 error (1011) state (3)
 Nov  2 18:34:57 vz-17 iscsid: connection1:0 is operational after  
 recovery (1 attempts)
 ===
 
 Running on CentOS 5.4 with iscsi-initiator-utils-6.2.0.871-0.10.el5;  
 I think it's not a Open-iSCSI bug as Mike suggested at:
 
 http://groups.google.com/group/open-iscsi/msg/fe37156096b2955f
 
 I have only this error when connecting to Infortrend storage, and not  
 with NetApp, Nexsan, etc. *connected in the same SAN*.
 
 Using Wireshark I see a lot of TCP Dup ACK, TCP ACKed lost  
 segment, etc. and iSCSI session finally ends in timeout, see a  
 screenshot here:
 
 http://tinyurl.com/ykpvckn
 
 Using Wireshark IO graphs I get this strange report about TCP/IP errors:
 
 http://tinyurl.com/ybm4m8x
 
 And this is another report in the same SAN connecting to a NetApp:
 
 http://tinyurl.com/ycgc8ul
 
 Those TCP/IP errors only occurs when connecting to Infortrend  
 storage.. and no with other targets in the same SAN (using same switch  
 infrastructure); is there anyway to deal with this using Open-iSCSI?  
 As I see in Internet, there're a lot of Infortrend's users suffering  
 this behavior.
 

You can turn off ping/nops by setting

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

(set that in iscsid.conf then rediscover the target, or run iscsiadm -m
node -T your_target -o update -n name_of_param_above -v 0)

Or you might want to set them higher.

This might just be a workaround. What might happen is that you will not see
the nop/ping and conn errors and instead would just see a slowdown in
the workloads being run.

If you guys can get a hold of any Infortrend people, let me know, because
I would be happy to work with them on this.




Re: Infortrend + iSCSI: detected conn error (1011) + TCP Dup ACK

2009-11-02 Thread Ulrich Windl
  MAC Rcvd Multicast Frames        178236
  MAC Rcvd Broadcast Frames        34

  iSCSI Shared Statistics
  -----------------------
  PDUs Xmited                      324033312
  Data Bytes Xmited                29836097572
  PDUs Rcvd                        198783508
  Data Bytes Rcvd                  35739418624
  I/O Completed                    165710975
  Unexpected I/O Rcvd              0
  iSCSI Format Errors              0
  Header Digest Errors             0
  Data Digest Errors               0
  Sequence Errors                  0
  IP Xmit Packets                  242949995
  IP Xmit Byte Count               47161789220
  IP Xmit Fragments                0
  IP Rcvd Packets                  312354406
  IP Rcvd Byte Count               371426357904
  IP Rcvd Fragments                0
  IP Datagram Reassembly Count     0
  IP Error Packets                 0
  IP Fragment Rcvd Overlap         0
  IP Fragment Rcvd Out of Order    0
  IP Datagram Reassembly Timeouts  0
  TCP Xmit Segment Count           242949995
  TCP Xmit Byte Count              38654705673
  TCP Rcvd Segment Count           312354406
  TCP Rcvd Byte Count              361430272728
  TCP Persist Timer Expirations    0
  TCP Rxmit Timer Expired          0
  TCP Rcvd Duplicate Acks          644
  TCP Rcvd Pure Acks               4091830
  TCP Xmit Delayed Acks            13648891
  TCP Xmit Pure Acks               31445514
  TCP Rcvd Segment Errors          101
  TCP Rcvd Segment Out of Order    306
  TCP Rcvd Window Probes           0
  TCP Rcvd Window Updates          0
  TCP ECC Error Corections         0


Regards,
Ulrich








Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread hissing_sid

Hi,

I am running an initiator on FC9 64-bit (Linux
2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a Nexsan SATABoy
over 1 Gb Ethernet with iSCSI.

I am running open-iscsi 2.0-870.

When I try to connect I get a conn error (1011), and I am struggling
to know where to go next. The NexSan has no errors. Can someone give
me a pointer on where to go next to try and get this working? I have
another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and
that works just fine.
Any help please?

#iscsiadm -m session
tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy:028a2347

Loading iSCSI transport class v2.0-870.
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
scsi3 : iSCSI Initiator over TCP/IP
 connection1:0: detected conn error (1011)
 session1: host reset succeeded
 connection1:0: detected conn error (1011)
 session1: host reset succeeded
scsi 3:0:0:0: Device offlined - not ready after error recovery

#iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 -p 10.52.145.121
node.name = iqn.1999-02.com.nexsan:p1:sataboy:028a2347
node.tpgt = 2
node.startup = automatic
iface.hwaddress = default
iface.iscsi_ifacename = default
iface.net_ifacename = default
iface.transport_name = tcp
iface.initiatorname = empty
node.discovery_address = 10.52.145.121
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = empty
node.session.auth.password = empty
node.session.auth.username_in = empty
node.session.auth.password_in = empty
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 10.52.145.121
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 10
node.conn[0].timeo.noop_out_timeout = 15
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No
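
A record dump like the one above is easier to compare between the failing and the working machine once it is parsed into key/value pairs. The following is only a sketch, assuming the `key = value` layout shown here; `parse_node_record` is an illustrative name, and the sample holds three lines copied from the dump.

```python
def parse_node_record(text):
    """Parse `key = value` lines from `iscsiadm -m node -T ... -p ...` output."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip anything that is not a `key = value` line
            record[key.strip()] = value.strip()
    return record

sample = """\
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 10
node.conn[0].timeo.noop_out_timeout = 15
"""
record = parse_node_record(sample)

# Print only the timeout-related settings.
for key in sorted(record):
    if ".timeo." in key:
        print(f"{key} = {record[key]}")
```

Parsing the dump from both machines and comparing the two dictionaries key by key would show any setting that differs between them.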





Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread Mike Christie

hissing_sid wrote:
 Hi,
 
 I am running an initiator on FC9 64-bit (Linux
 2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a NexSan SATA boy
 over 1GB ethernet with iSCSI.
 
 I am running open-iscsi 2.0-870.

Are you using an open-iscsi.org release or the Fedora iscsi-initiator-utils one?

 
 When I try to connect I get a conn error (1011)  and I am struggling
 to know where to go next. The NexSan has no errors. Can someone give
 me pointer on where to go next to try and get this working. I have
 another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and
 that works just fine.
 Any help please?
 
 #iscsiadm -m session
 tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy:
 028a2347
 
 Loading iSCSI transport class v2.0-870.
 iscsi: registered transport (tcp)
 iscsi: registered transport (iser)
 scsi3 : iSCSI Initiator over TCP/IP
  connection1:0: detected conn error (1011)
  session1: host reset succeeded


Looks like maybe the initial inquiry or report luns that the scsi layer
sends is timing out. The iscsi layer probably tries to abort the command,
and that fails, so we try to drop the session (conn error 1011) and then
re-login. It looks like we log in at the iscsi level ok.

I am not sure why this would happen. Let me do some digging. I think 
this might have come up before, but I did not see it.


  connection1:0: detected conn error (1011)
  session1: host reset succeeded
 scsi 3:0:0:0: Device offlined - not ready after error recovery
 
 #iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 -p
 10.52.145.121
 node.name = iqn.1999-02.com.nexsan:p1:sataboy:028a2347
 node.tpgt = 2
 node.startup = automatic
 iface.hwaddress = default
 iface.iscsi_ifacename = default
 iface.net_ifacename = default
 iface.transport_name = tcp
 iface.initiatorname = empty
 node.discovery_address = 10.52.145.121
 node.discovery_port = 3260
 node.discovery_type = send_targets
 node.session.initial_cmdsn = 0
 node.session.initial_login_retry_max = 4
 node.session.cmds_max = 128
 node.session.queue_depth = 32
 node.session.auth.authmethod = None
 node.session.auth.username = empty
 node.session.auth.password = empty
 node.session.auth.username_in = empty
 node.session.auth.password_in = empty
 node.session.timeo.replacement_timeout = 120
 node.session.err_timeo.abort_timeout = 15
 node.session.err_timeo.lu_reset_timeout = 30
 node.session.err_timeo.host_reset_timeout = 60
 node.session.iscsi.FastAbort = Yes
 node.session.iscsi.InitialR2T = No
 node.session.iscsi.ImmediateData = Yes
 node.session.iscsi.FirstBurstLength = 262144
 node.session.iscsi.MaxBurstLength = 16776192
 node.session.iscsi.DefaultTime2Retain = 0
 node.session.iscsi.DefaultTime2Wait = 2
 node.session.iscsi.MaxConnections = 1
 node.session.iscsi.MaxOutstandingR2T = 1
 node.session.iscsi.ERL = 0
 node.conn[0].address = 10.52.145.121
 node.conn[0].port = 3260
 node.conn[0].startup = manual
 node.conn[0].tcp.window_size = 524288
 node.conn[0].tcp.type_of_service = 0
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.auth_timeout = 45
 node.conn[0].timeo.noop_out_interval = 10
 node.conn[0].timeo.noop_out_timeout = 15
 node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
 node.conn[0].iscsi.HeaderDigest = None
 node.conn[0].iscsi.DataDigest = None
 node.conn[0].iscsi.IFMarker = No
 node.conn[0].iscsi.OFMarker = No
 
 
  





Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread hissing_sid


I downloaded and compiled the open-iscsi release, so I think I am
using that.

However, just to be sure I uninstalled iscsi-initiator-utils (yum
remove) so I only have open-iscsi.

Now, when I try to start iscsid ( ./iscsid -f ) I get the following:
iscsid: Missing or Invalid version from /sys/module/
scsi_transport_iscsi/version. Make sure a up to date
scsi_transport_iscsi module is loaded and a up todate version of
iscsid is running. Exiting...

I will investigate that. Any hints?





Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread hissing_sid

All sorted. Definitely running only open-iscsi now.

Still broken though

Loading iSCSI transport class v2.0-870.
iscsi: registered transport (tcp)
iscsi: registered transport (iser)
scsi6 : iSCSI Initiator over TCP/IP
 connection1:0: detected conn error (1011)
 session1: host reset succeeded
 connection1:0: detected conn error (1011)
 session1: host reset succeeded
scsi 6:0:0:0: Device offlined - not ready after error recovery
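
A minimal sketch, not part of open-iscsi, of how a kernel log excerpt like the one above could be summarised. It only assumes the three message formats that appear in this thread; the function name `summarize_recovery` and the shape of the returned dictionary are made up for illustration.

```python
import re

# Patterns taken from the kernel messages pasted in this thread.
CONN_ERROR = re.compile(r"connection(\d+):(\d+): detected conn error \((\d+)\)")
HOST_RESET = re.compile(r"session(\d+): host reset succeeded")
OFFLINED = re.compile(r"Device offlined - not ready after error recovery")


def summarize_recovery(log_text):
    """Count conn errors and host resets, and note whether a device went offline."""
    summary = {"conn_errors": 0, "host_resets": 0, "codes": set(), "offlined": False}
    for line in log_text.splitlines():
        match = CONN_ERROR.search(line)
        if match:
            summary["conn_errors"] += 1
            summary["codes"].add(int(match.group(3)))
        elif HOST_RESET.search(line):
            summary["host_resets"] += 1
        elif OFFLINED.search(line):
            summary["offlined"] = True
    return summary
```

Repeated error/reset pairs that end with the device offlined suggest the session logs in again each time but the SCSI device never becomes ready.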




 On May 8, 5:08 pm, Mike Christie micha...@cs.wisc.edu wrote:

  hissing_sid wrote:
   Hi,

   I am running an initiator on FC9 64-bit (Linux
   2.6.27.21-78.2.41.fc9.x86_64 #1 SMP) connecting to a NexSan SATA boy
   over 1GB ethernet with iSCSI.

   I am running open-iscsi 2.0-870.

  Are you using a open-iscsi.org release or fedora iscsi-initiator-utils one?

   When I try to connect I get a conn error (1011)  and I am struggling
   to know where to go next. The NexSan has no errors. Can someone give
   me pointer on where to go next to try and get this working. I have
   another machine running FC8 32-bit (Linux 2.6.26.8-57.fc8 #1 SMP) and
   that works just fine.
   Any help please?

   #iscsiadm -m session
   tcp: [1] 10.52.145.121:3260,2 iqn.1999-02.com.nexsan:p1:sataboy:
   028a2347

   Loading iSCSI transport class v2.0-870.
   iscsi: registered transport (tcp)
   iscsi: registered transport (iser)
   scsi3 : iSCSI Initiator over TCP/IP
    connection1:0: detected conn error (1011)
    session1: host reset succeeded

  Looks like maybe the initial INQUIRY or REPORT LUNS that the scsi layer
  sends is timing out. The iscsi layer probably tries to abort the command,
  and that fails, so we try to drop the session (conn error 1011) and then
  re-login. It looks like we log in at the iscsi level ok.

  I am not sure why this would happen. Let me do some digging. I think
  this might have come up before, but I did not see it.

    connection1:0: detected conn error (1011)
    session1: host reset succeeded
   scsi 3:0:0:0: Device offlined - not ready after error recovery

   #iscsiadm -m node -T iqn.1999-02.com.nexsan:p1:sataboy:028a2347 -p
   10.52.145.121
   node.name = iqn.1999-02.com.nexsan:p1:sataboy:028a2347
   node.tpgt = 2
   node.startup = automatic
   iface.hwaddress = default
   iface.iscsi_ifacename = default
   iface.net_ifacename = default
   iface.transport_name = tcp
   iface.initiatorname = empty
   node.discovery_address = 10.52.145.121
   node.discovery_port = 3260
   node.discovery_type = send_targets
   node.session.initial_cmdsn = 0
   node.session.initial_login_retry_max = 4
   node.session.cmds_max = 128
   node.session.queue_depth = 32
   node.session.auth.authmethod = None
   node.session.auth.username = empty
   node.session.auth.password = empty
   node.session.auth.username_in = empty
   node.session.auth.password_in = empty
   node.session.timeo.replacement_timeout = 120
   node.session.err_timeo.abort_timeout = 15
   node.session.err_timeo.lu_reset_timeout = 30
   node.session.err_timeo.host_reset_timeout = 60
   node.session.iscsi.FastAbort = Yes
   node.session.iscsi.InitialR2T = No
   node.session.iscsi.ImmediateData = Yes
   node.session.iscsi.FirstBurstLength = 262144
   node.session.iscsi.MaxBurstLength = 16776192
   node.session.iscsi.DefaultTime2Retain = 0
   node.session.iscsi.DefaultTime2Wait = 2
   node.session.iscsi.MaxConnections = 1
   node.session.iscsi.MaxOutstandingR2T = 1
   node.session.iscsi.ERL = 0
   node.conn[0].address = 10.52.145.121
   node.conn[0].port = 3260
   node.conn[0].startup = manual
   node.conn[0].tcp.window_size = 524288
   node.conn[0].tcp.type_of_service = 0
   node.conn[0].timeo.logout_timeout = 15
   node.conn[0].timeo.login_timeout = 15
   node.conn[0].timeo.auth_timeout = 45
   node.conn[0].timeo.noop_out_interval = 10
   node.conn[0].timeo.noop_out_timeout = 15
   node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
   node.conn[0].iscsi.HeaderDigest = None
   node.conn[0].iscsi.DataDigest = None
   node.conn[0].iscsi.IFMarker = No
   node.conn[0].iscsi.OFMarker = No



Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread Mike Christie

hissing_sid wrote:
 
 I downloaded and compiled the open-iscsi release, so I think I am
 using that.
 
 However, just to be sure I uninstalled iscsi-initiator-utils (yum
 remove ) so I only have open-iscsi.
 
 Now, when I try to start iscsid ( ./iscsid -f ) I get the following:
 iscsid: Missing or Invalid version from /sys/module/
 scsi_transport_iscsi/version. Make sure a up to date
 scsi_transport_iscsi module is loaded and a up todate version of
 iscsid is running. Exiting...


You need to load the iscsi modules: modprobe iscsi_tcp

You were probably using the iscsi-initiator-utils tools. With them you 
would have done:

service iscsi start

With the open-iscsi.org tools you do

service open-iscsi start

if you are using the init scripts. The init scripts do the modprobe and 
start iscsid, so if you start iscsid by hand you have to do the modprobe too.

You might have a weird mix. Do a whereis iscsid, whereis iscsiadm and 
whereis iscsistart, and remove what they find. Then just do yum install 
iscsi-initiator-utils. That should give you the current F9 tools, which 
should work.
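
A small sketch of the check that the iscsid error message points at: whether /sys/module/scsi_transport_iscsi/version exists and what it contains. The path is the one from the error text quoted above; the function name `transport_version` and the `sysfs_root` parameter (there only so the check can be tried against a scratch directory) are assumptions for illustration, not part of any open-iscsi tool.

```python
from pathlib import Path


def transport_version(sysfs_root="/sys"):
    """Return the scsi_transport_iscsi module version string, or None if it is not loaded."""
    version_file = Path(sysfs_root) / "module" / "scsi_transport_iscsi" / "version"
    try:
        text = version_file.read_text().strip()
    except OSError:
        # File missing or unreadable: the module is most likely not loaded.
        return None
    return text or None


if __name__ == "__main__":
    version = transport_version()
    if version is None:
        print("scsi_transport_iscsi not loaded; try: modprobe iscsi_tcp")
    else:
        print("scsi_transport_iscsi version:", version)
```

If this reports that the module is not loaded, starting iscsid by hand would be expected to fail with the "Missing or Invalid version" message shown above.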


 
 I will investigate that. Any hints?
 
 



Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread Mike Christie

hissing_sid wrote:
 All sorted. Definitely running only open-iscsi now.
 

Ok ignore the request to use iscsi-initiator-utils.

 Still broken though
 
 Loading iSCSI transport class v2.0-870.
 iscsi: registered transport (tcp)
 iscsi: registered transport (iser)
 scsi6 : iSCSI Initiator over TCP/IP
  connection1:0: detected conn error (1011)
  session1: host reset succeeded
  connection1:0: detected conn error (1011)
  session1: host reset succeeded
 scsi 6:0:0:0: Device offlined - not ready after error recovery
 

Can you get an ethereal/wireshark trace?

 
 



Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread hissing_sid

Just looked at the trace myself. I can see it trying to contact LUN 00
but I do not have that configured on my target. I am using LUN 01

That is why it is failing. When I change my target LUN to 00 it
works.

So, how or where is the initiator deciding on the LUN? Is it
configured or should it find it from the Target?
Thanks for your help so far, I feel I am getting somewhere now!


On May 8, 6:27 pm, hissing_sid dopey...@gmail.com wrote:
 I have a dump. How should I get it to you?


Re: Help debugging connection1:0: detected conn error (1011)

2009-05-08 Thread Mike Christie

hissing_sid wrote:
 Just looked at the trace myself. I can see it trying to contact LUN 00
 but I do not have that configured on my target. I am using LUN 01
 
 That is why is it failing. When I change my Target LUN to 00 it
 works.
 
 So, how or where is the initiator deciding on the LUN? Is it

The scsi layer will send an INQUIRY to LUN 0 to get some info about the 
target and start off the device discovery process. It would then 
normally send a REPORT LUNS command to discover all the devices.
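
To make the second step concrete, here is a rough sketch of decoding a REPORT LUNS response. It follows the SCSI response layout (a 4-byte big-endian list length, 4 reserved bytes, then one 8-byte entry per LUN) and is not the Linux scsi layer's code; the function name `parse_report_luns` is made up, and only the two simplest addressing methods are decoded.

```python
import struct


def parse_report_luns(payload):
    """Decode the LUN numbers from a raw REPORT LUNS response buffer."""
    (list_length,) = struct.unpack_from(">I", payload, 0)
    luns = []
    for offset in range(8, 8 + list_length, 8):
        entry = payload[offset:offset + 8]
        if len(entry) < 8:
            break  # truncated buffer, stop at the last complete entry
        method = entry[0] >> 6  # addressing method lives in the top two bits
        if method == 0:
            # Peripheral device addressing: LUN is in the second byte.
            luns.append(entry[1])
        elif method == 1:
            # Flat space addressing: 14-bit LUN across the first two bytes.
            luns.append(((entry[0] & 0x3F) << 8) | entry[1])
        else:
            luns.append(None)  # other addressing methods are not handled here
    return luns
```

A target set up like the one in this thread, with only LUN 1 exported, would be expected to return a single entry that decodes to 1, while the initial INQUIRY still goes to LUN 0.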

Your target might be operating as expected and need some special flags 
in the scsi layer. Upload your trace here
http://groups-beta.google.com/group/open-iscsi/files
so we can see what is going on.

 configured or should it find it from the Target?
 Thanks for your help so far, I feel I am getting somewhere now!
 
 

Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.

2008-05-16 Thread Tomasz Chmielewski

Konrad Rzeszutek wrote:
 On Thu, Apr 17, 2008 at 06:07:12AM -0400, Konrad Rzeszutek wrote:
 It looks like the network is off but the session is still running. We 
 eventually get to the kernel shutoff here. Is your init script getting 
 run? If not then run it. If you left the session on on purpose then you 
 cannot turn the network off because the scsi layer will want to do its 
 shutdown when the kernel is stopped.
 Ah. Thanks for the explanation. The init script was run, but it didn't 
 log out of all the sessions (it would selectively log out instead of doing
 all of them).
 
 After I made sure that 'iscsiadm -m session -U all' was called during shutdown,
 a QA engineer here was able to make 'iscsiadm' hang during this sequence.
 
 The result was that some of the iSCSI sessions did log out while others did not,
 and the machine hung during "Synchronizing SCSI cache for disk .."

I didn't follow the thread very closely, but a hang during 
"Synchronizing SCSI cache for disk" happens because:

- iSCSI sessions were not properly disconnected, and
- they can't be properly disconnected any more, because the network is 
already disabled.

Most distributions shut down all network interfaces when a halt 
command is started (i.e., they add -i option to the halt command):

 -i: shut down all network interfaces.

Without this flag, everything should shut down properly, even when it's 
not possible to logout all sessions earlier (i.e., a diskless machine 
started off iSCSI).


-- 
Tomasz Chmielewski
http://wpkg.org




Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.

2008-05-16 Thread Konrad Rzeszutek

 Synchronizing SCSI cache for disk happens because:
 
 - iSCSI sessions were not properly disconnected, and

Correct.

 - they can't be properly disconnected any more, because the network is 
 already disabled.

Kind of. There is a kernel timer that gets activated during the logout sequence.
It waits for up to 120 seconds (or whatever you have set in
node.session.timeo.replacement_timeout) and, if the logout sequence hasn't 
completed by then, releases the kernel resources.
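
A minimal sketch for reading that setting out of the text that `iscsiadm -m node -T ... -p ...` prints (the `key = value` records pasted elsewhere in this archive). The helper names `parse_node_record` and `replacement_timeout` are made up for illustration, and the 120 second fallback is simply the figure mentioned above, not a value looked up in the open-iscsi source.

```python
def parse_node_record(text):
    """Turn 'key = value' lines from an iscsiadm node record dump into a dict."""
    record = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" = ")
        if separator:
            record[key.strip()] = value.strip()
        # Lines without ' = ' (wrapped values, blank lines) are ignored.
    return record


def replacement_timeout(record, default=120):
    """Return node.session.timeo.replacement_timeout in seconds."""
    return int(record.get("node.session.timeo.replacement_timeout", default))
```

With the records shown in this archive, `replacement_timeout(parse_node_record(dump))` gives 120, so a logout that cannot complete may hold up shutdown for roughly that long.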

 
 Most distributions shut down all network interfaces when a halt 
 command is started (i.e., they add -i option to the halt command):
 
  -i: shut down all network interfaces.
 
 Without this flag, everything should shut down properly, even when it's 

Right. And this situation will hang the kernel during reboot b/c the
SCSI error handlers wait for a logout state condition that never happens.

 not possible to logout all sessions earlier (i.e., a diskless machine 
 started off iSCSI).

And the patch I attached in the previous e-mail describes a solution
to this.




Re: detected conn error (1011) on login with open-iscsi-2.0-869

2008-04-27 Thread DaMn



On 23 Apr, 20:29, Mike Christie [EMAIL PROTECTED] wrote:
 DaMn wrote:
  Hi,

  Despite the subject, I have not found a solution to my problem in other
  threads.
  I'm having trouble connecting to a target with two LUNs:

  # iscsiadm -m discovery -t st -p 192.168.29.13
  puts out:

  192.168.29.13:3260,1 iqn.1991-05.com.microsoft:atlante-vm-shared-disk-
  vm2-target
  192.168.29.13:3260,1 iqn.1991-05.com.microsoft:atlante-vm-shared-disk-
  vm1-target

  # iscsiadm -m node -T  iqn.1991-05.com.microsoft:atlante-vm-shared-
  disk-vm1-target
  print out:

  node.name = iqn.1991-05.com.microsoft:atlante-vm-shared-disk-vm1-
  target
  node.tpgt = 1
  node.startup = manual
  iface.hwaddress = default
  iface.iscsi_ifacename = default
  iface.net_ifacename = default
  iface.transport_name = tcp
  node.discovery_address = 192.168.29.13
  node.discovery_port = 3260
  node.discovery_type = send_targets
  node.session.initial_cmdsn = 0
  node.session.initial_login_retry_max = 4
  node.session.cmds_max = 128
  node.session.queue_depth = 32
  node.session.auth.authmethod = None
  node.session.auth.username = empty
  node.session.auth.password = empty
  node.session.auth.username_in = empty
  node.session.auth.password_in = empty
  node.session.timeo.replacement_timeout = 120
  node.session.err_timeo.abort_timeout = 15
  node.session.err_timeo.lu_reset_timeout = 20
  node.session.err_timeo.host_reset_timeout = 60
  node.session.iscsi.FastAbort = Yes
  node.session.iscsi.InitialR2T = No
  node.session.iscsi.ImmediateData = Yes
  node.session.iscsi.FirstBurstLength = 262144
  node.session.iscsi.MaxBurstLength = 16776192
  node.session.iscsi.DefaultTime2Retain = 0
  node.session.iscsi.DefaultTime2Wait = 2
  node.session.iscsi.MaxConnections = 1
  node.session.iscsi.MaxOutstandingR2T = 1
  node.session.iscsi.ERL = 0
  node.conn[0].address = 192.168.29.13
  node.conn[0].port = 3260
  node.conn[0].startup = manual
  node.conn[0].tcp.window_size = 524288
  node.conn[0].tcp.type_of_service = 0
  node.conn[0].timeo.logout_timeout = 15
  node.conn[0].timeo.login_timeout = 15
  node.conn[0].timeo.auth_timeout = 45
  node.conn[0].timeo.noop_out_interval = 5
  node.conn[0].timeo.noop_out_timeout = 5
  node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
  node.conn[0].iscsi.HeaderDigest = None,CRC32C
  node.conn[0].iscsi.DataDigest = None
  node.conn[0].iscsi.IFMarker = No
  node.conn[0].iscsi.OFMarker = No

  then, when i try to connect:

  # iscsiadm -m node -T  iqn.1991-05.com.microsoft:atlante-vm-shared-
  disk-vm1-target -l
  prints out:

  Logging in to [iface: default, target: iqn.
  1991-05.com.microsoft:atlante-vm-shared-disk-vm1-target, portal:
  192.168.29.13,3260]
  Login to [iface: default, target: iqn.1991-05.com.microsoft:atlante-vm-
  shared-disk-vm1-target, portal: 192.168.29.13,3260]: successful

  but in /var/log/messages prints out:

  Apr 23 12:32:42 virtualserv1 kernel: scsi69 : iSCSI Initiator over TCP/
  IP
  Apr 23 12:32:42 virtualserv1 kernel:  connection4:0: detected conn
  error (1011)
  Apr 23 12:32:42 virtualserv1 iscsid: connection4:0 is operational now
  Apr 23 12:32:43 virtualserv1 iscsid: Kernel reported iSCSI connection
  4:0 error (1011) state (3)
  Apr 23 12:32:46 virtualserv1 kernel:  session4: host reset succeeded
  Apr 23 12:32:46 virtualserv1 iscsid: connection4:0 is operational
  after recovery (1 attempts)
  Apr 23 12:32:56 virtualserv1 kernel:  69:0:0:0: scsi: Device offlined
  - not ready after error recovery

 It looks like the target does not like something. Are there any target logs?

You are damn right Mike, and the matter is quite embarrassing...
Actually, whoever set up the target forgot to add an available
virtual disk.
So, open-iscsi logged in to the target but no device was detected.
I've requested the necessary changes to the SAN/NAS configuration and now
everything works fine.

Best regards,
DaMn.



Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.

2008-04-17 Thread Mike Christie

Konrad Rzeszutek wrote:
 Firstly, I haven't dug into this yet, so this is more of a call: 
 have-you-seen-this-too?
 

This is probably on the list 20 times :)

 When I reboot the machine without logging off from iSCSI targets I can
 hang the reboot sequence. This is with 869-rc4 userspace and a SLES 10 SP2
 Beta kernel, with the 869-rc4 kernel modules compiled out of tree. (With the
 stock SLES 10 SP2 Beta kernel, which has a back-port of 868-rc1, I get the
 same bug.)
 
 I enabled the debugging in the kernel (DEBUG_SCSI) and added a dump_stack()
 in iscsi_check_transport_timeouts, and this is what I get:


The timer is still running because the session is.

 iscsi: Sending nopout as ping on conn 88007a0b8a50
 iscsi: Setting next tmo 4294974247
 iscsi: mtask deq [cid 0 itt 0xa06]
 iscsi: mgmtpdu [op 0x0 hdr-itt 0xa06 datalen 0]
 Sending SIGKILL to all processes.
 Please stand by while rebooting the system.
 md: stopping all md devices.
 Synchronizing SCSI cache for disk sdl: 


It looks like the network is off but the session is still running. We 
eventually get to the kernel shutoff here. Is your init script getting 
run? If not, then run it. If you left the session logged in on purpose, then 
you cannot turn the network off, because the SCSI layer will want to do its 
shutdown when the kernel is stopped.
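
The ordering described above (log out of the iSCSI sessions first, take the network down afterwards) can be sketched roughly as follows. This is only an illustration in Python, not what any distribution's init script actually does: shutdown_plan and run_plan are made-up names, the network-stop command is a placeholder that differs per distribution, and any filesystems on the iSCSI disks would need to be unmounted before the logout. The iscsiadm invocation itself is the usual "log out of all nodes" form.

```python
import subprocess

# iscsiadm invocation that logs out of every node record.
LOGOUT_ALL = ["iscsiadm", "-m", "node", "--logoutall=all"]


def shutdown_plan(stop_network_cmd):
    """Return the commands in the order they should run at shutdown.

    The iSCSI logout comes first, so that no session is left for the
    SCSI layer to talk to once the network has been turned off.
    """
    return [list(LOGOUT_ALL), list(stop_network_cmd)]


def run_plan(plan, dry_run=True):
    """Print the plan, or execute it when dry_run is False."""
    for cmd in plan:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True stops the sequence if a step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder network-stop command; substitute whatever your system uses.
    run_plan(shutdown_plan(["service", "network", "stop"]))
```

Run as-is, it only prints the two commands in order; nothing is executed unless dry_run=False is passed.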




Re: connection2:0: detected conn error (1011).. when rebooting the machine hangs the reboot sequence.

2008-04-17 Thread Konrad Rzeszutek

 It looks like the network is off but the session is still running. We 
 eventually get to the kernel shutoff here. Is your init script getting 
 run? If not, then run it. If you left the session logged in on purpose, then 
 you cannot turn the network off, because the SCSI layer will want to do its 
 shutdown when the kernel is stopped.

Ah. Thanks for the explanation. The init script was run, but it did not
log off all the sessions (it logged off selectively instead of doing
all of them).
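
One way to catch this kind of selective logoff is to look at what "iscsiadm -m session" still lists after the init script has run. A rough Python sketch follows; it assumes the usual one-line-per-session output of the form "tcp: [N] portal target", and the function names and the sample line are made up for illustration. Lines that do not match that shape are simply ignored.

```python
import re

# Assumed shape of one session line: "<transport>: [<sid>] <portal> <target>".
SESSION_LINE = re.compile(r"^(\w+): \[(\d+)\] (\S+) (\S+)")


def parse_sessions(output):
    """Parse session-listing text into a list of dicts, skipping other lines."""
    sessions = []
    for line in output.splitlines():
        m = SESSION_LINE.match(line.strip())
        if m:
            transport, sid, portal, target = m.groups()
            sessions.append({
                "transport": transport,
                "sid": int(sid),
                "portal": portal,
                "target": target,
            })
    return sessions


def leftover_targets(output):
    """Targets that are still logged in according to the listing."""
    return [s["target"] for s in parse_sessions(output)]


if __name__ == "__main__":
    # Made-up sample line; real output comes from "iscsiadm -m session".
    sample = "tcp: [2] 192.168.0.10:3260,1 iqn.2008-04.com.example:disk1\n"
    for target in leftover_targets(sample):
        print("still logged in:", target)
```

An empty result after the init script has run would suggest that all sessions were logged off before the network goes down.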
