Re: question regarding offlined device

2009-02-20 Thread Ulrich Windl

On 19 Feb 2009 at 9:48, Jesse Butler wrote:

 
 
 I am trying to troubleshoot why a connection is popping up and down,  
 and finally staying down, with a Linux RHEL 5.2 Open iSCSI / iSER  
 initiator.

Hi!

Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are 
reachable over 16 paths each, and the network connections use two dedicated 
separate VLANs. As it's a test at the moment, there's only one initiator and two 
iSCSI gateways connected to that switch. And there is no traffic (the filesystem 
is a RAID1 that is not mounted). Still I'm seeing messages like this:

Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
Feb 19 01:39:51 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:32.
Feb 19 01:39:51 rkdvmso1 multipathd: sdc: tur checker reports path is down
Feb 19 01:39:51 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1
Feb 19 01:39:51 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 01:39:56 rkdvmso1 multipathd: sdc: tur checker reports path is up
Feb 19 01:39:56 rkdvmso1 multipathd: 8:32: reinstated
Feb 19 01:39:56 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
Feb 19 07:49:30 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 65:48.
Feb 19 07:49:30 rkdvmso1 multipathd: sdt: tur checker reports path is down
Feb 19 07:49:30 rkdvmso1 multipathd: checker failed path 65:48 in map EVA-1_iSCSI-1
Feb 19 07:49:30 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 11
Feb 19 07:49:35 rkdvmso1 multipathd: sdt: tur checker reports path is up
Feb 19 07:49:35 rkdvmso1 multipathd: 65:48: reinstated
Feb 19 07:49:35 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 12
Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
Feb 19 08:53:16 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:192.
Feb 19 08:53:16 rkdvmso1 multipathd: sdm: tur checker reports path is down
Feb 19 08:53:16 rkdvmso1 multipathd: checker failed path 8:192 in map EVA-2_iSCSI-1
Feb 19 08:53:16 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 08:53:21 rkdvmso1 multipathd: sdm: tur checker reports path is up
Feb 19 08:53:21 rkdvmso1 multipathd: 8:192: reinstated
Feb 19 08:53:21 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
Feb 19 14:09:41 rkdvmso1 multipathd: sdd: tur checker reports path is down
Feb 19 14:09:41 rkdvmso1 multipathd: checker failed path 8:48 in map EVA-2_iSCSI-1
Feb 19 14:09:41 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:48.
Feb 19 14:09:46 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 14:09:46 rkdvmso1 multipathd: sdd: tur checker reports path is up
Feb 19 14:09:46 rkdvmso1 multipathd: 8:48: reinstated
Feb 19 14:09:46 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
Feb 19 19:33:43 rkdvmso1 multipathd: sdi: tur checker reports path is down
Feb 19 19:33:43 rkdvmso1 multipathd: checker failed path 8:128 in map EVA-1_iSCSI-1
Feb 19 19:33:43 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:128.
Feb 19 19:33:49 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 11
Feb 19 19:33:49 rkdvmso1 multipathd: sdi: tur checker reports path is up
Feb 19 19:33:49 rkdvmso1 multipathd: 8:128: reinstated
Feb 19 19:33:49 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 12
Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
Feb 20 01:43:57 rkdvmso1 multipathd: sdc: tur checker reports path is down
Feb 20 01:43:57 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1
Feb 20 01:43:57 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:32.
Feb 20 01:43:57 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 20 01:44:02 rkdvmso1 multipathd: sdc: tur checker reports path is up
Feb 20 01:44:02 rkdvmso1 multipathd: 8:32: reinstated
Feb 20 01:44:02 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
[...]

I have an odd feeling about that.

Regards,
Ulrich


 
 I see various references to host reset, and finally one looks like  
 the following. It says it succeeded, but this time rather than IO  
 continuing, I see the Device offlined - not ready after error  
 recovery.
 
 Do we have any idea what is happening here based upon this console  
 output?  What is host reset meant to do, and can we tell how it  
 failed?
 
 Thanks
 /jb
 
 
 
 Feb  7 11:20:05 nws-bur-25-48 kernel: iser: iscsi_iser_ib_conn_lookup:no conn exists for eph
 Feb  7 11:20:05 nws-bur-25-48 kernel: iser: iser_connect:connecting to: 192.168.1.5, port 0xbc0c
 Feb  7 

How to negotiate block size between target and initiator

2009-02-20 Thread vaibhav nipunage

Hi All,

I have a question about how the iscsi-target and an initiator negotiate
the block size, i.e. how will the initiator understand that the iscsi-
target is configured for another size, say 4096? How will the iscsi-target
tell the initiators that it is configured for 4096 as the block size
instead of 512?

Please help me.

Thanks in advance.

Vaibhav Nipunage
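As background (hedged editorial aside, not from the original poster): the block size is not negotiated at the iSCSI protocol level at all; the initiator discovers it by issuing the SCSI READ CAPACITY command, whose response reports the logical block length in bytes. A minimal sketch of decoding a READ CAPACITY(10) payload follows; the sample bytes are invented for illustration:

```python
import struct

def parse_read_capacity_10(payload: bytes) -> dict:
    """Decode an 8-byte READ CAPACITY(10) response:
    bytes 0-3 hold the last LBA (big-endian), bytes 4-7 the block
    length in bytes. Total block count is last LBA + 1."""
    last_lba, block_len = struct.unpack(">II", payload)
    return {"blocks": last_lba + 1, "block_size": block_len}

# Hypothetical response from a target configured for 4096-byte blocks:
sample = struct.pack(">II", 0x0003FFFF, 4096)
print(parse_read_capacity_10(sample))  # {'blocks': 262144, 'block_size': 4096}
```

(READ CAPACITY(16) exists for larger devices; the 10-byte variant is enough here to show where the block length comes from.)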




--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: question regarding offlined device

2009-02-20 Thread Konrad Rzeszutek

 Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are 
 reachable over 16 paths each, and the network connections use two dedicated 
 separate VLANs. As it's a test at the moment, there's only one initiator and two 
 iSCSI gateways connected to that switch. And there is no traffic (the filesystem 
 is a RAID1 that is not mounted). Still I'm seeing messages like this:
 
 Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
 Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
 Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
 Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
 Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
 Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)

It looks as if each path gets bumped. Is there anything related on the target 
side around those times? Over a longer period, have you seen a pattern in which 
paths are bumped and at what times?

Also, are any backups being done over that time-frame that would saturate
the switches?
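One quick way to look for the kind of pattern asked about here is to tally the multipathd "checker failed path" lines by map, device number, and hour of day. This is just a sketch against the log format shown above (hostname and map names taken from the posted snippet):

```python
import re
from collections import Counter

# Matches multipathd failure lines of the form:
# Feb 19 01:39:51 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1
FAIL_RE = re.compile(
    r"^(?P<mon>\w{3}) +(?P<day>\d+) (?P<hour>\d{2}):\d{2}:\d{2} \S+ "
    r"multipathd: checker failed path (?P<devt>\d+:\d+) in map (?P<map>\S+)"
)

def flap_pattern(lines):
    """Count path failures per (multipath map, major:minor, hour of day)."""
    counts = Counter()
    for line in lines:
        m = FAIL_RE.match(line)
        if m:
            counts[(m.group("map"), m.group("devt"), m.group("hour"))] += 1
    return counts

sample = [
    "Feb 19 01:39:51 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1",
    "Feb 20 01:43:57 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1",
    "Feb 19 07:49:30 rkdvmso1 multipathd: checker failed path 65:48 in map EVA-1_iSCSI-1",
]
for key, n in sorted(flap_pattern(sample).items()):
    print(key, n)
```

Run against the full /var/log/messages, a recurring (path, hour) pair would point at a scheduled job (e.g. a backup window) rather than random noise.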




Re: question regarding offlined device

2009-02-20 Thread Ulrich Windl

On 20 Feb 2009 at 10:19, Konrad Rzeszutek wrote:

 
  Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are 
  reachable over 16 paths each, and the network connections use two dedicated 
  separate VLANs. As it's a test at the moment, there's only one initiator and two 
  iSCSI gateways connected to that switch. And there is no traffic (the filesystem 
  is a RAID1 that is not mounted). Still I'm seeing messages like this:
  
  Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
  Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
  Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
  Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
  Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
  Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
 
 It looks as if each path gets bumped. Is there anything related on the target 
 side around those times? Over a longer period, have you seen a pattern in which 
 paths are bumped and at what times?
 
 Also, are any backups being done over that time-frame that would saturate
 the switches?

Hi!

I'll have to investigate a bit deeper. If it's not the network, it could be FC or 
the target hardware itself. There are some backups actually running, some of those 
I'm responsible for, and others I'm not responsible for, but they use the same 
infrastructure, and even others that only use the FC ISL (Inter-Switch Links) for 
data transport. However, we believed that FC guarantees maximum delays and no 
packet drops, so I doubt that it's the FC, but who knows? I'll have a deeper look, 
comparing it to the backup times I know...

Regards,
Ulrich









iscsid: connection198:0 is operational after recovery

2009-02-20 Thread Ulrich Windl

Hello,

I noticed that I see messages like

Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)

but no messages telling me that the connection is down (timed out/not responding). 
So it's hard to guess when the problem actually occurred. Not knowing the sources, 
could it be that the message priority for the failure message (if there is any) is 
set too low?

Regards,
Ulrich





Re: question regarding offlined device

2009-02-20 Thread Konrad Rzeszutek

On Fri, Feb 20, 2009 at 05:07:49PM +0100, Ulrich Windl wrote:
 
 On 20 Feb 2009 at 10:19, Konrad Rzeszutek wrote:
 
  
   Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are 
   reachable over 16 paths each, and the network connections use two dedicated 
   separate VLANs. As it's a test at the moment, there's only one initiator and two 
   iSCSI gateways connected to that switch. And there is no traffic (the filesystem 
   is a RAID1 that is not mounted). Still I'm seeing messages like this:
   
   Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
   Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
   Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
   Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
   Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
   Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
  
  It looks as if each path gets bumped. Is there anything related on the target 
  side around those times? Over a longer period, have you seen a pattern in which 
  paths are bumped and at what times?
  
  Also, are any backups being done over that time-frame that would saturate
  the switches?
 
 Hi!
 
 I'll have to investigate a bit deeper. If it's not the network, it could be FC or 
 the target hardware itself. There are some backups actually running, some of those 

Those are iSCSI errors. How does FC figure in this? Is FC used for backups? If so, then
that shouldn't be an issue, as the FC and network fabrics are separate.

 I'm responsible for, and others I'm not responsible for, but they use the same 
 infrastructure, and even others that only use the FC ISL (Inter-Switch Links) for 
 data transport. However, we believed that FC guarantees maximum delays and no 
 packet drops, so I doubt that it's the FC, but who knows? I'll have a deeper look, 

You will get errors when the FC packets get dropped and can't be re-sent.

 comparing it to the backup times I know...
 
 Regards,
 Ulrich




Re: iscsid: connection198:0 is operational after recovery

2009-02-20 Thread Konrad Rzeszutek

On Fri, Feb 20, 2009 at 05:11:24PM +0100, Ulrich Windl wrote:
 
 Hello,
 
 I noticed that I see messages like
 
 Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
 
 but no messages telling me that the connection is down (timed out/not responding). 
 So it's hard to guess when the problem actually occurred. Not knowing the sources, 
 could it be that the message priority for the failure message (if there is any) is 
 set too low?

Try setting 'loglevel=8' as a kernel boot argument. Also edit /etc/sysconfig/init and
set the LOG level to 8 (if you are using a Red Hat variant of the OS).




Re: question regarding offlined device

2009-02-20 Thread Jesse Butler

Thanks very much, Mike. Gets me onto the right path, I think...

On Feb 19, 2009, at 12:19 PM, Mike Christie wrote:


 Jesse Butler wrote:

 I am trying to troubleshoot why a connection is popping up and down,
 and finally staying down, with a Linux RHEL 5.2 Open iSCSI / iSER
 initiator.

 I see various references to host reset, and finally one looks like
 the following. It says it succeeded, but this time rather than IO
 continuing, I see the Device offlined - not ready after error
 recovery.

 Do we have any idea what is happening here based upon this console
 output?  What is host reset meant to do, and can we tell how it
 failed?


 Each scsi command has a timeout. You can see it in
 /sys/block/sdX/device/timeout. If a command does not complete within that
 time, the scsi layer fires its error handler, which basically asks the
 driver to:

 1. abort the task.
 2. if 1 fails, reset the lu.
 3. if 2 fails, reset the bus (iscsi does not do this).
 4. if 3 fails, reset the host.

 (in newer kernels there is a 2.5 where you can reset the target).

 Software iscsi has a weird implementation where it does a host per
 session, and for the host reset we just log out the session and try to
 log in again. We should do a target reset, but we do not currently, due to bugs.

 If we get to #4 and that fails, then the scsi layer will offline the devices.

 If any of 1-4 is successful in fixing the problem, the scsi layer will
 send some commands to test it out. It will normally send a TUR. If we
 eventually get to #4 and the reset succeeds but the TUR fails, then the
 devices will be offlined.

 So, for some reason:

 1. Commands are taking a long time and are timing out. I think the
 default in RHEL is 60 seconds.

 2. The transport seems fine. We can log in and out.

 3. The TUR to test the reset is failing.

 If you do not have a scsi disk, you can enable lots of scsi layer
 debugging by doing

 echo -1 > /proc/sys/dev/scsi/logging_level

 If you have other scsi or data disks in the system, you probably want
 less debugging or it will be a mess.



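The escalate-then-verify behavior Mike describes (try a recovery step, probe with TEST UNIT READY, offline the device if nothing works) can be sketched as a toy model. This is an illustration of the described logic only, not the actual kernel code; the step names and return strings are invented:

```python
def scsi_error_recovery(steps, test_unit_ready):
    """Toy model of the SCSI error-handler escalation described above.

    `steps` is an ordered list of (name, attempt) pairs, where attempt()
    returns True if that recovery action reported success. After a step
    succeeds, the device is probed with a TEST UNIT READY (TUR); only if
    the TUR also passes does the device stay online. If every step fails,
    or the final reset "succeeds" but the TUR still fails, the device is
    offlined.
    """
    for name, attempt in steps:
        if attempt() and test_unit_ready():
            return "online after " + name
    return "offline"

# Example mirroring Jesse's log: the host reset reports success, but the
# follow-up TUR fails, so the device is offlined anyway.
steps = [
    ("abort task", lambda: False),
    ("lu reset", lambda: False),
    ("host reset", lambda: True),
]
print(scsi_error_recovery(steps, test_unit_ready=lambda: False))  # offline
```

With a passing TUR the same sequence would recover at the host-reset step instead of offlining, which is why a failing TUR after a "successful" reset is the interesting datum in the console output.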