Re: question regarding offlined device
On 19 Feb 2009 at 9:48, Jesse Butler wrote:

> I am trying to troubleshoot why a connection is popping up and down, and finally staying down, with a Linux RHEL 5.2 Open iSCSI / iSER initiator.

Hi!

Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are reachable over 16 paths each, and the network connections use two dedicated separate VLANs. As it's a test at the moment, there's only one initiator and two iSCSI gateways connected to that switch. And there is no traffic (the filesystem is a RAID1 that is not mounted). Still I'm seeing messages like this:

Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
Feb 19 01:39:51 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:32.
Feb 19 01:39:51 rkdvmso1 multipathd: sdc: tur checker reports path is down
Feb 19 01:39:51 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1
Feb 19 01:39:51 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 01:39:56 rkdvmso1 multipathd: sdc: tur checker reports path is up
Feb 19 01:39:56 rkdvmso1 multipathd: 8:32: reinstated
Feb 19 01:39:56 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
Feb 19 07:49:30 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 65:48.
Feb 19 07:49:30 rkdvmso1 multipathd: sdt: tur checker reports path is down
Feb 19 07:49:30 rkdvmso1 multipathd: checker failed path 65:48 in map EVA-1_iSCSI-1
Feb 19 07:49:30 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 11
Feb 19 07:49:35 rkdvmso1 multipathd: sdt: tur checker reports path is up
Feb 19 07:49:35 rkdvmso1 multipathd: 65:48: reinstated
Feb 19 07:49:35 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 12
Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
Feb 19 08:53:16 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:192.
Feb 19 08:53:16 rkdvmso1 multipathd: sdm: tur checker reports path is down
Feb 19 08:53:16 rkdvmso1 multipathd: checker failed path 8:192 in map EVA-2_iSCSI-1
Feb 19 08:53:16 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 08:53:21 rkdvmso1 multipathd: sdm: tur checker reports path is up
Feb 19 08:53:21 rkdvmso1 multipathd: 8:192: reinstated
Feb 19 08:53:21 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
Feb 19 14:09:41 rkdvmso1 multipathd: sdd: tur checker reports path is down
Feb 19 14:09:41 rkdvmso1 multipathd: checker failed path 8:48 in map EVA-2_iSCSI-1
Feb 19 14:09:41 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:48.
Feb 19 14:09:46 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 19 14:09:46 rkdvmso1 multipathd: sdd: tur checker reports path is up
Feb 19 14:09:46 rkdvmso1 multipathd: 8:48: reinstated
Feb 19 14:09:46 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
Feb 19 19:33:43 rkdvmso1 multipathd: sdi: tur checker reports path is down
Feb 19 19:33:43 rkdvmso1 multipathd: checker failed path 8:128 in map EVA-1_iSCSI-1
Feb 19 19:33:43 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:128.
Feb 19 19:33:49 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 11
Feb 19 19:33:49 rkdvmso1 multipathd: sdi: tur checker reports path is up
Feb 19 19:33:49 rkdvmso1 multipathd: 8:128: reinstated
Feb 19 19:33:49 rkdvmso1 multipathd: EVA-1_iSCSI-1: remaining active paths: 12
Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
Feb 20 01:43:57 rkdvmso1 multipathd: sdc: tur checker reports path is down
Feb 20 01:43:57 rkdvmso1 multipathd: checker failed path 8:32 in map EVA-2_iSCSI-1
Feb 20 01:43:57 rkdvmso1 kernel: device-mapper: dm-multipath: Failing path 8:32.
Feb 20 01:43:57 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 11
Feb 20 01:44:02 rkdvmso1 multipathd: sdc: tur checker reports path is up
Feb 20 01:44:02 rkdvmso1 multipathd: 8:32: reinstated
Feb 20 01:44:02 rkdvmso1 multipathd: EVA-2_iSCSI-1: remaining active paths: 12
[...]

I have an odd feeling about that.

Regards,
Ulrich

> I see various references to host reset, and finally one looks like the following. It says it succeeded, but this time, rather than IO continuing, I see "Device offlined - not ready after error recovery".
>
> Do we have any idea what is happening here based upon this console output? What is host reset meant to do, and can we tell how it failed?
>
> Thanks
> /jb
>
> Feb 7 11:20:05 nws-bur-25-48 kernel: iser: iscsi_iser_ib_conn_lookup:no conn exists for eph
> Feb 7 11:20:05 nws-bur-25-48 kernel: iser: iser_connect:connecting to: 192.168.1.5, port 0xbc0c
> Feb 7
How to negotiate block size between target initiator
Hi All,

I have a question about how the iSCSI target and initiator negotiate the block size, i.e. how does the initiator learn that the iSCSI target is configured for another size, say 4096? How does the target tell the initiators that it is configured for a block size of 4096 instead of 512?

Please help me. Thanks in advance.

Vaibhav Nipunage

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---
Re: question regarding offlined device
> Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are reachable over 16 paths each, and the network connections use two dedicated separate VLANs. As it's a test at the moment, there's only one initiator and two iSCSI gateways connected to that switch. And there is no traffic (the filesystem is a RAID1 that is not mounted). Still I'm seeing messages like this:
>
> Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
> Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
> Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
> Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
> Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
> Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)

It looks as if each path gets bumped. Is there anything related in the target around those times? Have you seen a pattern, over a longer period, in which paths get bumped and at what times? Also, are any backups being done during that time frame that would saturate the switches?
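To look for the kind of pattern asked about here, the iscsid recovery lines can be pulled out of syslog and the time between consecutive events computed. The following is a minimal sketch, not a tool from this thread: the function names are made up for illustration, the sample lines are the six recovery messages quoted in this thread, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches iscsid recovery lines in the form quoted in this thread.
RECOVERY_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+iscsid:\s+"
    r"connection(?P<conn>\d+:\d+) is operational after recovery"
)


def parse_recoveries(lines, year=2009):
    """Return (timestamp, connection) tuples, one per iscsid recovery line."""
    events = []
    for line in lines:
        m = RECOVERY_RE.match(line)
        if m:
            # syslog omits the year, so the caller supplies it (assumption).
            ts = datetime.strptime(
                f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S"
            )
            events.append((ts, m.group("conn")))
    return events


def gaps(events):
    """Return the time between consecutive recovery events, oldest first."""
    ordered = sorted(events)
    return [later[0] - earlier[0] for earlier, later in zip(ordered, ordered[1:])]


SAMPLE = [
    "Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)",
    "Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)",
    "Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)",
    "Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)",
    "Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)",
    "Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)",
]

if __name__ == "__main__":
    events = parse_recoveries(SAMPLE)
    for ts, conn in events:
        print(ts, "connection", conn)
    for gap in gaps(events):
        print("gap:", gap)
```

Running the same functions over a full syslog file, and comparing the timestamps against known backup windows, would be one way to check the saturation theory.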
Re: question regarding offlined device
On 20 Feb 2009 at 10:19, Konrad Rzeszutek wrote:

> > Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are reachable over 16 paths each, and the network connections use two dedicated separate VLANs. As it's a test at the moment, there's only one initiator and two iSCSI gateways connected to that switch. And there is no traffic (the filesystem is a RAID1 that is not mounted). Still I'm seeing messages like this:
> >
> > Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
> > Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
> > Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
> > Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
> > Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
> > Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
>
> It looks as if each path gets bumped. Is there anything related in the target around those times? Have you seen a pattern, over a longer period, in which paths get bumped and at what times? Also, are any backups being done during that time frame that would saturate the switches?

Hi!

I'll have to investigate a bit deeper. If it's not the network, it could be FC or the target hardware itself. There are some backups actually running: some that I'm responsible for, others that I'm not responsible for but that use the same infrastructure, and still others that only use the FC ISL (Inter-Switch-Links) for data transport. However, we believed that FC guarantees maximum delays and no packet drops, so I doubt that it's the FC, but who knows? I'll have a deeper look, comparing it to the backup times I know...

Regards,
Ulrich
iscsid: connection198:0 is operational after recovery
Hello,

I noticed that I see messages like

Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)

but no messages telling me that the connection is down (timed out/not responding). So it's hard to guess when the problem actually occurred. Not knowing the sources, could it be that the message priority for the failure message (if there is any) is set too low?

Regards,
Ulrich
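Since iscsid logs only the recovery here, one rough way to bracket each incident is to pair the multipathd "path is down" and "path is up" lines for the same device. This is a minimal sketch under stated assumptions: the function name is hypothetical, the sample lines are multipathd messages from the "question regarding offlined device" thread, and the year is assumed. The resulting window reflects when the tur checker polled the path, not when the connection actually dropped, so it is only an approximation.

```python
import re
from datetime import datetime

# Matches the multipathd tur checker lines quoted in the related thread.
CHECKER_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+multipathd:\s+"
    r"(?P<dev>sd[a-z]+): tur checker reports path is (?P<state>up|down)"
)


def outage_windows(lines, year=2009):
    """Pair each 'path is down' with the next 'path is up' for the same device.

    Returns a list of (device, down_timestamp, duration) tuples.
    """
    down_since = {}
    windows = []
    for line in lines:
        m = CHECKER_RE.match(line)
        if not m:
            continue
        # syslog omits the year, so the caller supplies it (assumption).
        ts = datetime.strptime(f"{year} {m.group('stamp')}", "%Y %b %d %H:%M:%S")
        dev = m.group("dev")
        if m.group("state") == "down":
            # Keep the first 'down' if the checker reports it repeatedly.
            down_since.setdefault(dev, ts)
        elif dev in down_since:
            start = down_since.pop(dev)
            windows.append((dev, start, ts - start))
    return windows


SAMPLE = [
    "Feb 19 01:39:51 rkdvmso1 multipathd: sdc: tur checker reports path is down",
    "Feb 19 01:39:56 rkdvmso1 multipathd: sdc: tur checker reports path is up",
    "Feb 19 07:49:30 rkdvmso1 multipathd: sdt: tur checker reports path is down",
    "Feb 19 07:49:35 rkdvmso1 multipathd: sdt: tur checker reports path is up",
    "Feb 19 19:33:43 rkdvmso1 multipathd: sdi: tur checker reports path is down",
    "Feb 19 19:33:49 rkdvmso1 multipathd: sdi: tur checker reports path is up",
]

if __name__ == "__main__":
    for dev, start, duration in outage_windows(SAMPLE):
        print(dev, "down at", start, "for", duration)
```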
Re: question regarding offlined device
On Fri, Feb 20, 2009 at 05:07:49PM +0100, Ulrich Windl wrote:
> On 20 Feb 2009 at 10:19, Konrad Rzeszutek wrote:
>
> > > Maybe that's related: With SLES10 SP1 (x86_64) I have two iSCSI LUNs that are reachable over 16 paths each, and the network connections use two dedicated separate VLANs. As it's a test at the moment, there's only one initiator and two iSCSI gateways connected to that switch. And there is no traffic (the filesystem is a RAID1 that is not mounted). Still I'm seeing messages like this:
> > >
> > > Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
> > > Feb 19 07:49:30 rkdvmso1 iscsid: connection215:0 is operational after recovery (1 attempts)
> > > Feb 19 08:53:15 rkdvmso1 iscsid: connection208:0 is operational after recovery (1 attempts)
> > > Feb 19 14:09:40 rkdvmso1 iscsid: connection211:0 is operational after recovery (1 attempts)
> > > Feb 19 19:33:42 rkdvmso1 iscsid: connection220:0 is operational after recovery (1 attempts)
> > > Feb 20 01:43:56 rkdvmso1 iscsid: connection210:0 is operational after recovery (1 attempts)
> >
> > It looks as if each path gets bumped. Is there anything related in the target around those times? Have you seen a pattern, over a longer period, in which paths get bumped and at what times? Also, are any backups being done during that time frame that would saturate the switches?
>
> Hi!
>
> I'll have to investigate a bit deeper. If it's not the network, it could be FC or the target hardware itself. There are some backups actually running, some of those

Those are iSCSI errors. How does FC figure in this? Is FC used for backups? If so, then that shouldn't be an issue, as the FC and network fabrics are separate.

> I'm responsible for, and others I'm not responsible for, but they use the same infrastructure, and even others that only use the FC ISL (Inter-Switch-Links) for data transport. However, we believed that FC guarantees maximum delays and no packet drops, so I doubt that it's the FC, but who knows? I'll have a deeper look,

You will get errors when the FC packets get dropped and can't be re-sent.

> comparing it to the backup times I know...
>
> Regards,
> Ulrich
Re: iscsid: connection198:0 is operational after recovery
On Fri, Feb 20, 2009 at 05:11:24PM +0100, Ulrich Windl wrote:
> Hello,
>
> I noticed that I see messages like
>
> Feb 19 01:39:50 rkdvmso1 iscsid: connection198:0 is operational after recovery (1 attempts)
>
> but no messages telling me that the connection is down (timed out/not responding). So it's hard to guess when the problem actually occurred. Not knowing the sources, could it be that the message priority for the failure message (if there is any) is set too low?

Try setting 'loglevel=8' as a boot argument. Also edit /etc/sysconfig/init and set the LOG to 8 (if you are using a Red Hat variant of OS).
Re: question regarding offlined device
thanks very much, mike. gets me onto the right path, i think...

On Feb 19, 2009, at 12:19 PM, Mike Christie wrote:

> Jesse Butler wrote:
> > I am trying to troubleshoot why a connection is popping up and down, and finally staying down, with a Linux RHEL 5.2 Open iSCSI / iSER initiator.
> >
> > I see various references to host reset, and finally one looks like the following. It says it succeeded, but this time, rather than IO continuing, I see "Device offlined - not ready after error recovery".
> >
> > Do we have any idea what is happening here based upon this console output? What is host reset meant to do, and can we tell how it failed?
>
> Each scsi command has a timeout. You can see it in /sys/block/sdX/device/timeout. If a command does not complete within that time, the scsi layer fires its error handler, which basically asks the driver to:
>
> 1. abort the task.
> 2. if 1 fails, reset the lu
> 3. if 2 fails, reset the bus (iscsi does not do this).
> 4. if 3 fails, reset the host.
>
> (in newer kernels there is a 2.5 where you can reset the target).
>
> Software iscsi has a weird implementation where it does a host per session, and for the host reset we just log out the session and try to log in again. We should do a target reset, but we do not currently due to bugs.
>
> If we get to #4 and that fails then the scsi layer will offline the devices.
>
> If any of 1-4 is successful in fixing the problem, the scsi layer will send some commands to test it out. It will normally send a TUR. If we eventually get to #4 and the reset succeeds but the TUR fails, then devices will be offlined.
>
> So for some reason
>
> 1. commands are taking a long time and are timing out. I think the default in RHEL is 60 seconds.
> 2. For some reason the transport seems fine. We can login and out.
> 3. For some reason the TUR to test the reset is failing.
> If you do not have a scsi disk you can enable lots of scsi layer debugging by doing
>
> echo -1 > /proc/sys/dev/scsi/logging_level
>
> If you have other scsi or data disks in the system you probably want less debugging or it will be a mess.
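The escalation sequence Mike describes can be sketched as a small model. This is only an illustration of the ordering in his explanation, not kernel code: the `Step` names and the `try_step` and `test_unit_ready` callbacks are hypothetical, the 2.5 target reset of newer kernels is left out, and moving on to the next step when the TUR fails after a successful earlier step is an assumption, since the message only spells out the host-reset case.

```python
from enum import Enum


class Step(Enum):
    """Error-handler steps in the order given in the explanation above."""
    ABORT_TASK = 1
    LU_RESET = 2
    BUS_RESET = 3
    HOST_RESET = 4


def run_error_handler(try_step, test_unit_ready, skip=(Step.BUS_RESET,)):
    """Walk the escalation ladder after a command timeout.

    try_step(step) -> bool    hypothetical callback: True if the step succeeded
    test_unit_ready() -> bool hypothetical callback: True if the TUR succeeded

    Returns ("recovered", step) or ("offlined", None).
    """
    for step in Step:  # Enum members iterate in definition order
        if step in skip:
            continue  # iSCSI does not do a bus reset
        # A step counts as a fix only if the follow-up TUR also succeeds;
        # otherwise escalate to the next step (assumption, see lead-in).
        if try_step(step) and test_unit_ready():
            return ("recovered", step)
    # Host reset failed, or it succeeded but the TUR failed.
    return ("offlined", None)


if __name__ == "__main__":
    # The situation from this thread: the host reset (session logout and
    # login) reports success, but the TUR sent afterwards fails.
    print(run_error_handler(lambda s: s is Step.HOST_RESET, lambda: False))
    # An LU reset that fixes the problem ends the escalation early.
    print(run_error_handler(lambda s: s is Step.LU_RESET, lambda: True))
```

The first call models the reported case, where the session can log in again but the device still ends up offlined because the test command fails.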