Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out
On 7 Dec 2009 at 11:20, Mike Christie wrote:
> Ulrich Windl wrote:
> > On 1 Dec 2009 at 14:57, Erez Zilber wrote:
> > > Maintain a list of nop-out PDUs that almost timed out. With this
> > > information, you can understand and debug the whole system: you can
> > > check your target and see what caused it to be so slow at that
> > > specific time, you can see if your network was very busy during that
> > > time, etc.
> >
> > Hi!
> >
> > Having studied TCP overload protection and flow control mechanisms
> > recently, I wondered if a look at the TCP window sizes could be an
> > indicator equivalent to timed-out nops. My idea is: why implement
> > something if it's possibly already there for free?
>
> The problem with the nop timeout code is that it detects:
>
> 1. If the target is not reachable because something wrong is in the network.
> 2. If the target is dead.
> 3. If the network layer is not sending/receiving data fast enough (within
>    the nop timeout).
>
> #3 is a problem because we do not know if it is not sending/receiving data
> quickly because of #1 or #2, or just because we are trying to process more
> data than the network can handle within the nop timeout value. Do you
> think we should be trying to send iSCSI PDUs with data segments that are
> smaller than the window size or some other value, or something like that?
> Or is there a way to get the time it is taking for

No, I mean if the network is dead (#1), sending nops doesn't help. If the target is dead (#2), TCP will time out anyway. I'm unsure about #3: you want to check a guaranteed round-trip time (which is the nop timeout). But what can you really do if the nop times out? You can notice that the network doesn't guarantee your expectations. Re-establishing a connection won't make the network faster, I'm afraid. But can't you get that (timestamps) piggy-backed on TCP anyway?

> tcp packets, and could we use that to automatically determine the nop
> value? Should we just send a network ping and forget doing the iscsi
> nop/ping?
I basically meant this: if the network fills, the TCP sending window will shrink, and if the network is doing well, the window will widen (that's the overload control). If the receiver is not ready to accept data, the window size will be zero (that's flow control). Now if the network is tight, sending a NOP over the same TCP connection may not have the desired effect. So NOPs just answer the question "are you still alive over there?", but they do not answer the question "how fast are you?" in all cases. Maybe on a switched LAN things are all different from what I describe...

Regards,
Ulrich

--
You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
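[Editor's note: the TCP window and congestion state Ulrich refers to can be inspected on a live connection without any iSCSI changes. A minimal sketch, assuming a Linux host with iproute2 installed and an iSCSI connection on the default port 3260; the exact fields printed (cwnd, rtt, send rate) vary by kernel and ss version:]

```shell
# Show kernel TCP state (congestion window, rtt, retransmits) for
# established connections to the iSCSI port. The cwnd/window values are
# the ones discussed above: they shrink under congestion and go toward
# zero when the peer stops accepting data.
ss -tin 'dport = :3260'
```

Running `ss -tin` without the filter dumps the same information for every established TCP socket.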
Re: Information about iSCSI pings that almost timed out
Regarding the average delay of a ping request task - we need to have the average delay, but we're interested only in the average delay of pings that were sent lately (i.e. not pings that were sent a year ago). Am I right?

I thought about having a cyclic array of delays in the kernel. It can hold the delays of the last X pings (e.g. X = 1000). Whenever the user runs 'iscsiadm -m session -s', this array will be sent to userspace and we can calculate the average delay/standard deviation/whatever you want in userland. Comments?

Erez

Does anyone have comments on this? I'd like to start working on it and need some feedback.

Thanks,
Erez
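[Editor's note: the userland half of this proposal is small. A sketch of the statistics step, using made-up sample delay values standing in for the kernel's cyclic array (this is not real iscsiadm output):]

```shell
# Hypothetical ping delays in ms, one per line, piped into awk to compute
# the mean and the (population) standard deviation in one pass.
printf '12\n15\n9\n30\n14\n' | awk '
    { sum += $1; sumsq += $1 * $1; n++ }
    END {
        mean = sum / n
        printf "mean=%.2f ms stddev=%.2f ms\n", mean, sqrt(sumsq / n - mean * mean)
    }'
# -> mean=16.00 ms stddev=7.29 ms
```

The same loop would work unchanged on however many entries the kernel exports, since awk streams the input.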
Re: Unable to apply kernel/2.6.26_compat.patch from git master branch
Hi,

> you are back. I think for your patch, you want to include
> open_iscsi_compat.h in it.

I included open_iscsi_compat.h and created a patch. Please check it.

I have a question about creating a patch against files in a sub-directory. I used git diff to output the patch, but each hunk of the outputted patch includes the kernel/ sub-directory, e.g.

diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index 0b810b6..6ffb49c 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c

However, the kernel/ sub-directory in the compat patch will prevent it from applying, and your current compat patch actually doesn't have the kernel/ sub-directory, e.g.

diff --git a/libiscsi.c b/libiscsi.c
index 149d5eb..467abbf 100644
--- a/libiscsi.c
+++ b/libiscsi.c

How do I make a patch without the sub-directory? Since I didn't know how to do it, I simply removed the sub-directory with:

sed -i 's%a\/kernel%a%g' update_2.6.26_compat.patch2
sed -i 's%b\/kernel%b%g' update_2.6.26_compat.patch2

But this obviously isn't the way to do it... It would be very appreciated if you could tell me the right way to do it.

Thanks.

update_2.6.26_compat.patch2
Description: Binary data
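[Editor's note: git can strip the sub-directory itself: `git diff --relative`, run from inside the sub-directory, limits the diff to that directory and makes the paths relative to it, so no sed post-processing is needed. A sketch against a throwaway repository (the file name and contents are illustrative):]

```shell
# Build a scratch repo with a file under kernel/, change it, and diff with
# --relative from inside kernel/ -- the hunks then read a/libiscsi.c
# instead of a/kernel/libiscsi.c.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
mkdir kernel
echo 'old line' > kernel/libiscsi.c
git add kernel/libiscsi.c
git -c user.email=t@example.com -c user.name=t commit -qm init
echo 'new line' > kernel/libiscsi.c
cd kernel
git diff --relative          # paths come out without the kernel/ prefix
```

The resulting patch applies cleanly with `patch -p1` from within the kernel/ directory, matching the prefix-less style of the existing compat patch.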
Need help with multipath and iscsi in CentOS 5.4
I'm cross-posting here from linux-iscsi-users since I've seen no traffic in the weeks since I posted this.

Hi, I need a little help or advice with my setup. I'm trying to configure multipathed iSCSI on a CentOS 5.4 (RHEL 5.4 clone) box.

Very short version: one server with two NICs for iSCSI sees storage on an EMC array. The storage shows up as four discs, but only one works.

So far single connections work: if I set up the box to use one NIC, I get one connection and can use it just fine. When I set up multiple connections I have problems...

I created two interfaces and assigned each one to a NIC:

iscsiadm -m iface -I iface0 --op=new
iscsiadm -m iface -I iface0 --op=update -n iface.net_ifacename -v eth2
iscsiadm -m iface -I iface1 --op=new
iscsiadm -m iface -I iface1 --op=update -n iface.net_ifacename -v eth3

Each interface saw two paths to its storage, four total; so far so good. I logged all four of them in with:

iscsiadm -m node -T long ugly string here -l

I could see I was connected to all four via iscsiadm -m session. At this point I thought I was set: I had four new devices, /dev/sdb /dev/sdc /dev/sdd /dev/sde.

Ignoring multipath for now, here's where the problem started. I have all four devices, but I can only communicate through one of them: /dev/sdc. As a quick test I tried to fdisk all four devices, to see if I saw the same thing in each place, and only /dev/sdc works.

Turning on multipath, I got a multipathed device consisting of sdb, sdc, sdd and sde, but sdb, sdd and sde are failed with a message of:

checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"

I'm in the dark here. Is this right? Obviously wrong?

Thanks,
--Kyle
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I do not see a ping/nop timeout message in the logs (probably that's why changing the noop timeouts did not work). Simply starting the session does not cause these errors. On starting the second session, I start a daemon that does SCSI commands like INQUIRY on all the paths. After that I see these messages, and the daemon gets stuck for a very long time waiting for SCSI commands to finish. At the backend I have an EMC CLARiiON.

# iscsiadm -m node -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
    Portal: 192.168.10.1:3260,1
        Iface Name: iface0
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
    Portal: 192.168.12.1:3260,3
        Iface Name: iface1

# iscsiadm --mode node --targetname iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.name = iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.tpgt = 1
node.startup = automatic
iface.hwaddress = 00:15:17:A8:A9:0A
iface.iscsi_ifacename = iface0
iface.net_ifacename = eth4
iface.transport_name = tcp
node.discovery_address = 192.168.10.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = empty
node.session.auth.password = empty
node.session.auth.username_in = empty
node.session.auth.password_in = empty
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.10.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
> avora wrote:
> > With SLES10 SP3 x86_64, as soon as I start the second iscsi session2,
> > I am very frequently getting the connection errors. I do not see this
> > with SLES10 SP2 x86_64 on the same setup.
> >
> > Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
> > Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
> > Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> > Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
> >
> > I have tried changing noop_out_interval and noop_out_timeout to
> > 120/120 and 0/0 but it did not help.
>
> Did you see a ping/nop timeout message in the logs or just what you
> included above with the conn error 1011? The ping/nop message would be a
> little before the conn error 1011. What target is this with, and are you
> doing any IO tests when this happens, or are you just logging into the
> second session and then you start to get these errors?
Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)
I found a similar issue while browsing http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f

I wanted to enable logging as mentioned in the above link:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp

But on my machine I only see:

# ls /sys/module/libiscsi/
refcnt sections srcversion
# ls /sys/module/iscsi_tcp/
parameters refcnt sections srcversion
# ls /sys/module/iscsi_tcp/parameters/max_lun
/sys/module/iscsi_tcp/parameters/max_lun

# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3
    iSCSI Connection State: TRANSPORT WAIT
    iSCSI Session State: FAILED
    Internal iscsid Session State: REPOEN

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
> avora wrote:
> > With SLES10 SP3 x86_64, as soon as I start the second iscsi session2,
> > I am very frequently getting the connection errors. I do not see this
> > with SLES10 SP2 x86_64 on the same setup.
> >
> > Dec 7 18:42:05 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
> > Dec 7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after recovery (1 attempts)
> > Dec 7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
> > Dec 7 18:42:08 cdc-r710s1 kernel: connection2:0: detected conn error (1011)
> >
> > I have tried changing noop_out_interval and noop_out_timeout to
> > 120/120 and 0/0 but it did not help.
>
> Did you see a ping/nop timeout message in the logs or just what you
> included above with the conn error 1011? The ping/nop message would be a
> little before the conn error 1011. What target is this with, and are you
> doing any IO tests when this happens, or are you just logging into the
> second session and then you start to get these errors?
lio-target crashes when windows initiator logs in
Hi,

I have problems with the lio-target software. I tried lio-core-2.6.31 and lio-core-2.6. I compiled it together with lio-utils under Ubuntu 9.10 and Debian 5.0. Ubuntu and Debian were installed in a virtual machine (VirtualBox 3.0.12). I tried it also on bare metal, with the same problems.

I can get it working when I use a block device like /dev/sdb. It crashes completely when I use a block device like /dev/sdb1 (the partition exists!!!). It also crashes completely when I use a logical volume or an md-device. The crash happens whenever a Windows initiator logs in. I tried Windows Vista and Windows Server 2008.

When I start the target module I get the following output:

Loading target_core_mod/ConfigFS core: [OK]
Calling ConfigFS script /etc/target/tcm_start.sh for target_core_mod: [OK]
Calling ConfigFS script /etc/target/lio_start.sh for iscsi_target_mod: [OK]

In /var/log/messages I get:

Dec 8 18:50:51 debian kernel: [  106.480865] TARGET_CORE[0]: Loading Generic Kernel Storage Engine: v3.1.0 on Linux/x86_64 on 2.6.31.4v3.1
Dec 8 18:50:51 debian kernel: [  106.481007] TARGET_CORE[0]: Initialized ConfigFS Fabric Infrastructure: v2.0.0 on Linux/x86_64 on 2.6.31.4v3.1
Dec 8 18:50:51 debian kernel: [  106.481036] SE_PC[0] - Registered Plugin Class: TRANSPORT
Dec 8 18:50:51 debian kernel: [  106.481061] PLUGIN_TRANSPORT[1] - pscsi registered
Dec 8 18:50:51 debian kernel: [  106.481084] PLUGIN_TRANSPORT[2] - stgt registered
Dec 8 18:50:51 debian kernel: [  106.481212] CORE_STGT[0]: Bus Initalization complete
Dec 8 18:50:51 debian kernel: [  106.481232] PLUGIN_TRANSPORT[4] - iblock registered
Dec 8 18:50:51 debian kernel: [  106.481250] PLUGIN_TRANSPORT[5] - rd_dr registered
Dec 8 18:50:51 debian kernel: [  106.481268] PLUGIN_TRANSPORT[6] - rd_mcp registered
Dec 8 18:50:51 debian kernel: [  106.481285] PLUGIN_TRANSPORT[7] - fileio registered
Dec 8 18:50:51 debian kernel: [  106.481307] SE_PC[1] - Registered Plugin Class: OBJ
Dec 8 18:50:51 debian kernel: [  106.481326] PLUGIN_OBJ[1] - dev registered

I then initialize the iSCSI target with the following commands:

tcm_node --block iblock_0/my_dev2 /dev/vg1/lv1
lio_node --addlun iqn.2009-11.local.schule.target.i686:sn.123456789 1 0 my_dev_port iblock_0/my_dev2
lio_node --disableauth iqn.2009-11.local.schule.target.i686:sn.123456789 1
lio_node --addnp iqn.2009-11.local.schule.target.i686:sn.123456789 1 192.168.56.101:3260
lio_node --addlunacl iqn.2009-11.local.schule.target.i686:sn.123456789 1 iqn.1991-05.com.microsoft:andreas-pc 0 0
lio_node --enabletpg iqn.2009-11.local.schule.target.i686:sn.123456789 1

They produce the following output:

Output of tcm_node:
Status: DEACTIVATED Execute/Left/Max Queue Depth: 0/32/32 SectorSize: 512 MaxSectors: 255 iBlock device: dm-0 Major: 253 Minor: 0 CLAIMED: IBLOCK
ConfigFS HBA: iblock_0
Successfully added TCM/ConfigFS HBA: iblock_0
ConfigFS Device Alias: my_dev2
Device Params ['/dev/vg1/lv1']
Set T10 WWN Unit Serial for iblock_0/my_dev2 to: 57f6b040-3159-49df-a5bd-2acdb948ef6f
Successfully created TCM/ConfigFS storage object: /sys/kernel/config/target/core/iblock_0/my_dev2

Output of lio_node --addlun:
Successfully created iSCSI Target Logical Unit

Output of lio_node --disableauth:
Successfully disabled iSCSI Authentication on iSCSI Target Portal Group: iqn.2009-11.local.schule.target.i686:sn.123456789 1

Output of lio_node --addnp:
Successfully created network portal: 192.168.56.101:3260 created iqn.2009-11.local.schule.target.i686:sn.123456789 TPGT: 1

Output of lio_node --addlunacl:
Successfully added iSCSI Initiator Mapped LUN: 0 ACL iqn.1991-05.com.microsoft:andreas-pc for iSCSI Target Portal Group: iqn.2009-11.local.schule.target.i686:sn.123456789 1

Output of lio_node --enabletpg:
Successfully enabled iSCSI Target Portal Group: iqn.2009-11.local.schule.target.i686:sn.123456789 1

In /var/log/messages the initialization leads to the following:

Dec 8 18:53:11 debian kernel: [  246.679996] Target_Core_ConfigFS: Located se_plugin: 88000dd630e0 plugin_name: iblock hba_type: 4 plugin_dep_id: 0
Dec 8 18:53:11 debian kernel: [  246.680398] CORE_HBA[0] - Linux-iSCSI.org iBlock HBA Driver 3.1 on Generic Target Core Stack v3.1.0
Dec 8 18:53:11 debian kernel: [  246.680425] CORE_HBA[0] - Attached iBlock HBA: 0 to Generic Target Core TCQ Depth: 512
Dec 8 18:53:11 debian kernel: [  246.680452] CORE_HBA[0] - Attached HBA to Generic Target Core
Dec 8 18:53:11 debian kernel: [  246.680852] IBLOCK: Allocated ib_dev for my_dev2
Dec 8 18:53:11 debian kernel: [  246.680879] Target_Core_ConfigFS: Allocated se_subsystem_dev_t: 88000d86b000 se_dev_su_ptr: 88000ec07800
Dec 8 18:53:11 debian kernel: [  246.720958] Target_Core_ConfigFS: iblock_0/my_dev2 set udev_path: /dev/vg1/lv1
Dec 8 18:53:11 debian kernel: [  246.735619] IBLOCK: Claiming struct block_device: 88000f2d8200
Dec 8 18:53:11 debian kernel: [  246.735714] bio: create slab bio-1 at 1
Dec 8 18:53:11