Re: [PATCH] Maintain a list of nop-out PDUs that almost timed out

2009-12-08 Thread Ulrich Windl
On 7 Dec 2009 at 11:20, Mike Christie wrote:

 Ulrich Windl wrote:
  On 1 Dec 2009 at 14:57, Erez Zilber wrote:
  
  Maintain a list of nop-out PDUs that almost timed out.
  With this information, you can understand and debug the
  whole system: you can check your target and see what caused
  it to be so slow at that specific time, you can see if your
  network was very busy during that time, etc.
 
  
  Hi!
  
  Having studied TCP overload protection and flow control mechanisms recently,
  I wondered if a look at the TCP window sizes could be an indicator equivalent
  to timed-out nops. My idea is: why implement something if it's possibly
  already there for free?
  
 
 The problem with the nop timeout code is that it detects:
 
 1 If the target is not reachable because something is wrong in the network.
 2 If the target is dead.
 3 If the network layer is not sending/receiving data fast enough (within 
 the nop timeout).
 
 #3 is a problem because we do not know if it is not sending/receiving 
 data quickly because of #1 or #2 or just because we are trying to 
 process more data than the network can handle within the nop timeout value.
 
 Do you think we should be trying to send iSCSI PDUs with data
 segments that are smaller than the window size or some other value or
 something like that? Or is there a way to get the time it is taking for

No, I mean if the network is dead (#1), sending NOPs doesn't help.
If the target is dead (#2), TCP will time out anyway.
I'm unsure about #3: you want to check a guaranteed round-trip time (which is
the nop timeout). But what can you really do if the nop times out? You can
notice that the network doesn't guarantee your expectations. Re-establishing a
connection won't make the network faster, I'm afraid. But can't you get that
(timestamps) piggybacked on TCP anyway?

 tcp packets, and could we use that to automatically determine the nop 
 value? Should we just send a network ping and forget doing the iscsi 
 nop/ping?

I basically meant this: if the network fills, the TCP sending window will
shrink, and if the network is doing well, the window will widen (that's the
overload control). If the receiver is not ready to accept data, the window size
will be zero (that's flow control). Now if the network is tight, sending a NOP
over the same TCP connection may not have the desired effect.

So NOPs just answer the question "are you still alive over there?"; they do
not answer the question "how fast are you?" in all cases.
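
Just to illustrate the "for free" part (a userspace sketch only, nothing like
this exists in open-iscsi, and the in-kernel initiator would have to read the
same state through the kernel's own socket structures): on Linux,
getsockopt(TCP_INFO) already exposes the per-connection RTT estimate,
congestion window and retransmit counters, roughly like this:

#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Print the kernel's view of a connected TCP socket: RTT, cwnd, retransmits. */
static int print_tcp_state(int sock)
{
        struct tcp_info ti;
        socklen_t len = sizeof(ti);

        if (getsockopt(sock, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0) {
                perror("getsockopt(TCP_INFO)");
                return -1;
        }

        /* tcpi_rtt/tcpi_rttvar are in microseconds, tcpi_snd_cwnd in segments */
        printf("rtt %u us (var %u us), cwnd %u segs, unacked %u, retrans %u\n",
               ti.tcpi_rtt, ti.tcpi_rttvar, ti.tcpi_snd_cwnd,
               ti.tcpi_unacked, ti.tcpi_total_retrans);
        return 0;
}

Whether those numbers are steady enough to derive (or auto-tune) a nop timeout
from is another question, of course.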

Maybe on a switched LAN things are all different from what I describe...

Regards,
Ulrich





Re: Information about iSCSI pings that almost timed out

2009-12-08 Thread Erez Zilber
 Regarding the average delay of a ping request task - we need to have
 the average delay, but we're interested only in the average delay of
 pings that were sent lately (i.e. not pings that were sent a year
 ago). Am I right?

 I thought about having a cyclic array of delays in the kernel. It can
 hold the delays of the last X pings (e.g. X = 1000). Whenever the user
 runs 'iscsiadm -m session -s', this array will be sent to userspace
 and we can calc the average delay/standard deviation/whatever you want
 in userland.

 Comments?

 Erez


Does anyone have comments on this? I'd like to start working on it and need
some feedback.
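
To make the idea a bit more concrete, here is a very rough sketch of what I
have in mind (the names, the history size and the userspace plumbing are made
up for illustration - this is not actual libiscsi code):

#define ISCSI_PING_HIST 1000    /* keep the delays of the last 1000 pings */

/*
 * Cyclic array of recent nop-out -> nop-in delays.  Once the array is full
 * the oldest entry is overwritten, so only the most recent ISCSI_PING_HIST
 * pings are ever reported to userspace.
 */
struct iscsi_ping_hist {
        unsigned int delays_ms[ISCSI_PING_HIST];
        unsigned int next;      /* next slot to overwrite */
        unsigned int count;     /* number of valid entries */
};

static void iscsi_ping_hist_add(struct iscsi_ping_hist *h, unsigned int delay_ms)
{
        h->delays_ms[h->next] = delay_ms;
        h->next = (h->next + 1) % ISCSI_PING_HIST;
        if (h->count < ISCSI_PING_HIST)
                h->count++;
}

/*
 * Userspace side, e.g. in iscsiadm after the array has been fetched from the
 * kernel: plain mean; standard deviation etc. can be computed the same way.
 */
static double ping_hist_mean(const unsigned int *delays_ms, unsigned int count)
{
        double sum = 0;
        unsigned int i;

        for (i = 0; i < count; i++)
                sum += delays_ms[i];
        return count ? sum / count : 0.0;
}

The kernel would only keep the raw delays; all the statistics would be done in
userspace.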

Thanks,
Erez





Re: Unable to apply kernel/2.6.26_compat.patch from git master branch

2009-12-08 Thread Yangkook Kim
Hi, you are back.

 I think for your patch, you want to include open_iscsi_compat.h in it.

I included open_iscsi_compat.h and created a patch. Please check it.

I have a question about creating a patch against files in a subdirectory.
I used git diff to output the patch, but each hunk of the resulting patch
includes the kernel/ subdirectory prefix, e.g.:

diff --git a/kernel/libiscsi.c b/kernel/libiscsi.c
index 0b810b6..6ffb49c 100644
--- a/kernel/libiscsi.c
+++ b/kernel/libiscsi.c

However, the kernel/ subdirectory in the compat patch will prevent it from
applying when you build, and your current compat patch actually doesn't have
the kernel/ subdirectory, e.g.:

diff --git a/libiscsi.c b/libiscsi.c
index 149d5eb..467abbf 100644
--- a/libiscsi.c
+++ b/libiscsi.c

How do I make a patch without the subdirectory prefix?
Since I didn't know how to do it, I simply removed the subdirectory with:

sed -i 's%a\/kernel%a%g' update_2.6.26_compat.patch2
sed -i 's%b\/kernel%b%g' update_2.6.26_compat.patch2

But this obviously isn't the way to do it... It would be very much appreciated
if you could tell me the right way to do it.

Thanks.





update_2.6.26_compat.patch2
Description: Binary data


Need help with multipath and iscsi in CentOS 5.4

2009-12-08 Thread Kyle Schmitt
I'm cross-posting here from linux-iscsi-users since I've seen no
traffic in the weeks since I posted this.

Hi, I needed a little help or advice with my setup.  I'm trying to
configure multipathed iscsi on a CentOS 5.4 (RHEL 5.4 clone) box.

Very short version: One server with two NICs for iSCSI sees storage on
EMC.  Storage shows up as four discs, but only one works.

So far single connections work: if I set up the box to use one NIC, I
get one connection and can use it just fine.

When I set up multiple connections I have problems...
I created two interfaces and assigned each one to a NIC:
iscsiadm -m iface -I iface0 --op=new
iscsiadm -m iface -I iface0 --op=update -n iface.net_ifacename -v eth2
iscsiadm -m iface -I iface1 --op=new
iscsiadm -m iface -I iface1 --op=update -n iface.net_ifacename -v eth3

Each interface saw two paths to its storage, four total, so far so
good.
I logged all four of them in with:
iscsiadm -m node -T <long ugly string here> -l

I could see I was connected to all four via:
iscsiadm -m session

At this point I thought I was set: I had four new devices:
/dev/sdb /dev/sdc /dev/sdd /dev/sde

Ignoring multipath at this point for now, here's where the problem
started.  I have all four devices, but I can only communicate through
one of them: /dev/sdc.

As a quick test I tried to fdisk all four partitions, to see if I saw
the same thing in each place, and only /dev/sdc works.

Turning on multipath, I got a multipathed device consisting of sdb, sdc,
sdd and sde, but sdb, sdd and sde are failed with the message
checker msg is "emc_clariion_checker: Logical Unit is unbound or LUNZ"


I'm in the dark here.  Is this right?  Obviously wrong?

Thanks
--Kyle





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-08 Thread avora
I do not see a ping/nop timeout message in the logs
(probably that's why changing the noop timeouts did not work).
Simply starting the session does not cause these errors.
On starting the second session, I start a daemon
that does SCSI commands like INQUIRY on all the paths.
After that I see these messages, and the daemon gets stuck
for a very long time waiting for SCSI commands to finish.

At the backend I have EMC CLARiiON.

# iscsiadm -m node -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a2
Portal: 192.168.10.1:3260,1
Iface Name: iface0
Target: iqn.1992-04.com.emc:cx.ckm00091100683.b2
Portal: 192.168.12.1:3260,3
Iface Name: iface1

# iscsiadm --mode node --targetname iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.name = iqn.1992-04.com.emc:cx.ckm00091100683.a2
node.tpgt = 1
node.startup = automatic
iface.hwaddress = 00:15:17:A8:A9:0A
iface.iscsi_ifacename = iface0
iface.net_ifacename = eth4
iface.transport_name = tcp
node.discovery_address = 192.168.10.1
node.discovery_port = 3260
node.discovery_type = send_targets
node.session.initial_cmdsn = 0
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.auth.authmethod = None
node.session.auth.username = empty
node.session.auth.password = empty
node.session.auth.username_in = empty
node.session.auth.password_in = empty
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.err_timeo.host_reset_timeout = 60
node.session.iscsi.FastAbort = Yes
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.DefaultTime2Wait = 2
node.session.iscsi.MaxConnections = 1
node.session.iscsi.MaxOutstandingR2T = 1
node.session.iscsi.ERL = 0
node.conn[0].address = 192.168.10.1
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
 avora wrote:
  With SLES10 SP3 x86_64,
  as soon as I start the second iscsi session2, I am very frequently
  getting the connection errors.
  I do not see this with SLES10 SP2 x86_64 on the same setup.

  Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)
  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
  recovery (1 attempts)
  Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
  2:0 error (1011) state (3)
  Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)

  I have tried changing noop_out_interval and noop_out_timeout to
  120/120 and 0/0, but that did not help.

 Did you see a ping/nop timeout message in the logs or just what you
 included above with the conn error (1011)? The ping/nop message would be a
 little before the conn error (1011).

 What target is this with and are you doing any IO tests when this
 happens or are you just logging into the second session and then you
 start to get these errors?





Re: SLES10 SP3 x86_64 - connection2:0: detected conn error (1011)

2009-12-08 Thread avora
I got a similar issue while browsing
http://groups.google.com/group/open-iscsi/browse_thread/thread/3c9c37903e40cd6f

I wanted to enable logging as mentioned in the above link:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_conn
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh
echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
---

But on my machine I only see:

#  ls /sys/module/libiscsi/
refcnt  sections  srcversion

# ls /sys/module/iscsi_tcp/
parameters  refcnt  sections  srcversion

# ls /sys/module/iscsi_tcp/parameters/max_lun
/sys/module/iscsi_tcp/parameters/max_lun


# iscsiadm -m session -P 1
Target: iqn.1992-04.com.emc:cx.ckm00091100683.a3

iSCSI Connection State: TRANSPORT WAIT
iSCSI Session State: FAILED
Internal iscsid Session State: REPOEN





On Dec 7, 10:31 pm, Mike Christie micha...@cs.wisc.edu wrote:
 avora wrote:
  With SLES10 SP3 x86_64,
  as soon as I start the second iscsi session2, I am very frequently
  getting the connection errors.
  I do not see this with SLES10 SP2 x86_64 on the same setup.

  Dec  7 18:42:05 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)
  Dec  7 18:42:06 cdc-r710s1 iscsid: connection2:0 is operational after
  recovery (1 attempts)
  Dec  7 18:42:06 cdc-r710s1 iscsid: Kernel reported iSCSI connection
  2:0 error (1011) state (3)
  Dec  7 18:42:08 cdc-r710s1 kernel:  connection2:0: detected conn error
  (1011)

  I have tried changing noop_out_interval and noop_out_timeout to
  120/120 and 0/0, but that did not help.

 Did you see a ping/nop timeout message in the logs or just what you
 included above with the conn error (1011)? The ping/nop message would be a
 little before the conn error (1011).

 What target is this with and are you doing any IO tests when this
 happens or are you just logging into the second session and then you
 start to get these errors?





lio-target crashes when windows initiator logs in

2009-12-08 Thread ablock

Hi,
I have problems with the lio-target software. I tried lio-core-2.6.31
and lio-core-2.6.
I compiled it together with lio-utils under Ubuntu 9.10 and Debian 5.0.
Ubuntu and Debian were installed in a virtual machine (VirtualBox 3.0.12).
I also tried it on bare metal, with the same problems.


I can get it working when I use a block device like /dev/sdb.
It crashes completely when I use a block device like /dev/sdb1 (the
partition exists!).
It also crashes completely when I use a logical volume or an md device.

The crash happens whenever a Windows Initiator logs in. I tried
Windows Vista and Windows Server 2008.

When I start the target module I get the following output:

Loading target_core_mod/ConfigFS core:   [OK]
Calling ConfigFS script /etc/target/tcm_start.sh for
target_core_mod:   [OK]
Calling ConfigFS script /etc/target/lio_start.sh for
iscsi_target_mod:   [OK]


In /var/log/messages I get:

Dec  8 18:50:51 debian kernel: [  106.480865] TARGET_CORE[0]: Loading
Generic Kernel Storage Engine: v3.1.0 on Linux/x86_64 on 2.6.31.4v3.1
Dec  8 18:50:51 debian kernel: [  106.481007] TARGET_CORE[0]:
Initialized ConfigFS Fabric Infrastructure: v2.0.0 on Linux/x86_64 on
2.6.31.4v3.1
Dec  8 18:50:51 debian kernel: [  106.481036] SE_PC[0] - Registered
Plugin Class: TRANSPORT
Dec  8 18:50:51 debian kernel: [  106.481061] PLUGIN_TRANSPORT[1] -
pscsi registered
Dec  8 18:50:51 debian kernel: [  106.481084] PLUGIN_TRANSPORT[2] -
stgt registered
Dec  8 18:50:51 debian kernel: [  106.481212] CORE_STGT[0]: Bus
Initalization complete
Dec  8 18:50:51 debian kernel: [  106.481232] PLUGIN_TRANSPORT[4] -
iblock registered
Dec  8 18:50:51 debian kernel: [  106.481250] PLUGIN_TRANSPORT[5] -
rd_dr registered
Dec  8 18:50:51 debian kernel: [  106.481268] PLUGIN_TRANSPORT[6] -
rd_mcp registered
Dec  8 18:50:51 debian kernel: [  106.481285] PLUGIN_TRANSPORT[7] -
fileio registered
Dec  8 18:50:51 debian kernel: [  106.481307] SE_PC[1] - Registered
Plugin Class: OBJ
Dec  8 18:50:51 debian kernel: [  106.481326] PLUGIN_OBJ[1] - dev
registered


I then initialize the iSCSI target with the following commands:

tcm_node --block iblock_0/my_dev2 /dev/vg1/lv1
lio_node --addlun iqn.2009-11.local.schule.target.i686:sn.123456789 1 0 my_dev_port iblock_0/my_dev2
lio_node --disableauth iqn.2009-11.local.schule.target.i686:sn.123456789 1
lio_node --addnp iqn.2009-11.local.schule.target.i686:sn.123456789 1 192.168.56.101:3260
lio_node --addlunacl iqn.2009-11.local.schule.target.i686:sn.123456789 1 iqn.1991-05.com.microsoft:andreas-pc 0 0
lio_node --enabletpg iqn.2009-11.local.schule.target.i686:sn.123456789 1

They produce the following output:
Output tcm_node:

Status: DEACTIVATED  Execute/Left/Max Queue Depth: 0/32/32
SectorSize: 512  MaxSectors: 255
iBlock device: dm-0
Major: 253 Minor: 0  CLAIMED: IBLOCK
 ConfigFS HBA: iblock_0
Successfully added TCM/ConfigFS HBA: iblock_0
 ConfigFS Device Alias: my_dev2
Device Params ['/dev/vg1/lv1']
Set T10 WWN Unit Serial for iblock_0/my_dev2 to: 57f6b040-3159-49df-
a5bd-2acdb948ef6f
Successfully created TCM/ConfigFS storage object: /sys/kernel/config/
target/core/iblock_0/my_dev2

Output lio_node --addlun:
Successfully created iSCSI Target Logical Unit

Output lio_node --disableauth:
Successfully disabled iSCSI Authentication on iSCSI Target Portal
Group: iqn.2009-11.local.schule.target.i686:sn.123456789 1

Output lio_node --addnp:
Successfully created network portal: 192.168.56.101:3260 created iqn.
2009-11.local.schule.target.i686:sn.123456789 TPGT: 1

Output lio_node --addlunacl:
Successfully added iSCSI Initiator Mapped LUN: 0 ACL iqn.
1991-05.com.microsoft:andreas-pc for iSCSI Target Portal Group: iqn.
2009-11.local.schule.target.i686:sn.123456789 1

Output lio_node --enabletpg:
Successfully enabled iSCSI Target Portal Group: iqn.
2009-11.local.schule.target.i686:sn.123456789 1


In /var/log/messages the initialization leads to the following:

Dec  8 18:53:11 debian kernel: [  246.679996] Target_Core_ConfigFS:
Located se_plugin: 88000dd630e0 plugin_name: iblock hba_type: 4
plugin_dep_id: 0
Dec  8 18:53:11 debian kernel: [  246.680398] CORE_HBA[0] - Linux-
iSCSI.org iBlock HBA Driver 3.1 on Generic Target Core Stack v3.1.0
Dec  8 18:53:11 debian kernel: [  246.680425] CORE_HBA[0] - Attached
iBlock HBA: 0 to Generic Target Core TCQ Depth: 512
Dec  8 18:53:11 debian kernel: [  246.680452] CORE_HBA[0] - Attached
HBA to Generic Target Core
Dec  8 18:53:11 debian kernel: [  246.680852] IBLOCK: Allocated ib_dev
for my_dev2
Dec  8 18:53:11 debian kernel: [  246.680879] Target_Core_ConfigFS:
Allocated se_subsystem_dev_t: 88000d86b000 se_dev_su_ptr:
88000ec07800
Dec  8 18:53:11 debian kernel: [  246.720958] Target_Core_ConfigFS:
iblock_0/my_dev2 set udev_path: /dev/vg1/lv1
Dec  8 18:53:11 debian kernel: [  246.735619] IBLOCK: Claiming struct
block_device: 88000f2d8200
Dec  8 18:53:11 debian kernel: [  246.735714] bio: create slab bio-1
at 1
Dec  8 18:53:11