disconnecting part of the state machine
Hi Mike,

I'm just reviewing this for the iser disconnect change we're discussing, and I noted that stop conn is called after ep_disconnect. I wasn't sure whether this is needed, a bug, or just something we can live with...

Here's the sequence of events I collected with the RHEL 5.4 initiator (6.2.0.871-0.10.el5); for what it's worth, I was running over 2.6.33. The number on the left is the line number in the attached file. I just did a login and, after a few seconds, a logout from the command line.

48 kep_connect
53 kep_bind
94 kcreate_session
133 kcreate_conn
139 kbind_conn
153 -- login
203 -- login (opcode 0x23)
xxx kset_param
410 kstart_conn
450 -- logout
479 -- logout (opcode 0x26)
486 kep_disconnect
491 kstop_conn
497 kdestroy_conn
504 kdestroy_session

Or.

-- 
You received this message because you are subscribed to the Google Groups open-iscsi group. To post to this group, send email to open-is...@googlegroups.com. To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

Attachment: iscsid_log.txt.bz2 (BZip2 compressed data)
Re: multipathing
Mike Christie wrote:
> The refcounting method sounds good. If iser has cleaned up what gets set in iscsi_iser_conn_bind once its ep_disconnect has completed, then you should be ok. So iser_conn->ib_conn has to be NULLed so that later, when iscsi_iser_conn_bind is called for the new conn, it can be set.

Okay, I am looking into how to do this.

> And we will have to watch out for a rmmod while there are ib_conns left to completely destroy

Yes, I was never a big fan of rmmod flows, but I admit they exist...

Or.
Re: multipathing
Or Gerlitz wrote:
> Mike Christie wrote:
>> And we will have to watch out for a rmmod while there are ib_conns left
> yes, I never was a big fan of rmmod flows, but, I admit they exist...

I do hit this -EEXIST race from time to time when running things like a stop/start sequence. Now I get it quite often, but I am playing with the code. I am quite sure that I also saw this on unmodified code with recent kernels (e.g. 2.6.30 and above). Has anyone else reported it as well?

Or.

Loading iSCSI transport class v2.0-870.
[ cut here ]
WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcc/0xe4()
Hardware name: X7DW3
sysfs: cannot create duplicate filename '/class/iscsi_endpoint'
Modules linked in: scsi_transport_iscsi(+) iw_cxgb3 cxgb3 mdio autofs4 nfs nfs_acl auth_rpcgss lockd sunrpc rds ib_ipoib rdma_ucm rdma_cm ib_ucm ib_uverbs ib_umad ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_multipath video output battery ac joydev sr_mod sg igb ib_mthca ib_mad ib_core button floppy ioatdma rng_core pcspkr dca dm_region_hash dm_log dm_mod usb_storage ata_piix libata shpchp megaraid_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_transport_iscsi]
Pid: 5341, comm: modprobe Tainted: GW 2.6.33 #1
Call Trace:
[8110a4fc] ? sysfs_add_one+0xcc/0xe4
[810349a2] ? warn_slowpath_common+0x77/0x8e
[81034a15] ? warn_slowpath_fmt+0x51/0x59
[8110a428] ? sysfs_pathname+0x35/0x3d
[8110a428] ? sysfs_pathname+0x35/0x3d
[8110a4fc] ? sysfs_add_one+0xcc/0xe4
[8110a9f5] ? create_dir+0x4f/0x85
[8110aa60] ? sysfs_create_dir+0x35/0x4a
[8114a364] ? kobject_add_internal+0xd2/0x18d
[8114a443] ? kset_register+0x24/0x3a
[a0357000] ? iscsi_transport_init+0x0/0x145 [scsi_transport_iscsi]
[811e2f1a] ? __class_register+0x128/0x1c0
[a0357000] ? iscsi_transport_init+0x0/0x145 [scsi_transport_iscsi]
[a0357050] ? iscsi_transport_init+0x50/0x145 [scsi_transport_iscsi]
[810001e1] ? do_one_initcall+0x50/0x13d
[81060142] ? sys_init_module+0xc8/0x222
[81001eeb] ? system_call_fastpath+0x16/0x1b
---[ end trace c8375848060e5033 ]---
kobject_add_internal failed for iscsi_endpoint with -EEXIST, don't try to register things with the same name in the same directory.
Pid: 5341, comm: modprobe Tainted: GW 2.6.33 #1
Call Trace:
[8114a3e0] ? kobject_add_internal+0x14e/0x18d
[8114a443] ? kset_register+0x24/0x3a
[a0357000] ? iscsi_transport_init+0x0/0x145 [scsi_transport_iscsi]
[811e2f1a] ? __class_register+0x128/0x1c0
[a0357000] ? iscsi_transport_init+0x0/0x145 [scsi_transport_iscsi]
[a0357050] ? iscsi_transport_init+0x50/0x145 [scsi_transport_iscsi]
[810001e1] ? do_one_initcall+0x50/0x13d
[81060142] ? sys_init_module+0xc8/0x222
[81001eeb] ? system_call_fastpath+0x16/0x1b
Re: multipathing
Mike Christie wrote:
> So iser_conn->ib_conn has to be NULLed so that later, when iscsi_iser_conn_bind is called for the new conn, it can be set.

Nullifying iser_conn->ib_conn in iser_conn_terminate is problematic: iser_conn_terminate is called in the ep_disconnect flow, and the conn stop callback (i_c_stop) is called later. If iser_conn->ib_conn is NULL at that point, conn stop doesn't call iser_conn_put and the ep isn't destroyed...

I can change things, of course; I just wanted to share these findings with you. Let me keep looking at this tomorrow.

Or.
Re: read only access on 1 LUN for multiple initiators
> Remember that clients (especially Linux) do cache blocks locally

Doesn't the kernel refresh the read cache for blocks at some interval? If I create a new file locally on the target and wait for some time, I would expect a read command like ls on the initiator side to read the data from the block device, not from a stale cache, so that I see the updated data. I believed that this is how the file system is supposed to work; is it not?

> they do not invalidate that cache.

Does "clients not invalidating the cache" mean that they do not update their read cache unless the volume is remounted?

Thanks.
kim

2010/3/1, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de:
> On 27 Feb 2010 at 0:23, Kapetanakis Giannis wrote:
>> Hi,
>> This has probably been brought up before, but I couldn't find any info. I'd like to set up multiple initiators (web/ftp cluster) with read-only access to the same iSCSI target. I would prefer to do this without a cluster fs (i.e. GFS) if possible. However, I want to have write access on the target server locally. I managed to do this, but I have the following problem: when I write something on the target (locally on the server), the updates are not propagated to the initiators.
>
> Naturally, read-only media don't propagate changes, because they don't change. Remember that clients (especially Linux) do cache blocks locally, and they do not invalidate that cache. So you could even experience application crashes when accessing data structures that were partially cached while being changed on the original.
>
>> If I unmount and mount again I can see the changes.
>
> Naturally, because unmount invalidates the cached blocks for the device.
>
> Regards, Ulrich
>
>> I'm sharing /dev/vg01/iscsi, which is an ext3 fs. It's locally mounted on the server and also mounted (/dev/sde -> /mnt) on the clients.
>>
>> Server is CentOS 5.4, scsi-target-utils-0.0-6.20091205snap.el5_4.1
>>
>> <target iqn.2008-09.com.example:server.target2>
>>     backing-store /dev/vg01/iscsi
>>     incominguser user pass
>>     initiator-address 10.0.0.0/26
>>     write-cache off
>> </target>
>>
>> Client is Fedora 12, iscsi-initiator-utils-6.2.0.870-10.fc12.1.x86_64
>>
>> scsi22 : iSCSI Initiator over TCP/IP
>> scsi 22:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
>> scsi 22:0:0:0: Attached scsi generic sg5 type 12
>> scsi 22:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
>> sd 22:0:0:1: Attached scsi generic sg6 type 0
>> sd 22:0:0:1: [sde] 16777216 512-byte logical blocks: (8.58 GB/8.00 GiB)
>> sd 22:0:0:1: [sde] Write Protect is off
>> sd 22:0:0:1: [sde] Mode Sense: 79 00 00 08
>> sd 22:0:0:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> sde: unknown partition table
>> sd 22:0:0:1: [sde] Attached SCSI disk
>> kjournald starting. Commit interval 5 seconds
>> EXT3-fs: mounted filesystem with ordered data mode.
>>
>> so is GFS the only option?
>>
>> Giannis
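The caching behaviour Ulrich describes can be sketched with a toy read-through cache. This is a simplified Python model, not how the Linux page cache is implemented; `BlockDevice`, `CachingClient` and `remount` are hypothetical names. A real kernel may also evict cached blocks under memory pressure, so a client can end up with a mix of old and new blocks rather than a consistently stale view.

```python
class BlockDevice:
    """Shared LUN: maps a block address to its current contents."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)


class CachingClient:
    """Initiator-side view: once a block is cached, reads never hit the device."""

    def __init__(self, device):
        self.device = device
        self.cache = {}

    def read(self, lba):
        if lba not in self.cache:
            self.cache[lba] = self.device.blocks[lba]
        return self.cache[lba]

    def remount(self):
        # Unmounting drops the cached blocks for the device.
        self.cache.clear()


lun = BlockDevice({0: "old contents"})
client = CachingClient(lun)

first = client.read(0)
lun.blocks[0] = "new contents"     # local write on the target server
stale = client.read(0)             # nothing tells the client to re-read
client.remount()
fresh = client.read(0)

print(first, "|", stale, "|", fresh)
```

Nothing in the block protocol notifies the client that block 0 changed, which is why only the remount makes the update visible in this model.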
Starting up
Hello,

I'm new to SANs and iSCSI, and I'm looking for a place to start. This list seems like a great one, so could you please point me to some tutorials or videos about it?

Regards,
Sci3ntist
Re: read only access on 1 LUN for multiple initiators
> so is GFS the only option?

Is NFS an option?

-- 
Romeo Theriault
Re: to iface or not to iface?
> If you do not use ifaces, then IO will be routed based on the route table. So I think probably, IO would go through the same NIC on the server. Is this what you are seeing?

I don't actually have the environment set up yet. Right now I'm just trying to determine the best way to set it up without creating another subnet for the iSCSI SAN traffic.

> If you wanted to use dm-multipath to round robin over both NICs on the linux server then you would use ifaces to bind each session to each NIC.

Ok, good to know.

> Are the two switches connected to each other? If they were and you are using one subnet, you would have better redundancy. Above you have 2 paths to the target, but if the switches are connected you have 4 paths.

I'm not sure. I'll check this out with the networking crew.

> The network layer should figure out there is another NIC that can be used and just use it. A problem might be while we are switching nics IO could time out and both paths could be down if they both ended up using the same nic due to the routing table. So you would want to setup dm-multipath with a higher no_path_retry, because when you switch over you might also have to relogin to the target through the new nic.
>
> If you used ifaces then the failover should be smoother. The other path would already be logged in, so dm-multipath could just restart the IO right away.

Perfect, this is exactly the information I was looking for. Thank you for the help, I really appreciate your response.

-- 
Romeo Theriault
System Administrator
Information Technology Services
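The "2 paths versus 4 paths" remark can be checked by counting NIC/portal pairs. The sketch below is in Python and assumes a layout that the thread does not spell out: two initiator NICs and two target portals, one of each per switch. The names `eth0`, `eth1`, `portal0` and `portal1` are placeholders.

```python
from itertools import product

# Assumed topology: one NIC and one target portal on each switch.
nic_switch = {"eth0": "A", "eth1": "B"}
portal_switch = {"portal0": "A", "portal1": "B"}


def paths(switches_linked):
    """List the (NIC, portal) pairs that can reach each other."""
    return [
        (nic, portal)
        for nic, portal in product(nic_switch, portal_switch)
        if switches_linked or nic_switch[nic] == portal_switch[portal]
    ]


print(len(paths(switches_linked=False)), len(paths(switches_linked=True)))
```

With isolated switches only the same-switch pairs work; linking the switches makes every NIC able to reach every portal, and binding one session per iface is what lets dm-multipath use each of those pairs as a separate path.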
[PATCH] Missing field initialization
Hello,

The struct *iscsi_sw_tcp_conn* contains the field *struct iscsi_conn *conn*, which is never initialized and never used.

I am currently working on a patch that will change the open-iscsi logging mechanism (as was discussed between Mike, Erez and Ulrich). Some of the logging is done in the context of a connection, and in those cases I use the struct *iscsi_conn*. During this work I came across the function *iscsi_sw_tcp_conn_restore_callbacks*, which receives a *struct iscsi_sw_tcp_conn*, from which I need to extract the *iscsi_conn* field and use it for logging. However, the field is not initialized, hence my fix.

One more thing: the function *iscsi_sw_tcp_conn_set_callbacks* receives a *struct iscsi_conn*. Wouldn't it be better if *iscsi_sw_tcp_conn_restore_callbacks* received the same? If it did, that would have solved my problem, but in any case we should fix the missing field initialization (by initializing the field or removing it from the struct).

Thanks,
Avi
Failover time of iSCSI multipath devices.
Hi all.

I am going through some testing of my multipathed iSCSI devices and I am seeing some longer than expected delays. I am running the latest RHEL 5.4 packages as of this morning.

I am seeing the failure of the iSCSI sessions take about 67 seconds. After the iSCSI failure the multipath layer picks up almost immediately. Here is a breakdown of /var/log/messages; I am testing dd while pulling a network cable.

I started the dd at 07:13:51.

Cable pulled at:
Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304

iSCSI errors at:
Mar 1 07:14:28 bentCluster-1 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)

SCSI error and multipath failures at:
Mar 1 07:15:35 bentCluster-1 kernel: session2: session recovery timed out after 15 secs
Mar 1 07:15:35 bentCluster-1 kernel: sd 3:0:0:1: SCSI error: return code = 0x000f
Mar 1 07:15:35 bentCluster-1 kernel: end_request: I/O error, dev sdf, sector 3164079
Mar 1 07:15:35 bentCluster-1 kernel: device-mapper: multipath: Failing path 8:80.

And then I/O starts again on the device I am sending I/O down. Finally the other devices fail:
Mar 1 07:15:48 bentCluster-1 kernel: device-mapper: multipath: Failing path 8:112.

The entire dd took 138 seconds. It looks like the delay is in the iSCSI layer. It took from 07:14:28 to 07:15:35 for the iSCSI session to fail. I am using the timeouts:

● node.session.timeo.replacement_timeout = 15
● node.conn[0].timeo.noop_out_timeout = 5
● node.conn[0].timeo.noop_out_interval = 5

http://kbase.redhat.com/faq/docs/DOC-2877

So I guess I have two questions:

1. Based on my timeouts I would think that my session would time out after 15 seconds. Anyone have an idea why is it taking 67 seconds? Am I missing any other timeout values?
2. In a perfect world what is the best case scenario for the failure of my iSCSI session?

Thanks in advance.

-Ben
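The gap Ben reports can be recomputed from the log timestamps. The Python sketch below only redoes that arithmetic; the variable names are mine and the timestamps are copied from the message above.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start, end):
    """Whole seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


conn_error = "07:14:28"        # iscsid reports connection error 1011
recovery_fired = "07:15:35"    # "session recovery timed out after 15 secs"
replacement_timeout = 15       # node.session.timeo.replacement_timeout

observed = seconds_between(conn_error, recovery_fired)
unexplained = observed - replacement_timeout

print(observed, unexplained)
```

So roughly 52 of the 67 seconds are not accounted for by replacement_timeout, which is the delay the question is about.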
Re: [PATCH] decrease sndtmo
On 03/01/2010 05:38 AM, Erez Zilber wrote:
> On Wed, Feb 3, 2010 at 11:51 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
>> On 02/03/2010 06:07 AM, Erez Zilber wrote:
>>> On Wed, Feb 3, 2010 at 11:30 AM, Mike Christie <micha...@cs.wisc.edu> wrote:
>>>> On 02/03/2010 01:50 AM, Erez Zilber wrote:
>>>>>> It looks like I posted it at Red Hat and never got a response, and I probably then forgot about it and never asked upstream. Will send mail upstream now.
>>>>>
>>>>> Which list are you sending it to? I thought it was lkml, but didn't find any discussion there.
>>>>
>>>> I think I found a nicer solution. See the attached patch made over linus's tree. I am just not sure if we are allowed to set the sk_err field - maybe it is supposed to be internal to the socket code. The patch seems to be working for me.
>>>
>>> Works great for me.
>>
>> Ok. I am going to post it to netdev today/tomorrow, to make sure they are ok with how I am accessing the sock struct.
>
> Mike,
>
> Did you get any response from the netdev list?

I just sent it to linux-scsi after looking at some similar code. It ended up getting merged in James's tree.
Re: 2.6.14-23_compat.patch CentOS 5.4
On 02/02/2010 02:53 AM, Erez Zilber wrote:
> I've attached 2 versions. One fixes only the 5.5 case and the other one handles all RHEL versions that are < 6.0. I prefer the 2nd one (assuming that there will be no API breakage until RHEL 6.0).

Merged the second one. Thanks!
Re: iscsi ifaces / multipathing / etc
On 03/01/2010 02:46 AM, Or Gerlitz wrote:
> Mike Christie wrote:
>> Ah never mind. For some reason I thought you had to have a mask, but if you give rdma_resolve_addr a addr then it will do the right thing and use only the port you wanted right?
>
> YES. Providing rdma_resolve_addr a source address is like calling rdma_bind_addr with this source address and then calling rdma_resolve_addr with only a destination address. So it's like bind(2) in that respect. Currently the rdma stack, through the include/rdma/rdma_cm.h rdma-cm api, doesn't support things like SO_BINDTODEVICE to either a network or an rdma device. But even if it would/will, I prefer to stay with IP addresses.

I am not sure I got why you prefer to use the IP? Was your reason that part you wrote about iser being IP based?

> I understand how SO_BINDTODEVICE is used for the tcp transport, but it's all done in user space, and later when the connection is bound to the end-point (the socket is created/bound/connected from user space) things are moved to the kernel. This isn't the case with iser. I believe that at this point we agree that there should be a way to specify to the kernel iser transport code the source address bound by the user to the iscsi interface, correct?

Whether it is done in the kernel or userspace, or whether you support SO_BINDTODEVICE or not, is not an issue. We can change userspace so you get a device like tcp and offload, and/or we can change the kernel in any sane way so you can bind by whatever. You do not have to support SO_BINDTODEVICE. You can work like the offload drivers do.

I am not sure where you get that I agree ip address is best. All I am saying above is that I think I see the API you wanted to use. If the device name and port do not change normally, that seems better to me since it works like the other drivers.
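The bind(2) analogy Or uses can be shown with ordinary TCP sockets. This is a plain Python socket sketch over loopback, not the rdma-cm API; `connect_from` is a made-up helper, and 127.0.0.1 stands in for the source address an iface would carry.

```python
import socket


def connect_from(src_ip, dst):
    """Open a TCP connection to dst whose local end is pinned to src_ip."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((src_ip, 0))    # port 0: the kernel picks an ephemeral port
    s.connect(dst)
    return s


# Local listener so the example needs no external target.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = connect_from("127.0.0.1", listener.getsockname())
peer, peer_addr = listener.accept()
local_ip = client.getsockname()[0]
print(local_ip, peer_addr[0])

for s in (client, peer, listener):
    s.close()
```

Binding by address selects the source IP and leaves the choice of device to routing, whereas SO_BINDTODEVICE names the device itself; that difference is what the two positions in this thread are about.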
Re: [PATCH] Missing field initialization
On 03/01/2010 04:33 AM, Avi Kaplan wrote:
> Hello,
>
> The struct *iscsi_sw_tcp_conn* contains the field *struct iscsi_conn *conn*, which is never initialized and never used.
>
> I am currently working on a patch that will change the open-iscsi logging mechanism (as was discussed between Mike, Erez and Ulrich). Some of the logging is done in the context of a connection, and in those cases I use the struct *iscsi_conn*. During this work I came across the function *iscsi_sw_tcp_conn_restore_callbacks*, which receives a *struct iscsi_sw_tcp_conn*, from which I need to extract the *iscsi_conn* field and use it for logging. However, the field is not initialized, hence my fix.
>
> One more thing: the function *iscsi_sw_tcp_conn_set_callbacks* receives a *struct iscsi_conn*. Wouldn't it be better if *iscsi_sw_tcp_conn_restore_callbacks* received the same? If it did, that would have solved my problem, but in any case we should fix the missing field initialization (by initializing the field or removing it from the struct).

If it is not used, remove it.

I think you can actually get the iscsi_conn from the sk_user_data field, but changing iscsi_sw_tcp_conn_restore_callbacks to take an iscsi_conn so it works like set_callbacks is fine.
Re: disconnecting part of the state machine
On 03/01/2010 08:15 AM, Or Gerlitz wrote:
> Hi Mike,
>
> I'm just reviewing this for the iser disconnect change we're discussing, and I noted that stop conn is called after ep_disconnect. I wasn't sure if this is needed, or maybe bug or just something we can live with...

ep_disconnect deals with low-level stuff like ib, tcp, and hardware/firmware on the card. stop conn deals with the iscsi connection and cleans up iscsi-level stuff like pdus, iscsi cmds, and session/connection state.

It is definitely needed.
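The split Mike describes can be pictured as two objects torn down in sequence. The sketch below is a toy Python model, not open-iscsi code: `Endpoint`, `Connection` and `logout_teardown` are invented names, and only the ordering of ep_disconnect before stop_conn is taken from the event log earlier in the thread.

```python
class Endpoint:
    """Transport-level object (TCP socket, IB connection, offload hardware)."""

    def __init__(self):
        self.connected = True

    def disconnect(self):
        self.connected = False


class Connection:
    """iSCSI-level object: owns the PDUs and commands queued on the connection."""

    def __init__(self):
        self.endpoint = None
        self.started = False
        self.pending_pdus = []

    def bind(self, ep):
        self.endpoint = ep

    def start(self):
        self.started = True

    def stop(self):
        # iSCSI-level cleanup is still needed after the transport is gone:
        # fail whatever is queued and mark the connection stopped.
        failed = list(self.pending_pdus)
        self.pending_pdus.clear()
        self.started = False
        return failed


def logout_teardown(conn):
    """Replay the teardown order from the log: ep_disconnect, then stop_conn."""
    events = []
    conn.endpoint.disconnect()
    events.append("ep_disconnect")
    failed = conn.stop()
    events.append("stop_conn")
    return events, failed


ep = Endpoint()
conn = Connection()
conn.bind(ep)
conn.start()
conn.pending_pdus.append("nop-out")

events, failed = logout_teardown(conn)
print(events, failed)
```

Disconnecting the endpoint alone would leave the queued PDU and the started flag untouched in this model, which is the kind of state stop conn exists to clean up.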
Re: iscsi ifaces / multipathing / etc
On 03/01/2010 06:29 PM, Mike Christie wrote:
> If the device name and port do not change normally that seems better to me since it works like the other drivers.

Oh yeah, just to be clear, I am saying I prefer the above, but that is based on what I understand today. As I said, I did not understand why you think IP based is best for iser when all other drivers use the other option. Beat your point into my head if I am not getting it :) I am open to change.
Re: multipathing
On 03/01/2010 09:15 AM, Or Gerlitz wrote:
> Or Gerlitz wrote:
>> Mike Christie wrote:
>>> And we will have to watch out for a rmmod while there are ib_conns left
>> yes, I never was a big fan of rmmod flows, but, I admit they exist...
>
> I do hit this -EEXIST race from time to time when running things like a stop/start sequence. Now I get it quite often, but I am playing with the code. I am quite sure that I also saw this on unmodified code with recent kernels (e.g. 2.6.30 and above). Has anyone else reported it as well?
>
> Loading iSCSI transport class v2.0-870.
> WARNING: at fs/sysfs/dir.c:487 sysfs_add_one+0xcc/0xe4()
> sysfs: cannot create duplicate filename '/class/iscsi_endpoint'
> [...]
> kobject_add_internal failed for iscsi_endpoint with -EEXIST, don't try to register things with the same name in the same directory.

I have not seen it. Something probably has the iscsi_transport sysfs file open, so that object is not completely freed by the time you start up again.

We see this error a lot with dm-multipath testing when paths (/dev/sdxs) are removed and added quickly.
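Mike's guess, that a lingering holder keeps the old object alive so the name is still taken when the module loads again, can be modelled with a small registry. This is a hypothetical Python sketch and not how sysfs or kobjects are implemented; the `Registry` class and its methods are invented for illustration.

```python
import errno


class Registry:
    """Toy name registry: a name is released only when its last holder lets go."""

    def __init__(self):
        self.holders = {}    # name -> number of outstanding references

    def register(self, name):
        if name in self.holders:
            raise OSError(errno.EEXIST, "cannot create duplicate name", name)
        self.holders[name] = 1

    def open(self, name):
        self.holders[name] += 1

    def close(self, name):
        self._drop(name)

    def unregister(self, name):
        self._drop(name)

    def _drop(self, name):
        self.holders[name] -= 1
        if self.holders[name] == 0:
            del self.holders[name]


reg = Registry()
reg.register("iscsi_endpoint")      # module load
reg.open("iscsi_endpoint")          # something keeps a file open
reg.unregister("iscsi_endpoint")    # module unload; the name is still held

try:
    reg.register("iscsi_endpoint")  # reload races with the holder
    reload_error = None
except OSError as exc:
    reload_error = exc.errno

reg.close("iscsi_endpoint")         # the holder goes away
reg.register("iscsi_endpoint")      # now the reload succeeds

print(reload_error == errno.EEXIST)
```

If this is what happens, the failure depends on timing: the reload only fails while the old reference is still outstanding, which fits with it showing up during quick stop/start sequences.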
Re: Failover time of iSCSI multipath devices.
On 03/01/2010 12:06 PM, bet wrote:
> 1. Based on my timeouts I would think that my session would time out
> after 15 seconds.

Yes. It should time out about 15 secs after you see

Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304

You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem. If you do not know how to build a RHEL kernel, let me know the arch you are using and I can build a kernel here (it takes about a day).

> Anyone have an idea why is it taking 67 seconds? Am I missing any other timeout values?

No. The ones you have set are it.

> 2. In a perfect world what is the best case scenario for the failure of my iSCSI session?

It should work like in that doc.
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 5c39369..e840806 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -276,11 +276,12 @@ static int iscsi_sw_tcp_xmit(struct iscsi_conn *conn)
 
 	while (1) {
 		rc = iscsi_sw_tcp_xmit_segment(tcp_conn, segment);
-		if (rc < 0) {
+		if (rc == -EAGAIN)
+			return rc;
+		else if (rc < 0) {
 			rc = ISCSI_ERR_XMIT_FAILED;
 			goto error;
-		}
-		if (rc == 0)
+		} else if (rc == 0)
 			break;
 
 		consumed += rc;
@@ -561,9 +562,10 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	struct iscsi_conn *conn = cls_conn->dd_data;
 	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
 	struct iscsi_sw_tcp_conn *tcp_sw_conn = tcp_conn->dd_data;
+	struct socket *sock = tcp_sw_conn->sock;
 
 	/* userspace may have goofed up and not bound us */
-	if (!tcp_sw_conn->sock)
+	if (!sock)
 		return;
 	/*
 	 * Make sure our recv side is stopped.
@@ -574,6 +576,11 @@ static void iscsi_sw_tcp_conn_stop(struct iscsi_cls_conn *cls_conn, int flag)
 	set_bit(ISCSI_SUSPEND_BIT, &conn->suspend_rx);
 	write_unlock_bh(&tcp_sw_conn->sock->sk->sk_callback_lock);
 
+	if (sock->sk->sk_sleep && waitqueue_active(sock->sk->sk_sleep)) {
+		sock->sk->sk_err = EIO;
+		wake_up_interruptible(sock->sk->sk_sleep);
+	}
+
 	iscsi2_conn_stop(cls_conn, flag);
 	iscsi_sw_tcp_release_conn(conn);
 }
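To make the first hunk easier to follow, here is a user-space sketch of the patched loop's control flow. Before the patch, -EAGAIN fell into the generic rc < 0 branch and was treated as a transmit failure; after it, -EAGAIN is handed back to the caller as-is. Everything named fake_* or sketch_* or SKETCH_* is made up for illustration: the segment sender is replaced by a scripted list of return values, and the error path is reduced to a plain return, so this is not the driver's real error handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the ISCSI_ERR_XMIT_FAILED error path. */
#define SKETCH_ERR_XMIT_FAILED (-1000)

/* Scripted return values standing in for iscsi_sw_tcp_xmit_segment(). */
struct fake_segment {
    const int *results;
    size_t count;
    size_t next;
};

/* Hand out the next scripted result; 0 once the script is exhausted. */
static int fake_xmit_segment(struct fake_segment *seg)
{
    assert(seg != NULL);
    if (seg->next >= seg->count)
        return 0;
    return seg->results[seg->next++];
}

/* Mirrors the branch order of the patched loop. */
int sketch_xmit(struct fake_segment *seg)
{
    int consumed = 0;

    while (1) {
        int rc = fake_xmit_segment(seg);

        if (rc == -EAGAIN)
            return rc;                      /* would block: let the caller retry */
        else if (rc < 0)
            return SKETCH_ERR_XMIT_FAILED;  /* any other error: fail the transmit */
        else if (rc == 0)
            break;                          /* nothing more to send */

        consumed += rc;                     /* bytes sent in this step */
    }
    return consumed;
}
```

The second and third hunks are a separate change: they cache the socket pointer and, when the connection is stopped, set sk_err and wake anything sleeping on the socket so a blocked sender does not stay stuck.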
Re: Failover time of iSCSI multipath devices.
Mike Christie wrote:
> On 03/01/2010 12:06 PM, bet wrote:
>> 1. Based on my timeouts I would think that my session would time out
>
> Yes. It should timeout about 15 secs after you see
>
> Mar 1 07:14:27 bentCluster-1 kernel: connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4884304, last ping 4889304, now 4894304
>
> You might be hitting a bug where the network layer gets stuck trying to send data. I attached a patch that should fix the problem. If you do not know how to build a RHEL kernel let me know the arch you are using and I can build a kernel here (it takes about a day).
>
>> after 15 seconds. Anyone have an idea why is it taking 67 seconds? Am I missing any other timeout values?
>
> No. The ones you have set are it.
>
>> 2. In a perfect world what is the best case scenario for the failure of my iSCSI session?
>
> It should work like in that doc.

Wouldn't the abort timeout also have an effect here? Or will iSCSI immediately fail the coming abort (the one the mid-layer sends when it gets an error sending a SCSI command)?

--guy
Re: Kernel oops on login
Ok so I guess working with old versions of open-iscsi is not accepted here :). So I upgraded to the latest and greatest semi-stable release, 871. I no longer see the "is not queued" messages, and my login and logout work fine. However, this is only the case if I don't have my flash device mounted on /dev/sdc. If the flash is mounted I get this kernel oops:

-T iqn.1999-02.com.nexsan:p0:sataboy:01731a5a --login
Logging in to [iface: default, target: iqn.1999-02.com.nexsan:p0:sataboy:01731a5a, portal: 172.19.151.169,3260]
kobject_add failed for sdc with -EEXIST, don't try to register things with the same name in the same directory.
BUG: unable to handle kernel NULL pointer dereference at virtual address 0008
printing eip:
*pde =
Oops: [#1]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
CPU: 0
EIP: 0060:[c0189241] Not tainted VLI
EFLAGS: 00210292 (2.6.22.10-vs2.2.0.5-cisco-nmx #1)
EIP is at create_dir+0x21/0x190
eax: c2280584 ebx: f7191bac ecx: c2280588 edx:
esi: c2280584 edi: c2280584 ebp: esp: f7191b7c
ds: 007b es: 007b fs: gs: 0033 ss: 0068
Process iscsid (pid: 5038, ti=f719 task=f77c75b0 task.ti=f719)
Stack: c0378110 c037566c c2280588 c2280584 c037566c c0103c41 c2280584 c2280584
 c2280584 c01893d4 f7191bac c0203a1f f7b35ac0
 f7d690c0 c0186e80 f7b35b10 f7b35ac0 c22804c0 c2280584 f7d690c0
Call Trace:
 [c0103c41] dump_stack+0x11/0x20
 [c01893d4] sysfs_create_dir+0x24/0x70
 [c0203a1f] kobject_shadow_add+0x7f/0x1a0
 [c0186e80] register_disk+0x50/0x1f0
 [c01f4b72] blk_register_queue+0x52/0x90
 [c026c6c8] sd_probe+0x278/0x3f0
 [c0189e47] sysfs_create_link+0x57/0x150
 [c0230c57] driver_probe_device+0x87/0x190
 [c0331fc1] klist_next+0x51/0xb0
 [c022ff24] bus_for_each_drv+0x44/0x70
 [c0230e19] device_attach+0x79/0x80
 [c0230d60] __device_attach+0x0/0x10
 [c022fe95] bus_attach_device+0x45/0x90
 [c022eb93] device_add+0x493/0x560
 [c0268502] scsi_sysfs_add_sdev+0x32/0x230
 [c02665bd] scsi_probe_and_add_lun+0x95d/0x980
 [c0266e91] __scsi_scan_target+0x491/0x5f0
 [c0166dcb] mntput_no_expire+0x1b/0x70
 [c015bac3] link_path_walk+0x63/0xc0
 [c02676a6] scsi_scan_target+0xb6/0xe0
 [f8a008fa] iscsi_user_scan_session+0x9a/0xb0 [scsi_transport_iscsi]
 [f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
 [f8a00860] iscsi_user_scan_session+0x0/0xb0 [scsi_transport_iscsi]
 [c022dd42] device_for_each_child+0x22/0x40
 [f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
 [f8a00843] iscsi_user_scan+0x23/0x30 [scsi_transport_iscsi]
 [c026818b] store_scan+0xbb/0xf0
 [c013b614] __alloc_pages+0x64/0x2f0
 [c02680d0] store_scan+0x0/0xf0
 [c0232196] class_device_attr_store+0x26/0x40
 [c0188151] sysfs_write_file+0xb1/0x110
 [c01880a0] sysfs_write_file+0x0/0x110
 [c0153820] vfs_write+0xa0/0x140
 [c0153da1] sys_write+0x41/0x70
 [c010280e] sysenter_past_esp+0x5f/0x85
===
Code: 74 26 00 8d bc 27 00 00 00 00 83 ec 28 89 5c 24 18 8b 5c 24 2c 89 74 24 1c 89 7c 24 20 89 6c 24 24 89 d5 89 4c 24 08 89 44 24 0c 8b 42 08 83 c0 68 e8 94 a2 1a 0031 c0 b9 ff ff ff ff 8b 7c 24
EIP: [c0189241] create_dir+0x21/0x190 SS:ESP 0068:f7191b7c

Mar 1 21:09:07 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 0008
Mar 1 21:09:07 localhost kClocksource tsc unstable (delta = 2941960870 ns) ernel: printingTime: pit clocksource has been installed. eip:
Mar 1 21:09:07 localhost kernel: *pde =
Mar 1 21:09:07 localhost kernel: Oops: [#1]
Mar 1 21:09:07 localhost kernel: CPU: 0
Mar 1 21:09:07 localhost kernel: EIP: 0060:[c0189241] Not tainted VLI
Mar 1 21:09:07 localhost kernel: EFLAGS: 00210292 (2.6.22.10-vs2.2.0.5-cisco-nmx #1)
Mar 1 21:09:07 localhost kernel: EIP is at create_dir+0x21/0x190
Mar 1 21:09:07 localhost kernel: eax: c2280584 ebx: f7191bac ecx: c2280588 edx:
Mar 1 21:09:07 localhost kernel: esi: c2280584 edi: c2280584 ebp: esp: f7191b7c
Mar 1 21:09:07 localhost kernel: ds: 007b es: 007b fs: gs: 0033 ss: 0068
Mar 1 21:09:07 localhost kernel: Process iscsid (pid: 5038, ti=f719 task=f77c75b0 task.ti=f719)
Mar 1 21:09:07 localhost kernel: Stack: c0378110 c037566c c2280588 c2280584 c037566c c0103c41 c2280584 c2280584
Mar 1 21:09:07 localhost kernel: c2280584 c01893d4 f7191bac c0203a1f f7b35ac0
Mar 1 21:09:07 localhost kernel: f7d690c0 c0186e80 f7b35b10 f7b35ac0 c22804c0 c2280584 f7d690c0
Mar 1 21:09:07 localhost kernel: Call Trace:
Mar 1 21:09:07 localhost kernel: [c0103c41] dump_stack+0x11/0x20
Mar 1 21:09:07 localhost kernel: [c01893d4] sysfs_create_dir+0x24/0x70
Mar 1 21:09:07 localhost kernel: [c0203a1f] kobject_shadow_add+0x7f/0x1a0
Mar 1 21:09:07 localhost kernel: [c0186e80] register_disk+0x50/0x1f0
Mar 1 21:09:07 localhost kernel: [c01f4b72] blk_register_queue+0x52/0x90