Re: Problem using multiple NICs
On Thu, Nov 12, 2009 at 04:49:47PM -0800, Jim Cole wrote:

> Hi - I am running into problems utilizing two NICs in an iSCSI setup for
> multipath I/O. The setup involves a Linux server (Ubuntu 9.10 Server)
> with two Broadcom NetXtreme II GbE NICs connected to two separate
> switches on a single subnet, which is dedicated to EqualLogic SAN access.

Here's what I did when I tested multiple interfaces with Equallogic:
http://pasik.reaktio.net/open-iscsi-multiple-ifaces-test.txt

-- Pasi

> I have set up two iface definitions using the following steps:
>
>     iscsiadm -m iface -I eth4 --op=new
>     iscsiadm -m iface -I eth5 --op=new
>     iscsiadm -m iface -I eth4 --op=update -n iface.net_ifacename -v eth4
>     iscsiadm -m iface -I eth5 --op=update -n iface.net_ifacename -v eth5
>
> I have also tried specifying the MAC addresses explicitly, with no change
> in behavior. Discovery was performed with the following command and
> worked as expected, generating node entries for both interfaces:
>
>     iscsiadm -m discovery -t st -p xx.xx.xx.xx:3260 -I eth4 -I eth5
>
> Up to this point everything looks good, and I have no trouble logging one
> interface into the desired target. However, attempts to log in the second
> interface always result in a timeout. The message is:
>
>     iscsiadm: Could not login to [iface: eth4, target: target, portal:
>     xx.xx.xx.xx,3260]:
>     iscsiadm: initiator reported error (8 - connection timed out)
>
> The problem is not specific to one interface. I am able to log in with
> either one; I just can't seem to log in with both at the same time.
>
> I am using the open-iscsi package that ships with the Ubuntu distro
> (open-iscsi 2.0.870.1-0ubuntu12). I have another server on the same
> network, with identical hardware and iSCSI configuration, that is working
> properly. The only difference is that the other server is running CentOS
> 5.4 and using the initiator that ships with that distro
> (iscsi-initiator-utils 6.2.0.871-0.10.el5).
>
> If anyone could provide any guidance on how to further diagnose, and
> hopefully solve, this problem, it would be greatly appreciated.
>
> TIA
>
> Jim
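For reference, a minimal sketch of the per-interface login and a session
check in the spirit of Pasi's test notes (the target IQN below is a
placeholder; syntax assumed for open-iscsi 2.0.870-era iscsiadm):

    # log the target in through each bound iface
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-volume \
        -p xx.xx.xx.xx:3260 -I eth4 --login
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-volume \
        -p xx.xx.xx.xx:3260 -I eth5 --login

    # confirm that one session exists per iface
    iscsiadm -m session -P 1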
Re: iscsi diagnosis help
On Mon, Nov 16, 2009 at 09:39:00PM -0500, Hoot, Joseph wrote:

> On Nov 16, 2009, at 8:19 PM, Hoot, Joseph wrote:
>
> Thanks. That helps. So I know that with the EqualLogic targets, there is
> a Group IP which, I believe, responds with an iSCSI login redirect.
>
> 1) Could the "Login authentication failed" message be the response to a
> login redirect from the EQL Group IP?
>
> And then my next question is more for curiosity's sake:
>
> 2) Are there plans in the future to have more than one connection per
> session? And, in addition to that, would that mean multiple connections
> to a single volume over the same NIC?
>
> Also, Mike, I'm seeing one or two of these every 30-40 minutes if I slam
> our EqualLogic with roughly 7-15k IOPS (reads and writes) non-stop on 3
> volumes. In this type of scenario, would you expect to see timeouts like
> this once in a while? If so, do you think increasing my NOOP timeouts
> would help so we don't get these? Maybe set it to 15 seconds instead of
> 10?

Equallogic does active loadbalancing (redirects) during operation..
dunno about the errors though.

-- Pasi

> On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
>
>> Hoot, Joseph wrote:
>>> Hi all,
>>> I'm trying to understand what I'm seeing in my /var/log/messages.
>>> Here's what I have:
>>>
>>>   Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10
>>>     secs expired, last rx 191838122, last ping 191839372, now 191841872
>>>   Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn error
>>>     (1011)
>>>   Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection
>>>     5:0 error (1011) state (3)
>>>   Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with
>>>     target iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>>>   Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational after
>>>     recovery (1 attempts)
>>>
>>> The first line: what is connection5:0? Is that referenced from iscsiadm
>>> somewhere? I only ask because I'm seeing iscsid messages and kernel
>>> messages. I also have dm-multipath running, which usually shows up as
>>> dm-multipath or something like that. I understand that iscsid is the
>>> process that is logging in and out. But is the "kernel:" message just
>>> an iscsi module that is loaded into the kernel, which is why it is
>>> being logged as "kernel:"?
>>
>> It is the session id and connection id:
>>
>>     connection$SESSION_ID:$CONNECTION_ID
>>
>> If you run iscsiadm -m session -P 1 or -P 3 you will see:
>>
>>     # iscsiadm -m session -P 1
>>     Target: iqn.1992-08.com.netapp:sn.33615311
>>         Current Portal: 10.15.85.19:3260,3
>>         Persistent Portal: 10.15.85.19:3260,3
>>         Iface Transport: tcp
>>         Iface IPaddress: 10.11.14.37
>>         Iface HWaddress: default
>>         Iface Netdev: default
>>         SID: 7
>>         iSCSI Connection State: LOGGED IN
>>         Internal iscsid Session State: NO CHANGE
>>
>> The session number is the SID value. If you run iscsiadm -m session:
>>
>>     tcp [2] 10.15.84.19:3260,2 iqn.1992-08.com.netapp:sn.33615311
>>
>> the session number/SID is the value in brackets. If you run iscsiadm in
>> session mode (iscsiadm -m session) then you can use the -R argument and
>> pass in a SID to do an operation, like
>>
>>     iscsiadm -m session -R 2 --rescan
>>
>> would rescan that session. The connection number is currently always
>> zero.
>>
>> For the second question: iscsid handles login, logout, and error
>> handling, and the kernel basically passes iscsi packets around.
>>
>>     Nov 13 10:49:47 oim6102506 kernel: connection5:0: ping timeout of 10
>>     secs expired, last rx 191838122, last ping 191839372, now 191841872
>>
>> So here the iscsi kernel code sends an iscsi ping/nop every
>> noop_interval seconds, and if we do not get a response within
>> noop_timeout seconds it will fire off a connection error.
>>
>>     Nov 13 10:49:47 oim6102506 kernel: connection5:0: detected conn
>>     error (1011)
>>
>> Here is the kernel code notifying userspace of the problem.
>>
>>     Nov 13 10:49:47 oim6102506 iscsid: Kernel reported iSCSI connection
>>     5:0 error (1011) state (3)
>>
>> And there iscsid is accepting the error (probably no need for the error
>> to be logged twice).
>>
>>     Nov 13 10:49:50 oim6102506 iscsid: Login authentication failed with
>>     target iqn.2001-05.com.equallogic:0-8a0906-e7d1dea02-786272c42554aef2-ovm-2-lun03
>>
>> And then here iscsid handled the error by killing the TCP/IP connection,
>> reconnecting the TCP/IP connection, and then re-logging into the iscsi
>> target. But for some reason we could not log back in right away.
>>
>>     Nov 13 10:49:52 oim6102506 iscsid: connection5:0 is operational
>>     after recovery (1 attempts)
>>
>> But it looks like we tried again and we got back in.
>
> Maybe one of these modules?
>
>     iscsi_tcp     19785  46
>     libiscsi_tcp  21829  1  iscsi_tcp
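For reference, the NOP-out interval and timeout discussed above map to the
iscsid settings below; a minimal sketch with example values only (15s
timeout instead of the 10s Joseph mentions, target name a placeholder):

    # /etc/iscsi/iscsid.conf -- applies to sessions created after the change
    node.conn[0].timeo.noop_out_interval = 10
    node.conn[0].timeo.noop_out_timeout = 15

    # or update an existing node record in place
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-volume \
        -o update -n node.conn[0].timeo.noop_out_timeout -v 15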
Re: iscsi diagnosis help
> 2) Are there plans in the future to have more than one connection per
> session?

I don't think so. If you want to know the reason, read the thread titled
"MC/S support" in the open-iscsi mailing list.

Kim

2009/11/17, Pasi Kärkkäinen pa...@iki.fi:

> [snip]
Poor read performance with IET
I am currently running IET on a CentOS 5.4 server with the following
kernel:

    Linux titan1 2.6.18-128.7.1.el5 #1 SMP Mon Aug 24 08:21:56 EDT 2009
    x86_64 x86_64 x86_64 GNU/Linux

The server is a dual quad-core 2.8 GHz system with 16 GB RAM. I am also
using Coraid disk shelves via AoE for my block storage, which I am
offering up as an iSCSI target. I am running v0.4.17 of IET.

I am getting very good write performance but lousy read performance.
Performing a simple sequential write to the iSCSI target I get 94
megabytes per sec. With reads I am only getting 12.4 megabytes per sec.

My ietd.conf looks like this:

    Target iqn.2009-11.net.storage:titan.diskshelf1.e1.2
        Lun 1 Path=/dev/etherd/e1.2,Type=blockio
        Alias e1.2
        MaxConnections 1
        InitialR2T No
        ImmediateData Yes
        MaxRecvDataSegmentLength 262144
        MaxXmitDataSegmentLength 262144

I have also made the following tweaks to TCP/IP:

    sysctl net.ipv4.tcp_rmem=100 100 100
    sysctl net.ipv4.tcp_wmem=100 100 100
    sysctl net.ipv4.tcp_tw_recycle=1
    sysctl net.ipv4.tcp_tw_reuse=1
    sysctl net.core.rmem_max=524287
    sysctl net.core.wmem_max=524287
    sysctl net.core.wmem_default=524287
    sysctl net.core.optmem_max=524287
    sysctl net.core.netdev_max_backlog=30

I am using Broadcom cards in the iSCSI target server and have enabled
jumbo frames on them (MTU 9000). There is no switch in between; they are
connected directly into a Windows server, and I am accessing the iSCSI
target with the MS iSCSI initiator. The NIC cards on the Windows server
are also set to an MTU of 9000.

I also notice that load averages on the Linux box will get into the 7's
and 8's when I try pushing the system by performing multiple transfers.

Any feedback on what I might be missing here would be great!

Thanks

Phil
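As a first isolation step it can help to baseline the raw AoE read rate on
the target host itself, taking IET and the iSCSI network out of the
picture; a minimal sketch (device path taken from the config above;
iflag=direct assumes a reasonably recent coreutils dd):

    # sequential read straight off the AoE block device, bypassing the
    # page cache, run on the target server
    dd if=/dev/etherd/e1.2 of=/dev/null bs=1M count=1024 iflag=direct

If this is also slow, the problem sits below IET in the AoE/disk layer; if
it is fast, the iSCSI path (IET config, NIC settings, TCP tuning) is the
more likely culprit.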
Re: iscsi diagnosis help
Hoot, Joseph wrote:

> Thanks. That helps. So I know that with the EqualLogic targets, there is
> a Group IP which, I believe, responds with an iSCSI login redirect.
>
> 1) Could the "Login authentication failed" message be the response to a
> login redirect from the EQL Group IP?

It could be, but then when we retry we end up handling the redirect ok. I
do not know why the first redirect would fail.

Take a wireshark trace, or run iscsid by hand:

    iscsid -d 8

and send the log output. That will let us know where we failed, but it
would not tell us why. Normally EQL targets would leave something in
their logs about why.

> and then my next question is more for curiosity's sake:
>
> 2) Are there plans in the future to have more than one connection per
> session? And, in addition to that, would that mean multiple connections
> to a single volume over the same NIC?

No plans for MC/s. You can do multiple sessions to the same volume
though. You can have multiple sessions over the same NIC, or over
different NICs, or some combination.

> On Nov 16, 2009, at 7:18 PM, Mike Christie wrote:
>
> [snip]
>
> Maybe one of these modules?
>
>     iscsi_tcp              19785  46
>     libiscsi_tcp           21829  1   iscsi_tcp
>     libiscsi2              41285  3   ib_iser,iscsi_tcp,libiscsi_tcp
>     scsi_transport_iscsi2  37197  5   ib_iser,iscsi_tcp,libiscsi2
>     scsi_transport_iscsi    6085  1   scsi_transport_iscsi2
>
> I'm just trying to make sure that all of my timeout values line up.
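A sketch of capturing what Mike asks for here; the interface name and init
script invocation are assumptions, and port 3260 is the standard iSCSI
portal port:

    # stop the managed daemon first, then run iscsid in the foreground
    # with debugging (distro service name may differ)
    /etc/init.d/iscsid stop
    iscsid -d 8 -f        # debug output goes to the terminal/syslog

    # in a second console, capture the login exchange for wireshark
    tcpdump -i eth0 -s 0 -w iscsi-login.pcap port 3260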
Re: iscsi diagnosis help
Hoot, Joseph wrote:

> [snip]
>
> Also, Mike, I'm seeing one or two of these every 30-40 minutes if I slam
> our EqualLogic with roughly 7-15k IOPS (reads and writes) non-stop on 3
> volumes. In this type of scenario, would you expect to see timeouts like
> this once in a while? If so, do you think increasing my NOOP timeouts
> would help so we don't get these? Maybe set it to 15 seconds instead of
> 10?

It might be a bug. What version of open-iscsi are you using? What kernel?
Is it a distro or kernel.org one? And are you using the open-iscsi kernel
modules that come with an open-iscsi.org tarball, or the kernel modules
that come with your kernel?
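One way to gather what Mike asks for (the package name assumes a
RHEL/OEL-style distro; adjust for others):

    iscsiadm --version               # userspace (open-iscsi) version
    uname -r                         # running kernel
    rpm -q iscsi-initiator-utils     # distro package, if installed
    modinfo iscsi_tcp                # the filename: line shows whether the
                                     # module came with the kernel tree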
Re: Problem using multiple NICs
Jim Cole wrote:

> [snip]
>
> The problem is not specific to one interface. I am able to log in with
> either one. I just can't seem to log in with both at the same time.

Can you ping through each interface at the same time? Do

    ping -I eth4 xx.xx.xx.xx

in one console and

    ping -I eth5 xx.xx.xx.xx

in another.
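Not stated in the thread, but a common suspect with this exact topology:
with two NICs on one subnet, Linux ARP behavior and reverse-path filtering
often allow only one interface to converse at a time, which is what the
ping test above probes. Settings along these lines are frequently
suggested; treat them as an assumption to verify, not a confirmed fix:

    # per-interface ARP and reverse-path settings often used for iSCSI
    # multipath on a single subnet (example values)
    sysctl -w net.ipv4.conf.eth4.arp_ignore=1
    sysctl -w net.ipv4.conf.eth4.arp_announce=2
    sysctl -w net.ipv4.conf.eth4.rp_filter=0
    sysctl -w net.ipv4.conf.eth5.arp_ignore=1
    sysctl -w net.ipv4.conf.eth5.arp_announce=2
    sysctl -w net.ipv4.conf.eth5.rp_filter=0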
Re: iscsi diagnosis help
Pasi Kärkkäinen wrote:

> [snip]
>
> Equallogic does active loadbalancing (redirects) during operation..
> dunno about the errors though.

Oh yeah, forgot about that. Thanks Pasi!

Joseph, look in the EQL target logs for something about the EQL box doing
load balancing. I think normally we handle the load balancing more
gracefully, but we might be messing up. I think if EQL was load balancing,
in the open-iscsi logs we would see something about getting an async iSCSI
PDU from the target asking us to log out. Then when we re-login, the
target would redirect us to the optimal path.
Re: iscsi diagnosis help
More inline below...

On Nov 17, 2009, at 7:27 PM, Mike Christie wrote:

> [snip]
>
> Joseph, look in the EQL target logs for something about the EQL box
> doing load balancing. I think normally we handle the load balancing more
> gracefully, but we might be messing up. I think if EQL was load
> balancing, in the open-iscsi logs we would see something about getting
> an async iSCSI PDU from the target asking us to log out. Then when we
> re-login, the target would redirect us to the optimal path.

There are two things that the EQL does, I believe: one is an async
logout, the other is a login redirect. Unfortunately, from the EQL syslog
side we don't see any errors related to this.

It's my understanding, however, that when a login is initially attempted
to the EQL, it hits the Group IP, or an aliased IP sitting on a real NIC.
The Group IP looks at all the interfaces on the EQL and decides, based on
some algorithm, which EQL NIC the session should connect to. It then
sends the initiator that made the request a login redirect, which I
thought is basically a logout-and-reconnect PDU. It would say, for
example: you can't log into the Group IP, but you can log into this IP (a
real NIC) that it would prefer you be logged into.

I'm thinking that the failed login is actually the result of that attempt
to log into the Group IP, and it sending a login redirect PDU back.

Don, does this seem like normal EQL traffic to an OiS initiator?

===
Joseph R. Hoot
Lead System Programmer/Analyst
(w) 716-878-4832
(c) 716-759-HOOT
joe.h...@itec.suny.edu
GPG KEY: 7145F633
===
Re: iscsi diagnosis help
Hoot, Joseph wrote:

> [snip]
>
> I'm thinking that the failed login is actually the result of that
> attempt to log into the Group IP, and it sending a login redirect PDU
> back.

If the target was load balancing us it would go like this:

- The target sends an async logout PDU.
- We then send a logout PDU.
- When we get the logout response PDU, we kill the TCP/IP connection.
- We then create a new TCP connection.
- We then log in to the portal that was passed into iscsiadm/iscsid (the
  one in the DB that you see when you run iscsiadm -m node, which is
  probably what you call the Group IP). For this process we send a login
  PDU. The target then sends a login response PDU with the login redirect
  status. In this response we also get the new IP to log into.
- We see that response, kill the TCP connection, and create a new TCP
  connection to the portal we are being redirected to.
- We then log into the portal we were redirected to, again by sending a
  login PDU. This time the login response PDU should be ok, and we are
  done.

We do not know which login PDU failed right now. You would need a
wireshark trace or iscsid debugging. iscsid could hit the "Login
authentication failed" path for either of the login PDUs sent.

You should not see that message normally, even when we are being
redirected. We do the login redirect logic when we initially log into the
target (like when you do iscsiadm -m node -l or service iscsi start), and
if you look in your logs you should not see a login failed message for
that. If you do, it might be a clue.
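A simple way to check what Mike describes, i.e. whether a clean initial
login through the group IP also produces the failure message (target and
portal are placeholders; log path per RHEL-style syslog):

    # log in fresh and watch what iscsid reports
    iscsiadm -m node -T iqn.2001-05.com.equallogic:example-volume \
        -p xx.xx.xx.xx:3260 --login
    grep iscsid /var/log/messages | tail -20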
Re: iscsi diagnosis help
Mike Christie wrote:

> [snip]
>
> - We then log into the portal we were redirected to, again by sending a
>   login PDU. This time the login response PDU should be ok, and we are
>   done.

Oh yeah, I meant to also say that this is pretty much the same process
that happens when we do the first login, and when we have to re-login
because of a connection problem like the nop/ping timeout. The only
difference in those cases is that we do not get the async logout and we
do not do a logout by sending a logout PDU. We start at the "kill the
TCP/IP connection" step.

So even if we are not getting load balanced, we would be in the same
place in the open-iscsi code when we are getting the login failed errors.

To get back on track solving why we get the nop timeouts: if we are not
seeing load balancing messages or async logout messages, it could be the
open-iscsi bug I mentioned in the other mail. If you can send the
open-iscsi and kernel info I asked for in the other mail, we can start
down that path.
Re: [RFC-PATCH] libiscsi dhcp handler
Rakesh Ranjan wrote:

> David Miller wrote:
>
>> From: Rakesh Ranjan rak...@chelsio.com
>> Date: Mon, 16 Nov 2009 18:41:49 +0530
>>
>>> Herein attached patches to support dhcp based provisioning for iSCSI
>>> offload capable cards. I have made the dhcp code as generic as
>>> possible, please go through the code. Based on the feedback I will
>>> submit the final version of these patches.
>>
>> You can't really add objects to the build before the patch that adds
>> the source for that object.
>
> Hi David,
>
> Fixed patch attached.

ping ...
Re: Poor read performance with IET
Pryker,

Test your environment with NULL I/O. Perform the read and write tests and
check the performance, which should reach around 100 MB/s. If it does
not, the problem would be around IET in your environment; otherwise the
problem may lie with your disk I/O or back-end driver.

Thanks
Gopala krishnan Varatharajan.

On Tue, Nov 17, 2009 at 8:51 PM, pryker pry...@gmail.com wrote:

> [snip]

--
Regards
Gopu.
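A sketch of what such a NULL I/O test target could look like in ietd.conf;
the target name and sector count are made up, and Type=nullio support
should be confirmed against the ietd.conf man page for IET 0.4.17:

    # example NULL I/O target: reads/writes are discarded, so throughput
    # reflects only the network and the IET code path (values illustrative)
    Target iqn.2009-11.net.storage:titan.nulltest
        Lun 0 Sectors=2097152,Type=nullio
        MaxConnections 1

With roughly 2097152 512-byte sectors the LUN advertises about 1 GB; if
NULL I/O reads also crawl, look at the NIC/TCP side rather than the AoE
back end.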