Re: open-iscsi with Promise M500i dropping session / Nop-out timedout
On Wed, May 28, 2008 at 03:34:37PM +0300, Pasi Kärkkäinen wrote:

> Hello list!
>
> Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet
> initiator) to CentOS 5.1 (open-iscsi initiator) and now I have some
> problems with it (then again I was expecting it.. I hate this Promise
> array).
>
> /var/log/messages:
>
> May 28 15:14:16 server1 multipathd: path checkers start up
> May 28 15:15:39 server1 iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). Dropping session.
> May 28 15:15:42 server1 iscsid: connection14:0 is operational after recovery (2 attempts)
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057296
> May 28 15:19:21 server1 kernel: device-mapper: multipath: Failing path 8:48.
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057552
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057560
> May 28 15:19:21 server1 multipathd: sdd: readsector0 checker reports path is down
> May 28 15:19:21 server1 multipathd: checker failed path 8:48 in map promise_test1
> May 28 15:19:21 server1 multipathd: promise_test1: remaining active paths: 1
> May 28 15:19:21 server1 iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). Dropping session.
> May 28 15:19:25 server1 iscsid: connection14:0 is operational after recovery (2 attempts)
> May 28 15:19:26 server1 multipathd: sdd: readsector0 checker reports path is up
> May 28 15:19:26 server1 multipathd: 8:48: reinstated
> May 28 15:19:26 server1 multipathd: promise_test1: remaining active paths: 2
> May 28 15:19:26 server1 multipathd: promise_test1: switch to path group #1
>
> $ iscsiadm -m node --targetname name | grep timeo
> node.session.timeo.replacement_timeout = 15
> node.session.err_timeo.abort_timeout = 10
> node.session.err_timeo.reset_timeout = 30
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.active_timeout = 5
> node.conn[0].timeo.idle_timeout = 60
> node.conn[0].timeo.ping_timeout = 5
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 10
> node.session.timeo.replacement_timeout = 15
> node.session.err_timeo.abort_timeout = 10
> node.session.err_timeo.reset_timeout = 30
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.active_timeout = 5
> node.conn[0].timeo.idle_timeout = 60
> node.conn[0].timeo.ping_timeout = 5
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 10
>
> Basically those Nop-out timedout errors keep showing up all the time
> when there is IO going on.. and if I have
> dd if=/dev/mpath of=/dev/null running

You can expand the timeout to a higher value? 30 seconds? Also you might want to limit the node.session.queue_depth to a lower value as well.

> IO rates seem to go down every 20 seconds or so and stay stalled (at 0)
> for 5 seconds or so.. weird.

That could be due to the NOP not getting its response and stalling the session until it receives the response.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/open-iscsi -~--~~~~--~~--~--~---
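[Editor's sketch, not part of the original message.] The suggestion above to raise the NOP-Out timeout could be applied to an existing node record with `iscsiadm -o update`. The helper below only prints the commands so they can be reviewed first. The function name `print_noop_updates` and the target and portal values in the usage note are hypothetical; the setting names and the 30 second value come from the thread. As far as I know, updated record values are only picked up the next time the session logs in.

```shell
#!/bin/sh
# Print (do not run) the iscsiadm commands that would change the NOP-Out
# timers of one node record.
# Arguments: target name, portal, noop_out_interval, noop_out_timeout.
# The helper name and argument order are illustrative assumptions.
print_noop_updates() {
    target=$1
    portal=$2
    interval=$3
    timeout=$4
    for pair in "noop_out_interval=$interval" "noop_out_timeout=$timeout"; do
        name=${pair%%=*}    # text before the first "="
        value=${pair#*=}    # text after the first "="
        printf 'iscsiadm -m node -T %s -p %s -o update -n node.conn[0].timeo.%s -v %s\n' \
            "$target" "$portal" "$name" "$value"
    done
}
```

For example, `print_noop_updates iqn.2008-05.example:promise 192.0.2.10:3260 5 30` prints two commands, one per setting; pipe the output to `sh` only after checking it against your own target name and portal.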
Re: open-iscsi with Promise M500i dropping session / Nop-out timedout
On Wed, May 28, 2008 at 07:10:08PM +0300, Pasi Kärkkäinen wrote:

> > > Basically those Nop-out timedout errors keep showing up all the time
> > > when there is IO going on.. and if I have
> > > dd if=/dev/mpath of=/dev/null running
> >
> > You can expand the timeout to a higher value? 30 seconds? Also you
> > might want to limit the node.session.queue_depth to a lower value as
> > well.
>
> I tried this.. doesn't seem to help much. I still get the same errors.
> I'll try limiting queue depth too..

The default queue depth is 32. I ran:

echo 8 > /sys/block/sdc/device/queue_depth
echo 8 > /sys/block/sdd/device/queue_depth

and re-ran the dd test. Same problem. Log entries:

iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). Dropping session.
iscsid: connection14:0 is operational after recovery (2 attempts)

Then again, it seems I get these errors less often now (with a smaller queue depth), so it seems to help..

I'm not totally sure about this, but it could be that sometimes when I can see the IO stall (with iostat) I also get that Nop-out timedout.. and sometimes not. With a smaller queue depth it just stalls, but with a bigger queue depth it also drops the session (more often).

Results from the dd test with noop_out_timeout of 30 seconds and queue depth of 32:

iscsid: Nop-out timedout after 30 seconds on connection 18:0 state (3). Dropping session.
iscsid: connection18:0 is operational after recovery (2 attempts)
kernel: sd 20:0:0:0: SCSI error: return code = 0x0002
kernel: end_request: I/O error, dev sdd, sector 13510024
kernel: device-mapper: multipath: Failing path 8:48.
multipathd: 8:48: mark as failed
multipathd: promise_test1: remaining active paths: 1
iscsid: Nop-out timedout after 30 seconds on connection 18:0 state (3). Dropping session.
iscsid: connection18:0 is operational after recovery (2 attempts)
multipathd: sdd: readsector0 checker reports path is up
multipathd: 8:48: reinstated
multipathd: promise_test1: remaining active paths: 2
multipathd: promise_test1: switch to path group #1

So hmm.. it looks like lowering the queue depth helps with the session drops, while increasing the noop_out_timeout doesn't make much difference.. Or actually, it could be that increasing the noop_out_timeout makes the stalls happen less often.. hmm :)

Thanks for the help/comments!

-- Pasi
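[Editor's sketch, not part of the original message.] Since it is hard to judge by eye whether a queue depth or timeout change makes the drops "less often", one option is to count the iscsid drop messages per test run. The two helpers below are hypothetical names; they only assume the log line format shown in this thread ("iscsid: Nop-out timedout after N seconds on connection X:Y ...").

```shell
#!/bin/sh
# Count session drops caused by NOP-Out timeouts in a syslog-style file.
count_nop_drops() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # that number is 0, so "|| true" keeps the helper usable under "set -e".
    grep -c 'iscsid: Nop-out timedout after' "$1" || true
}

# Print "<connection> <count>" for every connection that timed out, sorted.
nop_drops_by_connection() {
    awk '/iscsid: Nop-out timedout after/ {
             # The connection id is the word that follows "connection".
             for (i = 1; i < NF; i++)
                 if ($i == "connection") n[$(i + 1)]++
         }
         END { for (c in n) print c, n[c] }' "$1" | sort
}
```

One way to use it is to save the relevant part of /var/log/messages after each dd run (for example one file per queue depth setting) and compare the output of `count_nop_drops` across the files, keeping the run length the same so the counts are comparable.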
Re: open-iscsi with Promise M500i dropping session / Nop-out timedout
Pasi Kärkkäinen wrote:

> Hello list!
>
> Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet
> initiator) to CentOS 5.1 (open-iscsi initiator) and now I have some
> problems with it

You are using the open-iscsi code that comes with CentOS, right?

> (then again I was expecting it.. I hate this Promise array).
>
> /var/log/messages:
>
> May 28 15:14:16 server1 multipathd: path checkers start up
> May 28 15:15:39 server1 iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). Dropping session.
> May 28 15:15:42 server1 iscsid: connection14:0 is operational after recovery (2 attempts)
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057296
> May 28 15:19:21 server1 kernel: device-mapper: multipath: Failing path 8:48.
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057552
> May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 0x0002
> May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 190057560
> May 28 15:19:21 server1 multipathd: sdd: readsector0 checker reports path is down
> May 28 15:19:21 server1 multipathd: checker failed path 8:48 in map promise_test1
> May 28 15:19:21 server1 multipathd: promise_test1: remaining active paths: 1
> May 28 15:19:21 server1 iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). Dropping session.
> May 28 15:19:25 server1 iscsid: connection14:0 is operational after recovery (2 attempts)
> May 28 15:19:26 server1 multipathd: sdd: readsector0 checker reports path is up
> May 28 15:19:26 server1 multipathd: 8:48: reinstated
> May 28 15:19:26 server1 multipathd: promise_test1: remaining active paths: 2
> May 28 15:19:26 server1 multipathd: promise_test1: switch to path group #1
>
> $ iscsiadm -m node --targetname name | grep timeo
> node.session.timeo.replacement_timeout = 15
> node.session.err_timeo.abort_timeout = 10
> node.session.err_timeo.reset_timeout = 30
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.active_timeout = 5
> node.conn[0].timeo.idle_timeout = 60
> node.conn[0].timeo.ping_timeout = 5
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 10
> node.session.timeo.replacement_timeout = 15
> node.session.err_timeo.abort_timeout = 10
> node.session.err_timeo.reset_timeout = 30
> node.conn[0].timeo.logout_timeout = 15
> node.conn[0].timeo.login_timeout = 15
> node.conn[0].timeo.auth_timeout = 45
> node.conn[0].timeo.active_timeout = 5
> node.conn[0].timeo.idle_timeout = 60
> node.conn[0].timeo.ping_timeout = 5
> node.conn[0].timeo.noop_out_interval = 5
> node.conn[0].timeo.noop_out_timeout = 10
>
> Basically those Nop-out timedout errors keep showing up all the time
> when there is IO going on.. and if I have
> dd if=/dev/mpath of=/dev/null running IO rates seem to go down every
> 20 seconds or so and stay stalled (at 0) for 5 seconds or so.. weird.
>
> Initiator is the default RHEL/CentOS 5.1 version.
>
> Most probably the problem is in the Promise target because I had a lot
> of issues with it earlier too.. It took some time before I got it to
> work ok with CentOS 4.6.
>
> With CentOS 4.6 (sfnet initiator) I was using this in iscsid.conf:
>
> ConnFailTimeout=5
> PingTimeout=10
>
> and also:
>
> echo 60 > /sys/block/sdc/device/timeout
> echo 60 > /sys/block/sdd/device/timeout
>
> But I remember seeing errors / failing paths in the logs then too..
>
> Anyway, is there anything I can do about these errors, or should I
> just let multipath do its job :)

You can turn nops off.

open-iscsi:

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

sfnet:

PingTimeout=0
ActiveTimeout=0
IdleTimeout=0

But I think the problem with Promise was that it needed new firmware or something, right? If it did not work with sfnet and open-iscsi then I think that was the problem. If it just did not work on open-iscsi then it may have been something else. Did you search the list by any chance?
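[Editor's sketch, not part of the original message.] The two open-iscsi settings above are written in the "key = value" form used by iscsid.conf. A small helper like the one below could set such a key in a copy of the file, replacing an existing line or appending one. The name `set_conf_key` is hypothetical, and so is the assumption that the file has one setting per line. My understanding is that iscsid.conf supplies defaults for node records created afterwards, so records that already exist would still need updating through iscsiadm.

```shell
#!/bin/sh
# Set "key = value" in a config file: replace the line whose name matches
# the key exactly, or append a new line if no such line exists.
# Arguments: file, key, value.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if awk -v k="$key" -v v="$value" '
        {
            # Compare the text before "=" as a plain string, so the "[0]"
            # in names like node.conn[0] is not treated as a regex.
            pos = index($0, "=")
            name = (pos > 0) ? substr($0, 1, pos - 1) : ""
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) { print k " = " v; found = 1 } else { print $0 }
        }
        END { if (!found) print k " = " v }
    ' "$file" > "$tmp"; then
        # Write back through cat so the original file keeps its permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}
```

Turning nops off as described would then be two calls, one for `node.conn[0].timeo.noop_out_interval` and one for `node.conn[0].timeo.noop_out_timeout`, each with the value 0. Try it on a copy of the config file first and diff the result before replacing the real one.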
Re: open-iscsi with Promise M500i dropping session / Nop-out timedout
On Wed, May 28, 2008 at 01:17:17PM -0500, Mike Christie wrote:

> Pasi Kärkkäinen wrote:
> > Hello list!
> >
> > Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet
> > initiator) to CentOS 5.1 (open-iscsi initiator) and now I have some
> > problems with it
>
> You are using the open-iscsi code that comes with Centos right?

Yep, the default open-iscsi that comes with CentOS 5.1 (and the latest updates installed).

> You can turn nops off
>
> open-iscsi
> node.conn[0].timeo.noop_out_interval = 0
> node.conn[0].timeo.noop_out_timeout = 0

Does turning nops off have any side effects?

> But I think the problem with promise was that it needed new firmware
> or something right? If it did not work with sfnet and open-iscsi then
> I think that was the problem. If it just did not work on open-iscsi
> then it may have been something else. Did you search the list by any
> chance?

Yep, I was searching.. I think the same kind of problem with an Infotrend target was fixed with a firmware upgrade.

I'm running the latest firmware on that Promise.. so that doesn't help in this case.

And yep, I had/have issues with both sfnet (CentOS 4) and open-iscsi (CentOS 5) when I use this Promise target..

Here's another recent thread about problems with the same target:
http://www.mail-archive.com/open-iscsi@googlegroups.com/msg00692.html

-- Pasi