Re: open-iscsi with Promise M500i dropping session / Nop-out timedout

2008-05-28 Thread Konrad Rzeszutek

On Wed, May 28, 2008 at 03:34:37PM +0300, Pasi Kärkkäinen wrote:
 
 Hello list!
 
 Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet initiator) 
 to CentOS 5.1 (open-iscsi initiator) and now I have some problems with it
 (then again I was expecting it.. I hate this Promise array).
 
 /var/log/messages:
 
 May 28 15:14:16 server1 multipathd: path checkers start up
 May 28 15:15:39 server1 iscsid: Nop-out timedout after 10 seconds on 
 connection 14:0 state (3). Dropping session.
 May 28 15:15:42 server1 iscsid: connection14:0 is operational after recovery 
 (2 attempts)
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057296
 May 28 15:19:21 server1 kernel: device-mapper: multipath: Failing path 8:48.
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057552
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057560
 May 28 15:19:21 server1 multipathd: sdd: readsector0 checker reports path is 
 down
 May 28 15:19:21 server1 multipathd: checker failed path 8:48 in map 
 promise_test1
 May 28 15:19:21 server1 multipathd: promise_test1: remaining active paths: 1
 May 28 15:19:21 server1 iscsid: Nop-out timedout after 10 seconds on 
 connection 14:0 state (3). Dropping session.
 May 28 15:19:25 server1 iscsid: connection14:0 is operational after recovery 
 (2 attempts)
 May 28 15:19:26 server1 multipathd: sdd: readsector0 checker reports path is 
 up
 May 28 15:19:26 server1 multipathd: 8:48: reinstated
 May 28 15:19:26 server1 multipathd: promise_test1: remaining active paths: 2
 May 28 15:19:26 server1 multipathd: promise_test1: switch to path group #1
 
 $ iscsiadm -m node --targetname name | grep timeo 
 node.session.timeo.replacement_timeout = 15
 node.session.err_timeo.abort_timeout = 10
 node.session.err_timeo.reset_timeout = 30
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.auth_timeout = 45
 node.conn[0].timeo.active_timeout = 5
 node.conn[0].timeo.idle_timeout = 60
 node.conn[0].timeo.ping_timeout = 5
 node.conn[0].timeo.noop_out_interval = 5
 node.conn[0].timeo.noop_out_timeout = 10
 node.session.timeo.replacement_timeout = 15
 node.session.err_timeo.abort_timeout = 10
 node.session.err_timeo.reset_timeout = 30
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.auth_timeout = 45
 node.conn[0].timeo.active_timeout = 5
 node.conn[0].timeo.idle_timeout = 60
 node.conn[0].timeo.ping_timeout = 5
 node.conn[0].timeo.noop_out_interval = 5
 node.conn[0].timeo.noop_out_timeout = 10
 
 Basically those Nop-out timedout errors keep showing up all the time when
 there is IO going on.. and if I have dd if=/dev/mpath of=/dev/null running 

Can you raise the timeout to a higher value, say 30 seconds? You might also
want to lower node.session.queue_depth.
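
A minimal sketch of how these two changes might be applied to an existing
node record with iscsiadm. The target name, portal address and the queue
depth of 16 are placeholders chosen for illustration, not values from this
thread. The helper only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) an iscsiadm command that updates one setting in a
# node record. Arguments: target IQN, portal, setting name, new value.
node_update_cmd() {
    printf "iscsiadm -m node -T %s -p %s -o update -n '%s' -v %s\n" \
        "$1" "$2" "$3" "$4"
}

# Hypothetical target and portal; substitute the real ones.
TARGET="iqn.2008-05.com.example:promise-test1"
PORTAL="192.0.2.10:3260"

# Raise the Nop-out timeout to 30 seconds.
node_update_cmd "$TARGET" "$PORTAL" 'node.conn[0].timeo.noop_out_timeout' 30
# Lower the session queue depth (16 is an arbitrary example value).
node_update_cmd "$TARGET" "$PORTAL" 'node.session.queue_depth' 16
```

As far as I know, values changed in the node record are picked up the next
time the session logs in, so a logout and login is likely needed afterwards.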

 IO rates seem to go down every 20 seconds or so and stay stalled (at 0) for 
 5 seconds or so.. weird.

That could be due to the NOP not getting its response and stalling the session
until it receives the response.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~--~~~~--~~--~--~---



Re: open-iscsi with Promise M500i dropping session / Nop-out timedout

2008-05-28 Thread Pasi Kärkkäinen

On Wed, May 28, 2008 at 07:10:08PM +0300, Pasi Kärkkäinen wrote:
 
   
   Basically those Nop-out timedout errors keep showing up all the time when
   there is IO going on.. and if I have dd if=/dev/mpath of=/dev/null 
   running 
  
  You can expand the timeout to a higher value? 30 seconds ? Also you might
  want to limit the node.session.queue_depth to a lower value as well.
  
 
 I tried this.. doesn't seem to help much. I still get the same errors. 
 
 I'll try limiting queue depth too.. 
 

The default queue depth is 32. 

I ran:
echo 8 > /sys/block/sdc/device/queue_depth
echo 8 > /sys/block/sdd/device/queue_depth

and re-ran the dd test. Same problem. Log entries:

iscsid: Nop-out timedout after 10 seconds on connection 14:0 state (3). 
Dropping session.
iscsid: connection14:0 is operational after recovery (2 attempts)

Then again, I seem to get these errors less often now (with a smaller queue
depth), so it seems to help.

I'm not totally sure about this, but it could be that sometimes when I can see 
the io stall (with iostat) I also get that Nop-out timedout.. and sometimes 
not. 

With a smaller queue depth it just stalls, but with a bigger queue depth it
also drops the session (more often).
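
The queue depth change above can be applied to several devices in one go.
This is only a sketch: the function name and the SYSFS_ROOT override are
made up for illustration (the override exists so the loop can be tried on a
scratch directory instead of the real /sys).

```shell
#!/bin/sh
# Write the given queue depth to each listed SCSI block device.
# Usage: set_queue_depth DEPTH DEV...
# SYSFS_ROOT defaults to /sys; override it only for dry experiments.
set_queue_depth() {
    depth=$1
    shift
    for dev in "$@"; do
        attr="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
        if [ -w "$attr" ]; then
            echo "$depth" > "$attr"
            # Read the value back, since the kernel may clamp it.
            echo "$dev: queue_depth=$(cat "$attr")"
        else
            echo "$dev: $attr not writable, skipped" >&2
        fi
    done
}
```

For the two paths in this setup that would be "set_queue_depth 8 sdc sdd",
run as root. The setting is not persistent and is lost when the device is
removed and re-added.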


Results from the dd test with noop_out_timeout of 30 seconds and queue depth 
of 32:

iscsid: Nop-out timedout after 30 seconds on connection 18:0 state (3). 
Dropping session.
iscsid: connection18:0 is operational after recovery (2 attempts)
kernel: sd 20:0:0:0: SCSI error: return code = 0x0002
kernel: end_request: I/O error, dev sdd, sector 13510024
kernel: device-mapper: multipath: Failing path 8:48.
multipathd: 8:48: mark as failed
multipathd: promise_test1: remaining active paths: 1
iscsid: Nop-out timedout after 30 seconds on connection 18:0 state (3). 
Dropping session.
iscsid: connection18:0 is operational after recovery (2 attempts)
multipathd: sdd: readsector0 checker reports path is up
multipathd: 8:48: reinstated
multipathd: promise_test1: remaining active paths: 2
multipathd: promise_test1: switch to path group #1

So hmm.. it looks like lowering the queue depth helps with the session drops
while increasing the noop_out_timeout doesn't make much difference.. 

Or actually, it could be that increasing the noop_out_timeout makes the
stalls happen less often.. hmm:)
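
To compare settings with numbers rather than by eye, the timeouts can be
counted from the log after each dd run. A rough sketch, assuming the
messages look like the ones quoted above and each sits on a single line in
the log file; both function names are made up here.

```shell
#!/bin/sh
# Count "Nop-out timedout" lines in a log file (default /var/log/messages).
count_nop_timeouts() {
    grep -c 'Nop-out timedout' "${1:-/var/log/messages}"
}

# Same, but broken down by the connection id that iscsid reports.
nop_timeouts_per_connection() {
    grep 'Nop-out timedout' "${1:-/var/log/messages}" |
        sed -n 's/.*on connection \([0-9]*:[0-9]*\).*/\1/p' |
        sort | uniq -c
}
```

Running count_nop_timeouts before and after a test run of fixed length
gives a per-run count for each queue depth and timeout combination.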

Thanks for the help/comments!

-- Pasi




Re: open-iscsi with Promise M500i dropping session / Nop-out timedout

2008-05-28 Thread Mike Christie

Pasi Kärkkäinen wrote:
 Hello list!
 
 Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet initiator) 
 to CentOS 5.1 (open-iscsi initiator) and now I have some problems with it

You are using the open-iscsi code that comes with CentOS, right?

 (then again I was expecting it.. I hate this Promise array).
 
 /var/log/messages:
 
 May 28 15:14:16 server1 multipathd: path checkers start up
 May 28 15:15:39 server1 iscsid: Nop-out timedout after 10 seconds on 
 connection 14:0 state (3). Dropping session.
 May 28 15:15:42 server1 iscsid: connection14:0 is operational after recovery 
 (2 attempts)
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057296
 May 28 15:19:21 server1 kernel: device-mapper: multipath: Failing path 8:48.
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057552
 May 28 15:19:21 server1 kernel: sd 16:0:0:0: SCSI error: return code = 
 0x0002
 May 28 15:19:21 server1 kernel: end_request: I/O error, dev sdd, sector 
 190057560
 May 28 15:19:21 server1 multipathd: sdd: readsector0 checker reports path is 
 down
 May 28 15:19:21 server1 multipathd: checker failed path 8:48 in map 
 promise_test1
 May 28 15:19:21 server1 multipathd: promise_test1: remaining active paths: 1
 May 28 15:19:21 server1 iscsid: Nop-out timedout after 10 seconds on 
 connection 14:0 state (3). Dropping session.
 May 28 15:19:25 server1 iscsid: connection14:0 is operational after recovery 
 (2 attempts)
 May 28 15:19:26 server1 multipathd: sdd: readsector0 checker reports path is 
 up
 May 28 15:19:26 server1 multipathd: 8:48: reinstated
 May 28 15:19:26 server1 multipathd: promise_test1: remaining active paths: 2
 May 28 15:19:26 server1 multipathd: promise_test1: switch to path group #1
 
 $ iscsiadm -m node --targetname name | grep timeo 
 node.session.timeo.replacement_timeout = 15
 node.session.err_timeo.abort_timeout = 10
 node.session.err_timeo.reset_timeout = 30
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.auth_timeout = 45
 node.conn[0].timeo.active_timeout = 5
 node.conn[0].timeo.idle_timeout = 60
 node.conn[0].timeo.ping_timeout = 5
 node.conn[0].timeo.noop_out_interval = 5
 node.conn[0].timeo.noop_out_timeout = 10
 node.session.timeo.replacement_timeout = 15
 node.session.err_timeo.abort_timeout = 10
 node.session.err_timeo.reset_timeout = 30
 node.conn[0].timeo.logout_timeout = 15
 node.conn[0].timeo.login_timeout = 15
 node.conn[0].timeo.auth_timeout = 45
 node.conn[0].timeo.active_timeout = 5
 node.conn[0].timeo.idle_timeout = 60
 node.conn[0].timeo.ping_timeout = 5
 node.conn[0].timeo.noop_out_interval = 5
 node.conn[0].timeo.noop_out_timeout = 10
 
 Basically those Nop-out timedout errors keep showing up all the time when
 there is IO going on.. and if I have dd if=/dev/mpath of=/dev/null running 
 IO rates seem to go down every 20 seconds or so and stay stalled (at 0) for 
 5 seconds or so.. weird.
 
 Initiator is the default RHEL/CentOS 5.1 version.
 
 Most probably the problem is in the Promise target because I had a lot of 
 issues
 with it earlier too.. It took some time before I got it to work ok with
 CentOS 4.6. 
 
 With CentOS 4.6 (sfnet initiator) I was using this in iscsid.conf:
 
 ConnFailTimeout=5
 PingTimeout=10
 
 and also:
 echo 60 > /sys/block/sdc/device/timeout
 echo 60 > /sys/block/sdd/device/timeout
 
 But I remember seeing errors / failing paths in the logs then too.. 
 
 Anyway, is there anything I can do about these errors, or should I just let
 multipath do its job :)
 

You can turn nops off

open-iscsi
  node.conn[0].timeo.noop_out_interval = 0
  node.conn[0].timeo.noop_out_timeout = 0


sfnet
PingTimeout=0
ActiveTimeout=0
IdleTimeout=0
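
For the open-iscsi case, one way to check that a node record really has
nops turned off is to filter the iscsiadm output. This is a sketch only;
the function name is made up, and it assumes the "name = value" output
format shown in the original post.

```shell
#!/bin/sh
# Reads "iscsiadm -m node ..." output on stdin. Succeeds (exit 0) only if
# at least one noop_out_interval/noop_out_timeout line is present and all
# of them are set to 0.
nops_disabled() {
    awk -F' = ' '
        /timeo\.noop_out_(interval|timeout)/ {
            seen++
            if ($2 + 0 != 0) bad++
        }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}
```

Usage would be along the lines of
"iscsiadm -m node --targetname name | nops_disabled && echo nops are off".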

But I think the problem with Promise was that it needed new firmware or 
something, right? If it did not work with sfnet and open-iscsi then I 
think that was the problem. If it just did not work on open-iscsi then 
it may have been something else. Did you search the list by any chance?




Re: open-iscsi with Promise M500i dropping session / Nop-out timedout

2008-05-28 Thread Pasi Kärkkäinen

On Wed, May 28, 2008 at 01:17:17PM -0500, Mike Christie wrote:
 
 Pasi Kärkkäinen wrote:
  Hello list!
  
  Unfortunately I had to upgrade a server running CentOS 4.6 (sfnet 
  initiator) 
  to CentOS 5.1 (open-iscsi initiator) and now I have some problems with it
 
 You are using the open-iscsi code that comes with CentOS, right?
 

Yep, the default open-iscsi that comes with CentOS 5.1 (and the latest
updates installed).

 
 You can turn nops off
 
 open-iscsi
   node.conn[0].timeo.noop_out_interval = 0
   node.conn[0].timeo.noop_out_timeout = 0
 

Does turning nops off have any side effects? 

 But I think the problem with Promise was that it needed new firmware or 
 something, right? If it did not work with sfnet and open-iscsi then I 
 think that was the problem. If it just did not work on open-iscsi then 
 it may have been something else. Did you search the list by any chance?


Yep, I was searching.. I think the same kind of problem with an Infotrend
target was fixed with a firmware upgrade. 

I'm running the latest firmware on that Promise.. so that doesn't help in
this case. 

And yep, I had/have issues with both sfnet (CentOS 4) and open-iscsi (CentOS 5)
when I use this Promise target.. 

Here's another recent thread about problems with the same target:
http://www.mail-archive.com/open-iscsi@googlegroups.com/msg00692.html

-- Pasi
