On 11/10/09 11:39 AM, "Mike Christie" <micha...@cs.wisc.edu> wrote:
> 
> What version of open-iscsi were you using and what kernel, and were you
> using the iscsi kernel modules with open-iscsi.org tarball or from the
> kernel?

iscsi-initiator-utils-6.2.0.871-0.10.el5
kernel-2.6.18-164.2.1.el5

RedHat RPMs

> 
> 
> It looks like we are sending more IO than the target can handle. In one
> of those cases it took more than 30 or 60 seconds (depending on your
> timeout value).
> 
> What is the value of
> 
> cat /sys/block/sdXYZ/device/timeout
> 
> ?
> 
> If it is 30 or 60 could you increase it to 360? After you login to the
> target do
> 
> echo 360 > /sys/block/sdXYZ/device/timeout

I've tried setting this, but it appears to have no effect - it was 60, and I
increased it to 360.
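
With multipath, each LUN shows up as several sd devices, and the sysfs
timeout is per device and is not kept across a logout/login, so it has to be
written to every path device after each login. A minimal sketch of doing that
in one pass: the function name set_sd_timeouts and its optional sysfs-root
argument are made up for illustration, and it assumes every sd* device on the
box should get the same value.

```shell
#!/bin/sh
# Hypothetical helper (not part of open-iscsi): write the same SCSI command
# timeout, in seconds, to every sd* device found under a sysfs root.
# The second argument defaults to /sys and exists only so the function can
# be tried against a scratch directory first.
set_sd_timeouts() {
    timeout_secs=$1
    sysfs_root=${2:-/sys}
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        # Skip the unexpanded glob when no sd* devices are present.
        [ -e "$f" ] || continue
        echo "$timeout_secs" > "$f"
        # Read the value back so a write that did not stick is visible.
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Run as root after logging in to the target, for example `set_sd_timeouts 360`,
and check that each device reports the new value.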

> 
> And what is the value of:
> 
> iscsiadm -m node -T your_target | grep node.session.cmds_max
> 
> If that is 128, then could you decrease that to 32 or 16?
> 
> Run
> 
> iscsiadm -m node -T your_target -u
> iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32
> iscsiadm -m node -T your_target -l

I've tried setting to both 16 and 32, but it behaves about the same.

> 
> 
> And if those prevent the io errors then could you do
> 
> echo noop > /sys/block/sdXYZ/queue/scheduler
> 
> to see if performance increases with a different scheduler.


I really think I'm back to the duplicate ACK problem - see the attached
packet dump - at one point there are 30 duplicate ACKs. Interestingly, the
storage has "worked" for the past week - I'm using it as a D2D backup. This
morning (about 7 days later), it's giving all these duplicate ACKs.

I'm currently running into messages such as:

Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Nov 19 09:47:00 backup kernel:  session2: target reset succeeded
Nov 19 09:47:01 backup iscsid: connection2:0 is operational after recovery (1 attempts)
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code = 0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 8856
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:80.
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code = 0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 74424
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code = 0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector 8845240
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:192.
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code = 0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector 62915456
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: timing out command, waited 300s
Nov 19 09:47:10 backup multipathd: /sbin/mpath_prio_alua exitted with 1
Nov 19 09:47:10 backup multipathd: error calling out /sbin/mpath_prio_alua /dev/sdm
Nov 19 09:47:10 backup multipathd: 3600d0230ffffffff061d4479bfb83902: switch to path group #2

This is also interesting:

Nov 18 01:48:30 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 8
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 7
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 6
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 5
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 4
Nov 18 20:16:34 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 5
Nov 18 20:32:09 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 6
Nov 18 20:43:05 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 7
Nov 18 20:48:08 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 8
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 7
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 6
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 5
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 4
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 3
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 2
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 1
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 0
Nov 18 20:53:41 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 1
Nov 18 20:59:09 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 2
Nov 18 21:04:37 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 3
Nov 18 21:10:05 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 4
Nov 18 21:15:33 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 5
Nov 18 21:20:35 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 6
Nov 18 21:26:03 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 7
Nov 18 21:31:06 backup multipathd: 3600d0230ffffffff061d4479bfb83902: remaining active paths: 8
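
The "remaining active paths" messages above can be boiled down to a few
numbers to make the drop-outs easier to spot. This is a rough sketch only: the
function name summarize_paths is invented here, and it assumes the count is
always the last field on a line containing "remaining active paths:", which
holds whether or not the mailer has wrapped the syslog lines.

```shell
#!/bin/sh
# Hypothetical helper: read multipathd syslog text on stdin and print how many
# "remaining active paths" events were seen, the lowest path count reached,
# and how many times the count hit zero (all paths down).
summarize_paths() {
    awk '
        /remaining active paths:/ {
            n = $NF + 0                      # path count is the last field
            if (seen == 0 || n < min) min = n
            if (n == 0) zero++
            seen++
        }
        END {
            if (seen)
                printf "events=%d min=%d all_paths_down=%d\n", seen, min, zero + 0
        }
    '
}
```

For example, `grep multipathd /var/log/messages | summarize_paths` (the log
path is the usual RHEL default, adjust as needed).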

Matthew

