On 11/10/09 11:39 AM, "Mike Christie" <micha...@cs.wisc.edu> wrote:
>
> What version of open-iscsi were you using and what kernel, and were you
> using the iscsi kernel modules with open-iscsi.org tarball or from the
> kernel?
iscsi-initiator-utils-6.2.0.871-0.10.el5
kernel-2.6.18-164.2.1.el5
RedHat RPMs
>
>
> It looks like we are sending more IO than the target can handle. In one
> of those cases it took more than 30 or 60 seconds (depending on your
> timeout value).
>
> What is the value of
>
> cat /sys/block/sdXYZ/device/timeout
>
> ?
>
> If it is 30 or 60 could you increase it to 360? After you login to the
> target do
>
> echo 360 > /sys/block/sdXYZ/device/timeout
I've tried this: it was 60 and I increased it to 360, but it appears to
have no effect.
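
Since the multipath map has eight paths and the timeout is a per-device
setting, one thing worth ruling out is that only some of the sdX devices
were changed. A minimal sketch that writes the value to each named path and
reads it back; set_path_timeouts and the SYSFS_BLOCK override are
hypothetical names used only for illustration, and sdf and sdm are simply
the two devices that show up in the log further down:

```shell
# Hypothetical helper: set the SCSI command timeout on each named path
# device and print the value read back from sysfs.
# SYSFS_BLOCK is an assumed override so the sketch can be tried on a
# scratch directory; it defaults to the real /sys/block.
set_path_timeouts() {
    timeout=$1
    shift
    root=${SYSFS_BLOCK:-/sys/block}
    for dev in "$@"; do
        f="$root/$dev/device/timeout"
        if [ -w "$f" ]; then
            echo "$timeout" > "$f"
            echo "$dev: $(cat "$f")"
        else
            echo "$dev: cannot write $f" >&2
        fi
    done
}
```

For example, `set_path_timeouts 360 sdf sdm` after logging in to the target,
listing every path device that belongs to the map.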
>
> And what is the value of:
>
> iscsiadm -m node -T your_target | grep node.session.cmds_max
>
> If that is 128, then could you decrease that to 32 or 16?
>
> Run
>
> iscsiadm -m node -T your_target -u
> iscsiadm -m node -T your_target -o update -n node.session.cmds_max -v 32
> iscsiadm -m node -T your_target -l
I've tried setting to both 16 and 32, but it behaves about the same.
>
>
> And if those prevent the io errors then could you do
>
> echo noop > /sys/block/sdXYZ/queue/scheduler
>
> to see if performance increases with a different scheduler.
I really think I'm back to the duplicate ACK problem. See the attached
packet dump: at one point there are 30 duplicate ACKs. Interestingly, the
storage has "worked" for the past week (I'm using it as D2D backup). This
morning, about 7 days later, it's giving all these duplicate ACKs.
I'm currently running into messages such as:
Nov 19 09:46:58 backup iscsid: Kernel reported iSCSI connection 2:0 error
(1011) state (3)
Nov 19 09:47:00 backup kernel: session2: target reset succeeded
Nov 19 09:47:01 backup iscsid: connection2:0 is operational after recovery
(1 attempts)
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code =
0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 8856
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:80.
Nov 19 09:47:10 backup kernel: sd 9:0:0:0: SCSI error: return code =
0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdf, sector 74424
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code =
0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector
8845240
Nov 19 09:47:10 backup kernel: device-mapper: multipath: Failing path 8:192.
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: SCSI error: return code =
0x000e0000
Nov 19 09:47:10 backup kernel: end_request: I/O error, dev sdm, sector
62915456
Nov 19 09:47:10 backup kernel: sd 9:0:0:2: timing out command, waited 300s
Nov 19 09:47:10 backup multipathd: /sbin/mpath_prio_alua exitted with 1
Nov 19 09:47:10 backup multipathd: error calling out /sbin/mpath_prio_alua
/dev/sdm
Nov 19 09:47:10 backup multipathd: 3600d0230ffffffff061d4479bfb83902: switch
to path group #2
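
The repeated "return code = 0x000e0000" can be split into its four bytes.
The kernel packs the SCSI result, from high byte to low, as driver byte,
host byte, message byte and status byte. The sketch below only does that
split; decode_scsi_result is a hypothetical name. In mainline kernel headers
host byte 0x0e is DID_TRANSPORT_DISRUPTED, but whether this RHEL kernel uses
the same numbering is an assumption.

```shell
# Hypothetical helper: split a SCSI result word into its four bytes.
# Layout: driver (bits 31-24), host (23-16), message (15-8), status (7-0).
decode_scsi_result() {
    result=$(( $1 ))
    printf 'driver=0x%02x host=0x%02x msg=0x%02x status=0x%02x\n' \
        $(( (result >> 24) & 0xff )) \
        $(( (result >> 16) & 0xff )) \
        $(( (result >> 8) & 0xff )) \
        $(( result & 0xff ))
}
```

`decode_scsi_result 0x000e0000` reports a host byte of 0x0e with the other
three bytes zero, so the error is raised on the host/transport side and
does not carry a SCSI status from the target.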
This is also interesting:
Nov 18 01:48:30 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 8
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 7
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 6
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 5
Nov 18 20:16:29 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 4
Nov 18 20:16:34 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 5
Nov 18 20:32:09 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 6
Nov 18 20:43:05 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 7
Nov 18 20:48:08 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 8
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 7
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 6
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 5
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 4
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 3
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 2
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 1
Nov 18 20:53:36 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 0
Nov 18 20:53:41 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 1
Nov 18 20:59:09 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 2
Nov 18 21:04:37 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 3
Nov 18 21:10:05 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 4
Nov 18 21:15:33 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 5
Nov 18 21:20:35 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 6
Nov 18 21:26:03 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 7
Nov 18 21:31:06 backup multipathd: 3600d0230ffffffff061d4479bfb83902:
remaining active paths: 8
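
To pull the same picture out of a longer log, a rough sketch follows.
summarize_paths is a hypothetical name, and the sketch assumes each
multipathd entry is on a single line, as it is in syslog (the lines above
are only wrapped by the mail client):

```shell
# Hypothetical helper: read syslog lines on stdin, count the multipathd
# "remaining active paths" events, and report the lowest path count seen
# together with the timestamp at which it was first reached.
summarize_paths() {
    awk '/remaining active paths:/ {
        n = $NF + 0
        if (!seen || n < min) { min = n; at = $1 " " $2 " " $3; seen = 1 }
        events++
    }
    END {
        if (events) printf "events=%d min_paths=%d at=%s\n", events, min, at
    }'
}
```

It could be fed with something like
`grep multipathd /var/log/messages | summarize_paths`, where the log path is
an assumption about the syslog configuration.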
Matthew