On Wed, Jul 24, 2013 at 05:15:16PM PDT, Mike Christie spake thusly:
> Did you bring the target back up and if so did you do it with the same
> target name?

Sorry for the delay in getting back, been travelling on business. But thanks
very much for the reply!

Yes, I did bring the target back up and with the same name, although some of
the LUNs have moved around because I rebuilt the machine to match its partner,
which the VMs mirror it against with RAID 1.

> What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?

Looks like 120 but just in case, here's the entire contents:

iscsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
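
For anyone who would rather not read the whole file, here is a minimal Python
sketch that pulls out only the timeout-related settings. It assumes plain
"key = value" lines with "#" comments; the function names and the SAMPLE text
are made up for illustration and are not part of open-iscsi.

```python
# Hypothetical helpers, not part of open-iscsi.

def parse_iscsid_conf(text):
    """Parse 'key = value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def timeout_settings(settings):
    """Keep only keys containing 'timeo' (covers timeo.* and err_timeo.*)."""
    return {k: v for k, v in settings.items() if "timeo" in k}


# A few lines copied from the config quoted above, used as sample input.
SAMPLE = """\
node.startup = automatic
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
"""

if __name__ == "__main__":
    found = timeout_settings(parse_iscsid_conf(SAMPLE))
    for key, value in sorted(found.items()):
        print(f"{key} = {value}")
```

To run it against the real file, read /etc/iscsi/iscsid.conf and pass its
contents to parse_iscsid_conf() in place of SAMPLE.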

See anything amiss? I now have around 8 processes stuck on this system. I'm
going to have to reboot it this weekend to clear up the issue, but I would
like to find out what is actually going on and how to avoid it before
taking such measures.

> It sounds like the scsi scan IO is stuck on a target that disappeared
> and never came back, or it is a Centos scsi layer bug. Could you send
> the /var/log/messages.

The entire file is rather large but here are some of the messages relevant to
iscsi:

Jul  4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
Jul  4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error (1020 
- ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul  4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
Jul  4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
Jul  4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error (1020 
- ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul  4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error (1020 
- ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul  4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
Jul  4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
Jul  4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
Jul  4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
Jul  4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
<skip many of these no route to host messages, happened while I was rebuilding 
the target with ip 10.0.1.11>
Jul  4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 
secs
Jul  4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to 
host)
Jul  4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 
secs
Jul  4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 
secs
Jul  8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
Jul  8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection 
refused)
<skip lots of these connection refused messages>
Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
<skip lots of these>
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 
attempts)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target 
not found (02/03)
Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 
attempts)
Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 
attempts)
Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP
Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE 
throughout and latency could be affected.
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID              IET      
Controller       0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type 12
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0
Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: 
iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through 
[iface: default] is operational now
Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID              IET      
Controller       0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type 12
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0
Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE 
throughout and latency could be affected.
Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: 
iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through 
[iface: default] is operational now
Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP
Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE 
throughout and latency could be affected.
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID              IET      
Controller       0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type 12
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0
Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: 
iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through 
[iface: default] is operational now
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume Write 
Enabled
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write 
through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume Write 
Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write 
through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume Write 
Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write 
through
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan]  Sense Key : Illegal Request 
[current] 
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan]  Add. Sense: Logical unit not 
supported
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan]  Result: hostbyte=DID_OK 
driverbyte=DRIVER_SENSE
<skip lots of these>
Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0
Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access     IET      
VIRTUAL-DISK     0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0
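
Since the full file is too large to post, a per-day tally of the message types
above may be easier to compare. This is a rough Python sketch: the patterns are
taken from the messages quoted above, the labels are made up for illustration,
and it assumes each log entry sits on one line (the excerpt above was wrapped
by the mail client).

```python
import re
from collections import Counter

# Labels are hypothetical; the regexes match text seen in the excerpt above.
PATTERNS = {
    "conn_error_1020": re.compile(r"detected conn error \(1020\)"),
    "connect_failed": re.compile(r"connect to \S+ failed"),
    "recovery_timed_out": re.compile(r"session recovery timed out"),
    "login_rejected": re.compile(r"login rejected"),
    "lun_not_supported": re.compile(r"Logical unit not supported"),
}


def tally(lines):
    """Count matching messages per (day, label); first matching label wins."""
    counts = Counter()
    for line in lines:
        day = " ".join(line.split()[:2])  # "Jul  4 ..." becomes "Jul 4"
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(day, label)] += 1
                break
    return counts


# Unwrapped copies of lines from the excerpt above, used as sample input.
SAMPLE = [
    "Jul  4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)",
    "Jul  4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)",
    "Jul  4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 secs",
    "Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)",
    "Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab]  Add. Sense: Logical unit not supported",
]

if __name__ == "__main__":
    for (day, label), count in sorted(tally(SAMPLE).items()):
        print(f"{day:7s} {label:20s} {count}")
```

For the real log, pass an open file object for /var/log/messages to tally()
instead of SAMPLE.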

Thanks for any insight you can provide!

-- 
Tracy Reed
