On Wed, Jul 24, 2013 at 05:15:16PM PDT, Mike Christie spake thusly:
> Did you bring the target back up and if so did you do it with the same
> target name?

Sorry for the delay in getting back, been travelling on business. But
thanks very much for the reply!

Yes, I did bring the target back up and with the same name. Although some
of the LUNs have moved around as I rebuilt the machine to match its
partner, which the VMs RAID 1 it against.

> What is your replacement/recovery timeout setting in /etc/iscsi/iscsid.conf?

Looks like 120 but just in case, here's the entire contents:

iscsid.startup = /etc/rc.d/init.d/iscsid force-start
node.startup = automatic
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes

See anything amiss?

I now have around 8 processes stuck on this system. I'm going to have to
reboot it this weekend to clear up the issue but I would really like to
find out what is really going on and how to avoid it before taking such
measures.

> It sounds like the scsi scan IO is stuck on a target that disappeared
> and never came back, or it is a Centos scsi layer bug. Could you send
> the /var/log/messages.

The entire file is rather large but here are some of the messages relevant
to iscsi:

Jul 4 15:18:44 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 4 15:18:45 cpu03 iscsid: Kernel reported iSCSI connection 8:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:46 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 4 15:18:46 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 6:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: Kernel reported iSCSI connection 7:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:19:27 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:30 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:33 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:19:36 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
<skip many of these no route to host messages, happened while I was rebuilding the target with ip 10.0.1.11>
Jul 4 15:18:47 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:50 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:18:51 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 4 15:20:45 cpu03 kernel: session8: session recovery timed out after 120 secs
Jul 4 15:20:45 cpu03 iscsid: connect to 10.0.1.11:3260 failed (No route to host)
Jul 4 15:20:47 cpu03 kernel: session6: session recovery timed out after 120 secs
Jul 4 15:20:47 cpu03 kernel: session7: session recovery timed out after 120 secs
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:04 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
Jul 8 20:37:07 cpu03 iscsid: connect to 10.0.1.11:3260 failed (Connection refused)
<skip lots of these connection refused messages>
Jul 12 14:33:08 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:08 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:09 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:11 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:11 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:12 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:14 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:14 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:15 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:17 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:17 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:18 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:20 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:20 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:21 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:23 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:23 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:24 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:26 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:26 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:27 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:29 cpu03 kernel: connection8:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection6:0: detected conn error (1020)
Jul 12 14:33:29 cpu03 kernel: connection7:0: detected conn error (1020)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:33:30 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
<skip lots of these>
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:23 cpu03 iscsid: connection6:0 is operational after recovery (9 attempts)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:26 cpu03 iscsid: conn 0 login rejected: initiator error - target not found (02/03)
Jul 12 14:35:29 cpu03 iscsid: connection7:0 is operational after recovery (9 attempts)
Jul 12 14:35:29 cpu03 iscsid: connection8:0 is operational after recovery (9 attempts)
Jul 12 15:42:57 cpu03 kernel: scsi39 : iSCSI Initiator over TCP/IP
Jul 12 15:42:57 cpu03 iscsid: Could not set session34 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:0: Attached scsi generic sg129 type 12
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:1: Attached scsi generic sg130 type 0
Jul 12 15:42:58 cpu03 kernel: scsi 39:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:42:58 cpu03 kernel: sd 39:0:0:2: Attached scsi generic sg131 type 0
Jul 12 15:42:58 cpu03 iscsid: Connection34:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6b, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:12 cpu03 kernel: scsi40 : iSCSI Initiator over TCP/IP
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:0: Attached scsi generic sg132 type 12
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:1: Attached scsi generic sg133 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:2: Attached scsi generic sg134 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:3: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:3: Attached scsi generic sg135 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:4: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:4: Attached scsi generic sg136 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:5: Attached scsi generic sg137 type 0
Jul 12 15:43:12 cpu03 kernel: scsi 40:0:0:6: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:12 cpu03 kernel: sd 40:0:0:6: Attached scsi generic sg138 type 0
Jul 12 15:43:12 cpu03 iscsid: Could not set session35 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:12 cpu03 iscsid: Connection35:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6e, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 12 15:43:16 cpu03 kernel: scsi41 : iSCSI Initiator over TCP/IP
Jul 12 15:43:16 cpu03 iscsid: Could not set session36 priority. READ/WRITE throughout and latency could be affected.
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: RAID IET Controller 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:0: Attached scsi generic sg139 type 12
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:1: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:1: Attached scsi generic sg140 type 0
Jul 12 15:43:17 cpu03 kernel: scsi 41:0:0:2: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 12 15:43:17 cpu03 kernel: sd 41:0:0:2: Attached scsi generic sg141 type 0
Jul 12 15:43:17 cpu03 iscsid: Connection36:0 to [target: iqn.2012-04.com.edirectpublishing.disk06:6f, portal: 10.0.1.11,3260] through [iface: default] is operational now
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 15:0:0:4: [sdab] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:5: [sdag] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] READ CAPACITY failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Test WP failed, assume Write Enabled
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Asking for cache data failed
Jul 14 23:10:39 cpu03 kernel: sd 16:0:0:7: [sdai] Assuming drive cache: write through
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Unit Not Ready
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Sense Key : Illegal Request [current]
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Add. Sense: Logical unit not supported
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] READ CAPACITY(16) failed
Jul 14 23:10:39 cpu03 kernel: sd 18:0:0:2: [sdan] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
<skip lots of these>
Jul 24 16:51:30 cpu03 kernel: scsi 24:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 24:0:0:5: Attached scsi generic sg142 type 0
Jul 24 16:51:30 cpu03 kernel: scsi 29:0:0:5: Direct-Access IET VIRTUAL-DISK 0001 PQ: 0 ANSI: 5
Jul 24 16:51:30 cpu03 kernel: sd 29:0:0:5: Attached scsi generic sg143 type 0

Thanks for any insight you can provide!

-- 
Tracy Reed
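
Since the full /var/log/messages is too large to post, a short script could condense it into per-event counts instead of hand-picked excerpts. What follows is only a minimal sketch: the regular expressions are modelled on the message formats in the excerpt above, and the function name `summarize` and the category labels are made up for illustration. None of this is part of open-iscsi.

```python
import re
from collections import Counter

# Each entry: (hypothetical category label, pattern with ONE capture group
# holding the detail to count by). Patterns mirror the log lines quoted above.
PATTERNS = [
    ("conn_error",
     re.compile(r"kernel: connection(\d+):\d+: detected conn error")),
    ("connect_failed",
     re.compile(r"iscsid: connect to \S+ failed \((.+?)\)")),
    ("recovery_timeout",
     re.compile(r"kernel: session(\d+): session recovery timed out")),
    ("login_rejected",
     re.compile(r"iscsid: conn \d+ login rejected: (.+)$")),
    ("recovered",
     re.compile(r"iscsid: connection(\d+):\d+ is operational after recovery")),
]


def summarize(lines):
    """Count iSCSI-related syslog lines by (category, detail).

    Lines that match none of the patterns are ignored.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        for name, pattern in PATTERNS:
            match = pattern.search(line)
            if match:
                counts[(name, match.group(1))] += 1
                break  # first matching category wins
    return counts


# Example use (path is the one mentioned in this thread):
#     with open("/var/log/messages") as fh:
#         for (category, detail), n in sorted(summarize(fh).items()):
#             print(n, category, detail)
```

The detail field is the connection or session number for kernel messages and the failure reason for iscsid messages, so the output would show at a glance how many "Connection refused" versus "No route to host" attempts occurred and which sessions hit the recovery timeout.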