Thanks for the patch, Mike. Below is the output from a failure while running with the patch. Any thoughts?
[<f8bf0876>] iscsi_conn_failure+0x10/0x69 [libiscsi]
[<f9bf202d>] iscsi_eh_abort+0x2f1/0x406 [libiscsi]
[<f885d378>] __scsi_try_to_abort_cmd+0x19/0x1a [scsi_mod]
[<f885e85d>] scsi_error_handler+0x24d/0x422 [scsi_mod]
[<c041f7ea>] complete+0x2b/0x3d
[<f885e610>] scsi_error_handler+0x0/0x422 [scsi_mod]
[<c0435f65>] kthread+0xc0/0xeb
[<c0435ea5>] kthread+0x0/0xeb
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
connection1:0 detected conn error (1011)
[<f8bf0876>] iscsi_conn_failure+0x10/0x69 [libiscsi]
[<f8bf22fc>] iscsi_eh_target_reset+0xbb/0x218 [libiscsi]
[<c0605967>] _spin_lock_bh+0x8/0x18
[<f8bf0f78>] iscsi_eh_device_reset+0x1c5/0x1cf [libiscsi]
[<c054a6dd>] get_device+0xe/0x14
[<f885d764>] scsi_try_host_reset+0x3a/0x99 [scsi_mod]
[<f885e0e3>] scsi_eh_ready_devs+0x302/0x3e2 [scsi_mod]
[<f885e8dd>] scsi_error_handler+0x2cd/0x422 [scsi_mod]
[<c041f7ea>] complete+0x2b/0x3d
[<f885e610>] scsi_error_handler+0x0/0x422 [scsi_mod]
[<c0435f65>] kthread+0xc0/0xeb
[<c0435ea5>] kthread+0x0/0xeb
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
session1: session recovery timed out after 400 secs
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: scsi: Device offlined - no ready after error recovery
sd 0:0:0:0: SCSI error: return code = 0x00020000
end_request: I/O error, dev sda, sector 14283149

On Jul 13, 10:34 pm, Mike Christie <micha...@cs.wisc.edu> wrote:
> Could you run with the attached patch? It just prints out a little more
> info. When we get the conn error, it will print out a message if it is
> due to the target dropping the connection, and it will print out a stack
> trace so we can see exactly what piece of code is throwing the error.
>
> On 07/13/2010 09:33 PM, Sean S wrote:
>
> > Nothing else in the log from iscsid. No mention of a failed reconnect,
> > although the only log I'm really able to access post failure is dmesg.
> > Since I'm running a root iscsi, I couldn't get to /var/log/messages,
> > which maybe was a little more verbose? What sort of network problems
>
> Yeah, by default the iscsid messages go there. iscsid should be spitting
> out a cannot connect $some_error_value_or_string that would help tell us
> why we cannot reach the target anymore.
>
> > might cause this? The "network" in this situation is a simple gigE
> > switch with about 3 or 4 systems on it. The target and initiator are
> > on the same subnet, nothing fancy. Is there some additional debug
> > you'd recommend turning on? Any tips or tricks when running with a
> > root iscsi drive?
>
> Not that I can think of at the iscsi layer.
>
> >
> > Curiously, if I physically disconnect the ethernet from the initiator
> > while running, all I/O access is correctly paused without returning I/O
> > errors. If I then reconnect before the 400s is up, things go back to
> > normal. I don't, however, see the "detected conn error (1011)" message
> > in this situation. Not sure if that really means anything.
>
> You should see the conn error 1011 message if
>
> 1. you have nops on and they time out and that causes us to log that error.
>
> 2. the network layer figures out there is a problem and notifies us. It
> is possible that you pull a cable and plug it back in before the network
> throws an error.
>
> 3. iscsi driver or protocol error. In this case we should relogin quickly.
>
> trace-conn-error.patch (1K)

--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.