On Mar 19, 10:56 am, Mike Christie <micha...@cs.wisc.edu> wrote:
> dave wrote:
> > Can anyone tell me why the SCSI layer says the device is not ready
> > when iscsiadm reports it is logged in?
>
> > Can I manually online the device? How should I recover from here?
>
> You can do
>
> echo running > /sys/block/sdX/device/state
>
> but you might not want to because the device may not be back.

A disk in the Sun iscsi target server died. When a disk fails in the
server, the iscsi target pauses all reads/writes for about 1-2 minutes
until it marks the disk as faulted, then continues normal operation
using the rest of the RAID pool. I had tested this before, and
dm-multipath with iscsi seemed to work just fine when the iscsi target
paused and eventually resumed, so I was a little surprised this time.
Usually I see closer to a minute between the conn error and recovery.
What are the reconnect/recovery timers of open-iscsi for this
scenario?

>
> > Is this a known problem, and has it been fixed in newer open-iscsi
> > versions?
>
> Are you using an older version of the sun target?

I am. I am running OpenSolaris SXCE build 93, which is about 8 months
old. I'll be upgrading this soon.

>
>
>
> > Mar 18 18:21:33 eq1-vz2 kernel:  connection1:0: detected conn error
> > (1011)
> > Mar 18 18:21:36 eq1-vz2 kernel:  session1: host reset succeeded
>
> When we log back in we tell scsi-ml that we are ok.

At what level does the connection receive an error and reset (can't
log in to the target, read/write errors, etc.), and what has to work
for the session to be considered ok? If the device wasn't really ready
to be used again, shouldn't iscsi know this and attempt another
recovery? I'm not particularly well versed in the iscsi protocol.

>
> > Mar 18 18:22:16 eq1-vz2 kernel: sd 6:0:0:0: scsi: Device offlined -
> > not ready after error recovery
>
> scsi-ml will send a Test unit ready (TUR) command to check that the
> device is ready to go. The TUR seems to be failing and so the scsi layer
> sets the device offline.

Is there only one TUR sent? I would have assumed a more robust
recovery procedure here.
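
In the meantime, since the device is left in the offline state once the
TUR fails, here is a minimal sketch of how I could script the manual
online step so it only writes "running" when the state file actually
says "offline". It assumes the /sys/block/sdX/device/state path from
above; the function names and the sysfs_root parameter are my own and
purely illustrative, and it does nothing to confirm that the target is
really back.

```python
from pathlib import Path


def read_scsi_state(dev, sysfs_root="/sys"):
    """Return the SCSI device state string, e.g. 'running' or 'offline'."""
    state_file = Path(sysfs_root) / "block" / dev / "device" / "state"
    return state_file.read_text().strip()


def online_if_offline(dev, sysfs_root="/sys"):
    """Write 'running' to the state file only if the device is offline.

    Returns True if the state was written, False if it was left alone.
    sysfs_root is a hypothetical parameter so this can be tried against
    a scratch directory instead of the real /sys.
    """
    state_file = Path(sysfs_root) / "block" / dev / "device" / "state"
    if state_file.read_text().strip() != "offline":
        return False
    state_file.write_text("running\n")
    return True
```

Against the real /sys this needs root, and per Mike's warning it is
only worth doing once the target is known to be serving I/O again.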

>
> I think there was some target issue and it was fixed in newer ones.
>
> If you can easily replicate this then you should take wireshark/ethereal
> trace and send the trace here so we can see why the TUR failed and make
> sure it is not our fault before you go to the trouble of updating.

I'll see what I can do to get a wire trace next time I have an
opportunity to intentionally hiccup the iscsi target.

Thanks, Mike.

--
Dave