On 08/06/2010 09:57 AM, Hannes Reinecke wrote:
Mike Christie wrote:
Cc'ing Hannes from SUSE, because this looks like a SLES-only bug.

Hey Hannes,

The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
is iSCSI-based, via open-iscsi 2.0.870-26.6.1 and a Dell EqualLogic array.


On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
I've copied both the messages file from the host goncalog140 and the
patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
files in the link below:

http://promisc.org/iscsi/


It looks like this chunk from libiscsi.c:iscsi_queuecommand:

         case ISCSI_STATE_FAILED:
             reason = FAILURE_SESSION_FAILED;
             sc->result = DID_TRANSPORT_DISRUPTED << 16;
             break;

is causing IO errors.

You want to use something like DID_IMM_RETRY, because there can be a long
gap between the time the kernel marks the session ISCSI_STATE_FAILED and
the time we start recovery and actually get all the device queues blocked.
With DID_TRANSPORT_DISRUPTED we can exhaust all the retries in that window.
Yeah, I noticed.
But the problem is that multipathing will stall during this time,
i.e. no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
circumvents this, and we can fail over immediately.
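To make the trade-off concrete, here is a simplified user-space model of how the SCSI midlayer treats the two host bytes, loosely after scsi_decide_disposition() in drivers/scsi/scsi_error.c of that era. It is a sketch, not kernel code: all model_* names are hypothetical, it ignores the midlayer's overall per-command time limit, and it assumes multipath I/O carries a failfast flag (as dm-multipath sets, to our understanding).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: only the two host bytes discussed
 * in this thread are represented. */
enum model_host_byte { MODEL_DID_IMM_RETRY, MODEL_DID_TRANSPORT_DISRUPTED };
enum model_disposition { MODEL_NEEDS_RETRY, MODEL_SUCCESS };

struct model_cmd {
	int retries;   /* retries consumed so far */
	int allowed;   /* retry budget for this command */
	bool failfast; /* assumed set for multipath-issued requests */
};

/* MODEL_SUCCESS means "complete the command with its current result",
 * which for these host bytes is an I/O error to the upper layer. */
static enum model_disposition model_decide(struct model_cmd *cmd,
					   enum model_host_byte hb)
{
	assert(cmd->allowed >= 0);
	switch (hb) {
	case MODEL_DID_IMM_RETRY:
		/* Requeued without touching the retry counter or failfast. */
		return MODEL_NEEDS_RETRY;
	case MODEL_DID_TRANSPORT_DISRUPTED:
	default:
		/* "maybe_retry" path: consumes one retry, and failfast
		 * requests complete at once so multipath can fail over. */
		if (++cmd->retries <= cmd->allowed && !cmd->failfast)
			return MODEL_NEEDS_RETRY;
		return MODEL_SUCCESS;
	}
}

/* Resubmit the command `passes` times while the session is FAILED but
 * the device queue is not yet blocked, so each pass bounces straight
 * back from queuecommand with host byte `hb`. Returns true if the
 * command was completed (with an error) within that window. */
static bool model_errors_out(struct model_cmd *cmd, enum model_host_byte hb,
			     int passes)
{
	for (int i = 0; i < passes; i++)
		if (model_decide(cmd, hb) == MODEL_SUCCESS)
			return true;
	return false;
}
```

With a budget of 5 retries, a DID_TRANSPORT_DISRUPTED command that keeps bouncing off an unblocked queue errors out on its sixth pass, while a DID_IMM_RETRY command survives until recovery blocks the queue. A failfast request gives up on the first DID_TRANSPORT_DISRUPTED pass, which is the immediate failover behavior described above.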


It should stall. It works like FC and its fast_io_fail_tmo: users need to set the iSCSI replacement/recovery timeout the same way they would set FC's fast_io_fail_tmo, e.g. to 3 or 5 seconds, or lower if they want really fast failovers.
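For open-iscsi, the timeout in question is, as far as I know, node.session.timeo.replacement_timeout (default 120 seconds). A minimal sketch of the iscsid.conf change, assuming 5 seconds is the desired failover latency; the target name and portal in the iscsiadm comment are placeholders:

```
# /etc/iscsi/iscsid.conf
# Seconds to wait for the session to be re-established before
# failing I/O back to the SCSI layer (and thus to dm-multipath).
node.session.timeo.replacement_timeout = 5

# iscsid.conf only seeds newly discovered node records; existing
# records can be updated with, e.g.:
#   iscsiadm -m node -T <target-iqn> -p <ip:port> -o update \
#       -n node.session.timeo.replacement_timeout -v 5
```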

--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.
