On 08/06/2010 09:57 AM, Hannes Reinecke wrote:
Mike Christie wrote:
CCing Hannes from SUSE, because this looks like a SLES-only bug.
Hey Hannes,
The user is using Linux 2.6.27 x86 based on SLES + Xen 3.4 (as dom0)
running a couple of RHEL 5.5 VMs. The underlying storage for these VMs
is iSCSI based via open-iscsi 2.0.870-26.6.1 and a Dell EqualLogic array.
On 08/05/2010 02:21 PM, Goncalo Gomes wrote:
I've copied both the messages file from the host goncalog140 and the
patched libiscsi.c. FWIW, I've also included the iscsid.conf. Find these
files in the link below:
http://promisc.org/iscsi/
It looks like this chunk from libiscsi.c:iscsi_queuecommand:
	case ISCSI_STATE_FAILED:
		reason = FAILURE_SESSION_FAILED;
		sc->result = DID_TRANSPORT_DISRUPTED << 16;
		break;
is causing IO errors.
You want to use something like DID_IMM_RETRY because a long time can
pass between the kernel marking the state as ISCSI_STATE_FAILED and
recovery starting with all the device queues properly blocked, so we
can exhaust all the retries if we use DID_TRANSPORT_DISRUPTED.
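To make the retry-exhaustion argument concrete, here is a minimal sketch in plain C. It is a simplified model, not the kernel code: the enum, struct, and function names are hypothetical, and it only assumes what is described above, namely that an immediate retry does not count against the command's retry budget while a transport disruption does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two host-byte results discussed above. */
enum host_result { RES_IMM_RETRY, RES_TRANSPORT_DISRUPTED };

/* Hypothetical, minimal view of a command's retry accounting. */
struct cmd {
	int retries;	/* retries consumed so far */
	int allowed;	/* retry budget for this command */
};

/*
 * Simplified decision (an assumption of this sketch): an immediate retry
 * is requeued without touching the budget, a transport disruption counts
 * against it. Returns true if the command is retried, false if it would
 * be failed up to the caller as an I/O error.
 */
bool should_retry(struct cmd *c, enum host_result r)
{
	if (r == RES_IMM_RETRY)
		return true;
	return ++c->retries <= c->allowed;
}

/*
 * Model the window between the session being marked failed and the
 * device queues being blocked: the command is bounced 'attempts' times.
 * Returns true if it is still retryable at the end of the window.
 */
bool survives_window(enum host_result r, int allowed, int attempts)
{
	struct cmd c = { .retries = 0, .allowed = allowed };

	assert(allowed >= 0 && attempts >= 0);
	for (int i = 0; i < attempts; i++)
		if (!should_retry(&c, r))
			return false;
	return true;
}
```

In this model, a command with a budget of 5 survives any number of bounces with RES_IMM_RETRY, but fails on the sixth bounce with RES_TRANSPORT_DISRUPTED, which is the I/O error pattern described above.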
Yeah, I noticed.
But the problem is that multipathing will stall during this time,
i.e., no failover will occur and I/O will stall. Using DID_TRANSPORT_DISRUPTED
will circumvent this and we can fail over immediately.
It should stall. It works like FC and the fast io fail tmo. Users need
to set the iSCSI replacement/recovery timeout like they would FC's fast
io fail tmo. They should set it to 3 or 5 secs, or lower, if they want
really fast failovers.
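For reference, the timeout in question is the node.session.timeo.replacement_timeout setting in iscsid.conf. A minimal fragment, assuming a 5 second value as suggested above (the right value depends on the setup):

```
# /etc/iscsi/iscsid.conf
# Seconds to wait for the session to be re-established before
# failing outstanding I/O up to the multipath layer.
node.session.timeo.replacement_timeout = 5
```

Note that iscsid.conf values are applied to node records when they are created by discovery, so existing records would need to be updated separately for the change to take effect.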