Hi all,

During heavy I/O the iSCSI initiator starts spitting out receive timeouts and
connection failures, even though the connection itself is not faulty.

I managed to trace it down to the way open-iscsi treats SCSI commands.
During queuecommand we just take the scmd, add it to the cmdqueue, and
kick the workqueue to transmit these commands.
However, when the system is under heavy I/O load the time difference
between queueing the command and processing it on the workqueue might be
quite considerable.
In fact, it might be longer than the SCSI command timeout itself, causing
the command to time out.
And to make matters worse, we're injecting NOPs now and again to check
whether the connection is still alive. However, currently we're only counting
the time since we last received some data from the target. The time
the NOP request is stuck on the queue is not being taken into account,
causing erroneous connection failures.
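
To illustrate the NOP accounting problem, here is a minimal user-space sketch
in C. It is a toy model, not open-iscsi code: the struct, its fields, and both
function names are made up for this example, and the "queue-aware" variant is
only one assumed way of taking the queue time into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not open-iscsi code. All times are in seconds. */
struct toy_conn {
    long last_recv;    /* time we last received data from the target */
    long nop_sent_at;  /* time the NOP really went out, -1 while still queued */
    long nop_timeout;  /* how long we are willing to wait for a reply */
};

/* Naive check: only counts the time since the last receive. */
static bool toy_naive_conn_failed(const struct toy_conn *c, long now)
{
    assert(c != NULL);
    return now - c->last_recv > c->nop_timeout;
}

/*
 * Queue-aware check (an assumption, not the actual patch): a NOP that is
 * still sitting on the queue has never reached the target, so its missing
 * reply says nothing about the connection.
 */
static bool toy_queue_aware_conn_failed(const struct toy_conn *c, long now)
{
    long start;

    assert(c != NULL);
    if (c->nop_sent_at < 0)
        return false;  /* NOP not transmitted yet, nothing to time out */

    /* Start the clock at whichever happened later: last receive or send. */
    start = c->last_recv > c->nop_sent_at ? c->last_recv : c->nop_sent_at;
    return now - start > c->nop_timeout;
}
```

With last_recv = 0, a timeout of 10 and a NOP that is still on the queue at
time 20, the naive check reports a failure while the queue-aware one does not.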

To remedy this I've created two patches, one for checking the cmdqueue
before trying to send NOPs and the other for checking the cmdqueue
when the SCSI command timeout has kicked in.
They _do_ look sane to me, and they certainly cause the
spurious connection failures to drop quite considerably.
However, as they interfere with error handling I'd like
to have a second or third opinion here.
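
To make the command-timeout idea concrete, here is a rough sketch of a timeout
decision that looks at the cmdqueue first. This is a toy model with made-up
names, not the patch itself; in particular, restarting the timer for a command
that was never transmitted is an assumption about what a sensible reaction
would be.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only, not the open-iscsi structures. */
enum toy_timeout_action {
    TOY_RESET_TIMER,  /* command never left the host, give it more time */
    TOY_ESCALATE      /* command was sent, let error handling take over */
};

struct toy_cmd {
    int id;
    struct toy_cmd *next;  /* next command waiting on the queue */
};

struct toy_cmdqueue {
    struct toy_cmd *head;  /* commands queued but not yet transmitted */
};

/* Walk the queue and report whether cmd is still waiting to be sent. */
static bool toy_cmd_is_queued(const struct toy_cmdqueue *q,
                              const struct toy_cmd *cmd)
{
    const struct toy_cmd *p;

    assert(q != NULL && cmd != NULL);
    for (p = q->head; p != NULL; p = p->next)
        if (p == cmd)
            return true;
    return false;
}

/* Called when the SCSI command timer fires for cmd. */
static enum toy_timeout_action toy_cmd_timed_out(const struct toy_cmdqueue *q,
                                                 const struct toy_cmd *cmd)
{
    return toy_cmd_is_queued(q, cmd) ? TOY_RESET_TIMER : TOY_ESCALATE;
}
```

A command that is still on the queue gets TOY_RESET_TIMER; one that has
already been handed to the transport gets TOY_ESCALATE.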

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                   zSeries & Storage
h...@suse.de                          +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
