On Mon, 1 July 2013 08:50:48 +0200, Hannes Reinecke wrote:
> 
> This patchset implements a new 'eh_deadline' attribute to the
> SCSI host. It will limit the overall SCSI EH runtime by a given
> timeout. If the timeout is reached all intermediate EH steps
> will be skipped and host reset will be scheduled immediately.

I have mixed opinions about the concept.

Having a command timeout is of limited use if you can still spend
several minutes after the timeout in random processing.  Userspace
either needs -EIO reasonably quickly after a command timeout or will
have to implement its own timeout mechanism.  I prefer having a
single implementation in the kernel, so your patches are a step in the
right direction.
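
To make the quoted behaviour concrete, here is a rough userspace
sketch of an escalation ladder with a deadline.  It is not taken from
the patches: the names eh_step and eh_next_step, the tick-based
clock, and the convention that a deadline of 0 means "disabled" are
all assumptions made for illustration, and the ladder is a simplified
view of the usual abort/device/target/bus/host sequence.

```c
/* Simplified escalation ladder; names are illustrative only. */
enum eh_step {
    EH_ABORT,
    EH_DEVICE_RESET,
    EH_TARGET_RESET,
    EH_BUS_RESET,
    EH_HOST_RESET
};

/*
 * Pick the step to run after 'cur' has failed.
 * start/now/deadline are in arbitrary ticks.
 * Assumption: deadline == 0 disables the deadline entirely.
 */
static enum eh_step eh_next_step(enum eh_step cur, unsigned long start,
                                 unsigned long now, unsigned long deadline)
{
    /* Deadline expired: skip all intermediate steps. */
    if (deadline != 0 && now - start >= deadline)
        return EH_HOST_RESET;

    /* Otherwise climb one rung, stopping at host reset. */
    return cur < EH_HOST_RESET ? (enum eh_step)(cur + 1) : EH_HOST_RESET;
}
```

With a deadline of 30 ticks and EH started at tick 100, a failed abort
at tick 105 escalates to device reset, while the same failure at tick
130 jumps straight to host reset.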

Host reset is an expensive and harmful operation.  You lose access to
all devices behind the host.  At best this is a performance blip, at
worst someone actually cared about some realtime properties.  My main
grump is that a single bad device can trigger this behaviour,
essentially doing a DoS on the rest of the system.  While that problem
is somewhat orthogonal, your patchset can only make matters worse.

Ideally we would have a way to detect the system geometry and then the
error location.  If a single device is bad, don't ever do a host
reset.  If you have redundant paths, never do a host reset on both
controllers at the same time.  Etc, etc.
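
A minimal sketch of the two rules above, assuming we somehow already
know how many devices behind a host are failing and whether the
redundant controller is mid-reset.  Everything here (host_view,
reset_scope, pick_reset_scope) is hypothetical; nothing like it exists
in the kernel under these names, and gathering this information is
exactly the hard part.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the policy decision. */
enum reset_scope { RESET_DEVICE, RESET_HOST, RESET_DEFER };

/* Hypothetical summary of what we know about one host. */
struct host_view {
    size_t nfailing;        /* devices currently in error handling */
    bool   peer_resetting;  /* redundant-path controller is mid-reset */
};

static enum reset_scope pick_reset_scope(const struct host_view *h)
{
    /* A single bad device must never trigger a host reset. */
    if (h->nfailing <= 1)
        return RESET_DEVICE;

    /* Never reset both controllers of a redundant pair at once. */
    if (h->peer_resetting)
        return RESET_DEFER;

    return RESET_HOST;
}
```

The order of the checks matters: the single-device rule wins even when
the peer controller is idle, so one bad disk can never escalate to a
host reset in this sketch.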

Getting there will be a lot of work and the result may be too
error-prone to maintain without constantly breaking one exotic setup
or another.  But if someone could pull it off, it would be really nice
to have.

That said, now I should actually read your patches. ;)

Jörn

--
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike