On 5/13/2013 3:29 PM, Martin K. Petersen wrote:

> others. We see cases fairly often where a misbehaving target has
> confused the HBA enough that we can not bring the device back without
> doing an HBA firmware reset. Despite I/O completing successfully on
> other targets connected to the same HBA.

        This would seem to indicate an HBA/driver bug...

> So at some point we do need to give up and escalate to a full HBA
> reset. We would just like to defer that hammer until we have run out of
> other options.

        Except that I've seen the Linux error recovery cause more problems than
it solves on a fairly regular basis. I would rather have a solution designed to
isolate failures than one that makes a lot of mistakes and causes further
problems (sometimes with other machines). I'm pretty convinced that attempting
everything possible to recover a device when the underlying problem is unknown
is a bad strategy.

        I think maybe it's a difference in perspective. If the device that is
failing is an OS disk, then giving up is tantamount to crashing the machine. On
the other hand, if the failing device is some shared tape drive in a SAN with a
few hundred alternatives, then killing the OS in an attempt to recover that
drive is a problem.

        Maybe the super-aggressive recovery paths should be reserved for
devices marked critical to system operation.

