Hi,

Note to linux-scsi --- there's a genuine New Question about the SCSI
layer down below, so don't be put off by the ramblings about cluster
SCSI reservation which precede it. :)

On Thu, Jun 29, 2000 at 12:15:24PM -0400, Keith Barrett wrote:
> > 
> > I'd like to understand more about what you feel is missing for
> > shared scsi. Can you elaborate on why you feel driver patches are
> > needed? Is this for io fencing? If so, we use the stonith type
> > approach in kimberlite for doing this.
> 
> Driver would provide a standard API for SCSI reservation, more
> internal control over sharing and failover, and a communication
> path independent of networking; likely avoiding the need for
> stonith (which is a pretty serious action).
> 
> You might want to ask Alan Cox or Stephen Tweedie why this is
> better.

SCSI reservation, where available, is a great tool for doing quorum
assignment.  It does not, however, solve the I/O fencing problem
completely.

There are several problems here.  One of the main obstacles to using
SCSI reservation for true fencing is that reservation is typically
implemented at the level of individual disks.  You _can_ build a
hardware RAID array that supports reservation, but a lot of them
don't.  When you start building software constructs like LVM or
software RAID on top, you lose access to the reservation facility.

Reservation also has the wrong granularity for some purposes.  If you
are running a shared-everything configuration such as Oracle Parallel
Server or GFS in which multiple nodes can access the disk at once,
then obviously SCSI reservation doesn't help because you are forced to
use a configuration in which the bulk of the data is unreserved.  In
such a case, you can still use reservation to help with quorum, but
not with fencing.

SCSI reservation really can help with quorum, though.  It is yet
another layer of hardware protection against a rogue node going
haywire in a cluster.  That doesn't mean that it is a complete
replacement for STONITH, or vice-versa; the more such protection you
have in an HA cluster, the stronger the guarantees you can make.
There is never such a thing as too much redundancy in your I/O
fencing mechanism!

There is another reason why SCSI reservation is a Good Thing to have
--- stealing another node's reservation is an expensive, heavyweight
operation requiring device resets.  Reservation management really does
go straight to the disk rather than stopping at the controller.  You
are much less likely to have your quorum scheme defeated by things
like controller block caching if you can rely on reservation.
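
To make the reserve-and-steal behaviour concrete, here is a toy
Python model.  It is only a sketch: SharedDisk, ReservationConflict
and acquire_quorum are hypothetical names invented for illustration,
no real SCSI commands are issued, and the race between the reset and
the new reservation is deliberately ignored.

```python
class ReservationConflict(Exception):
    """Raised when a command hits a disk reserved by another node."""


class SharedDisk:
    """Toy model of one shared disk supporting reserve/release (hypothetical)."""

    def __init__(self):
        self.holder = None  # node currently holding the reservation, if any

    def reserve(self, node):
        if self.holder not in (None, node):
            raise ReservationConflict(self.holder)
        self.holder = node

    def release(self, node):
        # Only the holder can drop its own reservation.
        if self.holder == node:
            self.holder = None

    def reset(self):
        # Modelled assumption: a device reset clears the reservation.
        self.holder = None

    def write(self, node, data):
        # I/O from any node other than the holder is refused by the disk.
        if self.holder not in (None, node):
            raise ReservationConflict(self.holder)
        return len(data)


def acquire_quorum(disk, node, steal=False):
    """Return True if `node` ends up holding the reservation."""
    try:
        disk.reserve(node)
        return True
    except ReservationConflict:
        if not steal:
            return False
        disk.reset()        # the heavyweight step: reset, then re-reserve
        disk.reserve(node)
        return True
```

In this model a second node calling acquire_quorum() without
steal=True simply loses, and any write it attempts is refused by the
disk itself until it goes through the reset path.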

*** Which brings me to the question for linux-scsi ***

Right now, I don't think Linux ever uses the FUA (force unit access)
bit in the SCSI commands it generates.  SCSI folks, how easy would it
be to get that bit exposed to the buffer cache layers for use by raw
I/O?  Unless FUA can be set, a clustered SCSI configuration is not
going to work well in the presence of caching SCSI controllers.
SCSI reservation should not have this problem.
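
For reference, this is roughly where the bit lives in the command
itself.  The Python sketch below builds a WRITE(10) command descriptor
block with FUA set (byte 1, bit 3).  The helper name build_write10_cdb
is made up for illustration; it only constructs the bytes and says
nothing about how the kernel would pass the flag down from the buffer
cache.

```python
import struct

WRITE_10 = 0x2A   # SCSI WRITE(10) operation code
FUA_BIT = 0x08    # byte 1, bit 3: force unit access


def build_write10_cdb(lba, blocks, fua=False):
    """Return the 10-byte WRITE(10) CDB (hypothetical helper)."""
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit in 32 bits")
    if not 0 <= blocks <= 0xFFFF:
        raise ValueError("transfer length does not fit in 16 bits")
    flags = FUA_BIT if fua else 0
    # Layout: opcode, flags, 32-bit LBA, group number,
    # 16-bit transfer length, control byte (all big-endian).
    return struct.pack(">BBIBHB", WRITE_10, flags, lba, 0, blocks, 0)


if __name__ == "__main__":
    print(build_write10_cdb(0x1234, 8, fua=True).hex())
```

With FUA set, the target is asked to put the data on the medium
before reporting completion instead of completing from a write cache.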

Anyway, we probably want to support as many fencing mechanisms as
possible just because of the added safety that each such layer
provides.  That means supporting SCSI reservation where present;
hardware STONITH where available, and software STONITH otherwise; and
voluntary hard reboots of nodes which detect a loss of quorum locally.
Remember, in an HA configuration, taking a single machine down isn't
supposed to be a hugely expensive operation, so it's worth it in terms
of data integrity to be really cautious and take down any nodes which
might be at any risk of leaking old data onto a newly-quorate disk.
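
The layering described above could be sketched like this in Python.
The function name fencing_plan and the mechanism labels are purely
illustrative assumptions, not an existing API; the point is only that
the layers stack rather than replace each other.

```python
def fencing_plan(has_scsi_reservation, has_hw_stonith):
    """Return the fencing layers to enable, in the order listed above."""
    plan = []
    if has_scsi_reservation:
        # Use reservation wherever the storage supports it.
        plan.append("scsi-reservation")
    # Hardware STONITH where available, software STONITH otherwise.
    plan.append("hardware-stonith" if has_hw_stonith else "software-stonith")
    # Every node reboots itself if it detects a local loss of quorum.
    plan.append("self-reboot-on-quorum-loss")
    return plan


if __name__ == "__main__":
    print(fencing_plan(has_scsi_reservation=True, has_hw_stonith=False))
```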

Cheers, 
 Stephen
