At 09:25 AM 1/18/2006, Jules Gosnell wrote:
I haven't been able to convince myself to take the quorum approach because...
shared-something approach:
- the shared something is a Single Point of Failure (SPoF) -
although you could use an HA something.
That's how WAS and WLS do it. Use an HA database, a SAN or dual-ported
SCSI. Dual-ported SCSI is cheap. An HA database or SAN is probably
already available to customers if they really care about availability.
- If the node holding the lock 'goes crazy', but does not die, the
rest of the cluster becomes a fragment - so it becomes an SPoF as well.
This is generally why you use leases. Then your craziness is only
believed for a fixed amount of time.
- used in isolation, it does not take into account that the lock may
be held by the smallest cluster fragment
You generally solve this with leases as well, i.e. a lock that is only
valid for some period.
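
To make the lease idea concrete, here is a minimal single-process sketch
in Python. The Lease class, its method names and the injectable clock
are all hypothetical, made up for illustration. A real shared-something
setup would keep this state in the HA store and update it atomically,
and would have to make assumptions about clock drift between nodes.

```python
import time


class Lease:
    """A lock that is only honoured until it expires (hypothetical sketch)."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock          # injectable so the sketch can be tested
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, node):
        """Grant or renew the lease; refuse if another node holds a live one."""
        now = self.clock()
        if self.holder not in (None, node) and now < self.expires_at:
            return False
        self.holder = node
        self.expires_at = now + self.duration_s
        return True

    def is_held_by(self, node):
        """True only while the node's lease has not yet expired."""
        return self.holder == node and self.clock() < self.expires_at


# Usage with a fake clock: a holder that stops renewing loses the lock.
fake_now = [0.0]
lease = Lease(10.0, clock=lambda: fake_now[0])
lease.acquire("node-a")      # granted
lease.acquire("node-b")      # refused, node-a's lease is still live
fake_now[0] = 11.0           # node-a 'goes crazy' and never renews
lease.acquire("node-b")      # granted, node-a's claim has expired
```

The point of the design is that a crazy or partitioned holder only blocks
everyone else for at most one lease period, after which its claim lapses
on its own.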
shared-nothing approach:
Nice in theory but tricky to implement well. Consensus works well here.
- I prefer this approach, but, as you have stated, if the two halves
are equally sized...
- What if there are two concurrent fractures (does this happen?)
- ActiveCluster notifies you of one membership change at a time - so
you would have to decide on an algorithm for 'chunking' node loss,
so that you could decide when a fragmentation had occurred...
If you really want to do this reliably you have to assume that AC
will send you bogus notifications. Ideally you want to achieve a
consensus on membership to avoid this. It sounds like Totem solves
some of these issues.
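
For what it's worth, one possible shape for the 'chunking' step is
sketched below in Python. It groups node losses that arrive within a
settle window into one chunk, then applies a strict-majority test to the
survivors. FragmentDetector, settle_s and the method names are all
hypothetical, this is not an ActiveCluster API, and the sketch simply
trusts every notification, so it does nothing about the bogus
notification problem.

```python
class FragmentDetector:
    """Batch node losses that arrive close together, then test for a majority."""

    def __init__(self, members, settle_s):
        self.full_size = len(members)   # size of the last known full membership
        self.alive = set(members)
        self.settle_s = settle_s        # quiet period that closes a chunk
        self.chunk = set()              # nodes lost in the currently open chunk
        self.last_loss_at = None

    def node_lost(self, node, now):
        """Record one membership change; duplicates are ignored."""
        if node in self.alive:
            self.alive.discard(node)
            self.chunk.add(node)
            self.last_loss_at = now

    def poll(self, now):
        """Return None while a chunk is open or empty, else (lost, has_majority)."""
        if self.last_loss_at is None or now - self.last_loss_at < self.settle_s:
            return None
        lost, self.chunk, self.last_loss_at = self.chunk, set(), None
        # Strict majority: two equally sized halves both fail this test.
        has_majority = 2 * len(self.alive) > self.full_size
        return lost, has_majority


# Usage: a four-node cluster splits into two equal halves.
detector = FragmentDetector(["a", "b", "c", "d"], settle_s=2.0)
detector.node_lost("c", now=1.0)
detector.node_lost("d", now=1.5)
detector.poll(now=2.0)    # None, the chunk is still settling
detector.poll(now=3.5)    # ({'c', 'd'}, False), no strict majority on this side
```

With a strict majority rule the equally sized case resolves itself in the
sense that neither half continues, which is safe but leaves you needing a
tie-breaker, such as a lease on something shared, if you want one half to
survive.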
andy