Hi,

I am configuring a two-node HA cluster that has only one service.
The sole purpose of the cluster is to keep the service up with minimum 
disruption for the widest possible range of failure scenarios.

I configured a quorum disk to make sure that after a failure of a node, the 
cluster (now consisting of only one node) continues to have quorum.

I am considering a partitioned cluster scenario.  By partitioned I mean that 
the cluster nodes have lost the cluster communication path.  Without a quorum 
disk, each of the nodes in the cluster will try to fence the other.

However, the manual page for qdisk holds out the promise of solving this 
problem in the list of design requirements that it apparently fulfils:

Quote:
Ability to use the external reasons for deciding which partition is the quorate 
partition in a partitioned cluster.  For example, a user may have a service 
running on one node, and that node must always be the master in the event of a 
network partition.
Unquote.

This is exactly what I would like to achieve.  I know which node should stay 
alive - the one running my service, and it is trivial for me to find this out 
directly, as I can query for its status locally on a node. I do not have to use 
the network.  This can be used as a heuristic for the quorum disk.
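
As an illustration, here is a minimal sketch in Python of the kind of heuristic 
program I have in mind.  It relies on qdisk treating an exit status of 0 from a 
heuristic program as success.  The status command path and the function name 
are hypothetical placeholders of mine, not part of qdisk or rgmanager.

```python
import subprocess

# Hypothetical placeholder: a local command that exits 0 only when the
# service is active on this node. It is not a real qdisk or rgmanager tool.
STATUS_CMD = ["/usr/local/bin/my-service-status"]


def heuristic_exit_code(status_cmd, timeout_s=5):
    """Map a purely local service check to a heuristic exit status.

    Returns 0 (heuristic passes) if the status command exits 0, and 1
    (heuristic fails) if it exits non-zero, cannot be run, or times out.
    """
    try:
        result = subprocess.run(
            status_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A check that cannot run or hangs is treated as "service not here".
        return 1
    return 0 if result.returncode == 0 else 1
```

A script used as the heuristic would simply end with 
sys.exit(heuristic_exit_code(STATUS_CMD)).  Note that the check never touches 
the network, which is the whole point.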

What I am missing is how to make that into a workable whole.  Specifically, the 
following aspects are of concern:

1.
I do not want the other node to be ejected from the cluster just because it 
does not run the service.  But the test is binary, so it looks like it will be 
ejected.

2.
Startup time, before the service has started.  As no node has the service, both 
will be candidates for ejection.

3.
Service migration time.
During service migration from one node to another, there is a transient period 
of time when the service is not active on either node.
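
To make the three concerns concrete, here is a small Python sketch of how a 
strictly binary "is the service here?" test behaves in each state.  The node 
names and the function are purely illustrative and are not taken from qdisk.

```python
def nodes_failing_binary_test(service_location, nodes=("node1", "node2")):
    """Return the nodes on which a binary 'service runs here' test fails.

    service_location is the name of the node running the service, or None
    when the service is not active anywhere (startup or mid-migration).
    """
    return [node for node in nodes if node != service_location]


# Concern 1: the idle node fails the test although it is perfectly healthy.
print(nodes_failing_binary_test("node1"))

# Concerns 2 and 3: with the service active nowhere, both nodes fail.
print(nodes_failing_binary_test(None))
```

In the first case only the idle node fails the test; in the second case both 
nodes fail it, which is what worries me about startup and migration.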

Questions:

1.
How do I put all of this together to achieve the overall objective of the node 
with the service surviving the partitioning event uninterrupted?

2.
What is the relationship between fencing and node suicide due to communication 
through the quorum disk?

3.
How does the master election relate to this?

I would be grateful for any insights, pointers to documentation, etc.

Thanks and regards,

Chris Jankowski





--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
