On 11/09/13 11:50, Alan Brown wrote:
On 11/09/13 13:37, Digimer wrote:
The problem is that, if you enable cman on boot, the fenced node will
try to join the cluster, fail to reach its peer after post_join_delay
(default 6 seconds, iirc) and fence its peer. That peer reboots, starts
cman, tries to connect, fences its peer...
Qdisk is a good way of preventing this kind of problem.
If you have a SAN.
The easiest way to avoid this in 2-node clusters is to not let
cman/rgmanager start automatically.
For some values of "easy"
Your solution means every startup requires manual intervention.
Qdisk will let the cluster come up/restart nodes without needing human
help at startup.
The way I see it, and I've had clusters in production for years in
various locations, fencing happens extremely rarely. If a node gets
fenced, *something* went wrong and I will want to investigate before I
rejoin the node. So the fact that I have to manually start
cman/rgmanager is a trivial cost.
Out of about 20 2-node clusters, I've had maybe three or four fence
events in four years, and all of them were from failing equipment. So
in all cases, not rejoining the cluster was safest anyway.
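
To make the fence loop from earlier in the thread concrete, here is a toy
model in Python. It is only a sketch of the reasoning, not how cman or
fenced are implemented: the function name fence_events and its parameters
(autostart, link_ok, max_reboots) are made up for illustration, and
post_join_delay is reduced to "the peer was not reachable in time". Qdisk
is not modelled.

```python
def fence_events(autostart, link_ok, max_reboots=3):
    """List the fence events in a toy 2-node cluster.

    The model starts at the moment node "a" fences node "b".
    autostart   -- hypothetical flag: does cman start on boot?
    link_ok     -- can a freshly booted node reach its peer?
    max_reboots -- cap on reboots so the loop terminates in this sketch
    """
    events = ["a fences b"]
    survivor, rebooted = "a", "b"
    for _ in range(max_reboots):
        if not autostart:
            # The fenced node boots but stays out of the cluster until
            # an admin starts cman/rgmanager by hand.
            break
        if link_ok:
            # The rebooted node reaches its peer and rejoins normally.
            break
        # cman started on boot and the peer was still unreachable when
        # post_join_delay expired, so the rebooted node fences the
        # survivor and the two nodes swap roles.
        events.append(f"{rebooted} fences {survivor}")
        survivor, rebooted = rebooted, survivor
    return events


if __name__ == "__main__":
    print(fence_events(autostart=True, link_ok=False))
    print(fence_events(autostart=False, link_ok=False))
```

With autostart on and the link still broken, the two nodes keep fencing
each other until the cap is hit. With autostart off, the loop stops after
the first fence and waits for a human, which is the trade-off being argued
above.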
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?