[Pacemaker] Stonith: How to avoid deathmatch cluster partitioning

Klaus Darilion Wed, 15 May 2013 05:43:17 -0700

Hi!

I have a 2-node cluster: a simple test setup with an ocf:heartbeat:IPaddr2 resource, using Xen VMs and stonith:external/xen0. Please see the complete config below.

Basically everything works fine, except when corosync communication between the nodes is broken (simulated by shutting down the network link used for corosync communication). In this case, both nodes detect almost simultaneously that the other node went offline 'unclean' and shoot the other node in the head, causing a reboot of both nodes.

I know that the cluster network should be reliable, in which case this scenario should not happen. But is there a way to avoid a deathmatch when the cluster communication is down for some reason, but the stonith network still works?

For me the obvious solution would be to use different timeouts for triggering the head-shot. I tried "start-delay" as suggested in http://www.gossamer-threads.com/lists/linuxha/pacemaker/80918 but both nodes still trigger the head-shot immediately.
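To illustrate why I would expect different timeouts to help, here is a minimal Python sketch of the race. It is a toy model, not Pacemaker code: the function name fenced_nodes, the delay values and fence_time are made up for illustration, and it assumes that a fencing request already in flight still completes after the node that sent it has been shot.

```python
def fenced_nodes(delays, fence_time):
    """Return the set of nodes that end up fenced after a split.

    delays: dict mapping node name -> seconds that node waits before
            firing its fencing request (hypothetical values).
    fence_time: seconds between firing a request and the target
                actually going down (assumed equal for both nodes).
    """
    # Order the two nodes by how long they wait before shooting.
    (first, d1), (second, d2) = sorted(delays.items(), key=lambda kv: kv[1])

    # The node with the shorter delay always gets its shot off,
    # so its peer is fenced in every case.
    killed = {second}

    # The slower node only fires if it is still alive at that moment,
    # i.e. if its delay expires before the first shot has landed.
    if d2 < d1 + fence_time:
        killed.add(first)
    return killed


# Identical timeouts: both nodes shoot, both reboot (the deathmatch).
print(sorted(fenced_nodes({"pace1": 0, "pace2": 0}, fence_time=5)))

# A delay on one side that exceeds the fencing time: only one node dies.
print(sorted(fenced_nodes({"pace1": 15, "pace2": 0}, fence_time=5)))
```

In this model the delay only helps if the gap between the two nodes is larger than the time a fencing action needs to complete; a shorter gap still ends with both nodes shot.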


Am I using the parameter correctly (please see the config below)?

Are there other possibilities to solve this problem?

As a workaround, is it possible to tweak the timeout parameters in corosync.conf, or should they always be identical on both nodes?


Thanks
Klaus


node pace1
node pace2
primitive ip_service ocf:heartbeat:IPaddr2 \
        params ip="10.10.0.69" nic="eth0" cidr_netmask="24" iflabel="pace" \
        op monitor interval="60s"
primitive st-pace1 stonith:external/xen0 \
        params hostlist="pace1" dom0="xentest1" \
        op start start-delay="15s" interval="0"
primitive st-pace2 stonith:external/xen0 \
        params hostlist="pace2" dom0="xentest2"
location l-st-pace1 st-pace1 -inf: pace1
location l-st-pace2 st-pace2 -inf: pace2
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"


