Hi!

I have a two-node cluster: a simple test setup with an ocf:heartbeat:IPaddr2 resource, running on Xen VMs and fenced via stonith:external/xen0. Please see the complete config below.

Basically everything works fine, except when corosync communication between the nodes is broken (simulated by shutting down the network link used for corosync communication). In this case, both nodes detect almost simultaneously that the other node went offline 'unclean', and each shoots the other in the head, causing a reboot of both nodes.

I know that the cluster network should be reliable, so this scenario should not happen. But is there a way to avoid a deathmatch when the cluster communication is down for some reason while the stonith network still works?

To me, the obvious solution would be to use different timeouts for triggering the head-shot. I tried "startup-delay" as suggested in http://www.gossamer-threads.com/lists/linuxha/pacemaker/80918 (set as start-delay="15s" on the start operation of st-pace1 in the config below), but both nodes still trigger the head-shot immediately.
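
To make the race concrete, here is a toy model in Python of what I expect different delays to achieve. It is not Pacemaker code: fence_race and fence_duration are made-up names, and I am assuming that a fencing request always completes once it has been issued, even if the node that issued it is shot in the meantime.

```python
def fence_race(delays, fence_duration=2.0):
    """Toy model of a two-node fencing race after a cluster-link failure.

    delays maps each node name to the number of seconds that node waits
    before issuing its fencing request (both nodes notice the failure at
    t=0). fence_duration is the assumed time between issuing a request
    and the victim actually going down.

    Returns the set of nodes that end up shot.
    """
    if len(delays) != 2:
        raise ValueError("this sketch only models a two-node cluster")

    # The node with the shorter delay fires first (ties keep input order).
    first, second = sorted(delays, key=lambda node: delays[node])

    # The faster node always gets its shot off, so the slower node dies.
    shot = {second}
    death_of_second = delays[first] + fence_duration

    # The slower node only shoots back if its own delay expires
    # before it has been taken down.
    if delays[second] < death_of_second:
        shot.add(first)
    return shot


if __name__ == "__main__":
    # Identical delays: both nodes shoot each other (the deathmatch).
    print(fence_race({"pace1": 0.0, "pace2": 0.0}))
    # One node waits longer than a fencing action takes: only it is shot.
    print(fence_race({"pace1": 0.0, "pace2": 15.0}))
```

In this model the delay only helps if it is longer than the time a fencing action takes; a shorter delay still ends with both nodes down.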

Am I using the parameter correctly (please see config below)?

Are there other possibilities to solve this problem?

As a workaround, is it possible to tweak the timeout parameters in corosync.conf or should they always be identical?

Thanks
Klaus


node pace1
node pace2
primitive ip_service ocf:heartbeat:IPaddr2 \
        params ip="10.10.0.69" nic="eth0" cidr_netmask="24" iflabel="pace" \
        op monitor interval="60s"
primitive st-pace1 stonith:external/xen0 \
        params hostlist="pace1" dom0="xentest1" \
        op start start-delay="15s" interval="0"
primitive st-pace2 stonith:external/xen0 \
        params hostlist="pace2" dom0="xentest2"
location l-st-pace1 st-pace1 -inf: pace1
location l-st-pace2 st-pace2 -inf: pace2
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore"


