[Pacemaker] no failover if fencing device is unreachable (i.e. power loss)

Felix Schrage Mon, 18 Aug 2014 10:56:30 -0700

Hi,

I'm building a two-node cluster running XenServer, Pacemaker and DRBD. There is 
a problem when I test failover by powering off the currently active node.
With the fence_xenapi agent, the ClusterIP resource is not moved to the second 
node until the first node has been shut down successfully.
However, because the XenAPI is unreachable when the machine is powered off, the 
second node keeps trying to shut the first node down and the resource is never 
moved.


To check whether the problem lies in the fence_xenapi agent, I tried 
fence_ipmilan, which works fine as long as the IPMI interface is reachable. 
When I pull the power cords from the machine, however, the behavior is the same 
as with the fence_xenapi agent.
Am I missing an option that should be set? A timeout or a retry counter?
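
One way to separate "the fence device is unreachable" from "Pacemaker is misconfigured" is to query the device by hand, outside the cluster. The following is a minimal sketch, assuming the -a/-l/-p/-o options of fence_ipmilan and the usual fence-agent convention that a status query exits 0 for "on" and 2 for "off". The function name check_fence_device and the FENCE_CMD variable are hypothetical, added only so the check can be tried with a stub command.

```shell
#!/bin/sh
# Hypothetical helper: report whether a fence device answers a status query.
# FENCE_CMD is an illustrative override (not a fence-agents feature);
# it defaults to fence_ipmilan.
check_fence_device() {
    addr=$1
    user=$2
    pass=$3
    rc=0
    "${FENCE_CMD:-fence_ipmilan}" -a "$addr" -l "$user" -p "$pass" -o status \
        >/dev/null 2>&1 || rc=$?
    case $rc in
        0|2) echo "reachable" ;;    # device answered: power is on (0) or off (2)
        *)   echo "unreachable" ;;  # any other exit code is treated as an error
    esac
}
```

Called as check_fence_device test-xen-01-bmc.mercateo.lan admin '****', this would presumably report "unreachable" once the power cords are pulled, since the BMC loses power together with the host.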

Here's how I set up the cluster (fence_xenapi) using pcs:

pcs cluster cib ftp_ha_cluster
pcs -f ftp_ha_cluster resource create ClusterIP IPaddr2 ip=172.20.150.150 \
    cidr_netmask=32 op monitor interval=20s
pcs -f ftp_ha_cluster constraint location ClusterIP prefers ftp-test01=50
pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp1 fence_xenapi \
    pcmk_host_list="ftp-test01" action="off" session_url="https://test-xen-01" \
    port="ftp-test01" login="root" passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster stonith create xenvm-fence-ftp2 fence_xenapi \
    pcmk_host_list="ftp-test02" action="off" session_url="https://test-xen-02" \
    port="ftp-test02" login="root" passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp1 prefers \
    ftp-test01=-INFINITY
pcs -f ftp_ha_cluster constraint location xenvm-fence-ftp2 prefers \
    ftp-test02=-INFINITY
pcs -f ftp_ha_cluster property set stonith-enabled=true
pcs -f ftp_ha_cluster property set stonith-action=off
pcs -f ftp_ha_cluster property set stonith-timeout=40s
pcs -f ftp_ha_cluster property set no-quorum-policy=ignore
pcs -f ftp_ha_cluster resource create Ping ocf:pacemaker:ping dampen="5s" \
    multiplier="100" host_list="172.20.150.1 172.20.150.151 172.20.150.152" \
    attempts="3" op monitor interval=20s
pcs -f ftp_ha_cluster resource clone Ping
pcs -f ftp_ha_cluster constraint location ClusterIP rule score=-INF not_defined \
    pingd or pingd lte 0
pcs -f ftp_ha_cluster constraint location ClusterIP rule score=pingd defined \
    pingd
pcs cluster cib-push ftp_ha_cluster
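
On the timeout question, one thing that can be worked out from the values above is how much of stonith-timeout is left for the actual "off" action once the agent's delay has elapsed. This is a rough sketch, assuming the agent-level delay=15 is spent inside the same window that stonith-timeout=40s bounds, which is an assumption and not something confirmed here. The helper name fence_budget is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: seconds left for the fence action after the agent delay.
# $1 = stonith-timeout in seconds, $2 = agent delay in seconds.
fence_budget() {
    echo $(( $1 - $2 ))
}

# Values taken from the configuration above: stonith-timeout=40s, delay=15.
echo "time left for the off action: $(fence_budget 40 15)s"
```

Under that assumption a single fencing attempt has 25 seconds to complete. This only bounds one attempt; it says nothing about how often the attempt is repeated when the device cannot be reached.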

For testing with fence_ipmilan, I replaced the corresponding lines with the 
following:

pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-01 fence_ipmilan \
    pcmk_host_list="ftp-test01" action="off" ipaddr="test-xen-01-bmc.mercateo.lan" \
    auth="password" login="admin" passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster stonith create ipmi-fence-test-xen-02 fence_ipmilan \
    pcmk_host_list="ftp-test02" action="off" ipaddr="test-xen-02-bmc.mercateo.lan" \
    auth="password" login="admin" passwd="****" delay=15 op monitor interval=40s
pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-01 prefers \
    ftp-test01=-INFINITY
pcs -f ftp_ha_cluster constraint location ipmi-fence-test-xen-02 prefers \
    ftp-test02=-INFINITY


The content of /etc/corosync/corosync.conf:

compatibility: whitetank

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.199.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}

amf {
        mode: disabled
}

service {
        ver:    1
        name:   pacemaker
}

Any idea what could be missing/wrong?

Kind regards,

Felix

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
