Greetings,
I've just started playing with pacemaker/corosync on a two-node setup.
At this point I'm just experimenting and trying to get a good feel for
how things work.  Eventually I'd like to start using this in a
production environment.  I'm running Fedora 16 x86_64 with
pacemaker-1.1.7 & corosync-1.4.3.  I have DRBD set up and working fine
with two resources.  I've verified that pacemaker is doing the right
thing when initially configured (the quick checks I used are shown
after the list).  Specifically:
* the floating static IP is brought up
* DRBD is brought up correctly with a master & slave
* the local DRBD backed mount points are mounted correctly
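
The checks behind those three bullets were nothing fancier than the
standard tools:
#########
crm_mon -1               # one-shot cluster status
cat /proc/drbd           # DRBD connection state and roles
mount | grep /dev/drbd   # the DRBD-backed filesystems are mounted
#########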

Here's the configuration:
#########
node farm-ljf0 \
        attributes standby="off"
node farm-ljf1
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
        op monitor interval="10s"
primitive FS0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="10" role="Master" \
        op monitor interval="30" role="Slave"
primitive FS0_drbd ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs"
primitive FS1 ocf:linbit:drbd \
        params drbd_resource="r1" \
        op monitor interval="10s" role="Master" \
        op monitor interval="30s" role="Slave"
primitive FS1_drbd ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/mnt/sdb2" fstype="xfs"
ms FS0_Clone FS0 \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true"
ms FS1_Clone FS1 \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true"
location cli-prefer-ClusterIP ClusterIP \
        rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq farm-ljf1
colocation fs0_on_drbd inf: FS0_drbd FS0_Clone:Master
colocation fs1_on_drbd inf: FS1_drbd FS1_Clone:Master
order FS0_drbd-after-FS0 inf: FS0_Clone:promote FS0_drbd
order FS1_drbd-after-FS1 inf: FS1_Clone:promote FS1_drbd
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
#########
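
For what it's worth, I've been applying the configuration above through
the crm shell rather than hand-editing the CIB XML, more or less:
#########
crm configure edit      # make the edits shown above
crm configure verify    # sanity-check before committing
#########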

However, when I attempted to simulate a failover (I shut down the
current master/primary node completely), not everything failed over
correctly.  Specifically, the mount points did not get mounted, even
though the other two pieces did fail over correctly.
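
To be clear about what "simulate" means: it was a full OS shutdown of
the primary, something along the lines of:
#########
[root@farm-ljf1 ~]# shutdown -h now
#########
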
'farm-ljf1' is the node that I shut down; farm-ljf0 is the node that I
expected to inherit all of the resources.  Here's the status:
#########
[root@farm-ljf0 ~]# crm status
============
Last updated: Thu Sep 27 15:00:19 2012
Last change: Thu Sep 27 13:59:42 2012 via cibadmin on farm-ljf1
Stack: openais
Current DC: farm-ljf0 - partition WITHOUT quorum
Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
7 Resources configured.
============

Online: [ farm-ljf0 ]
OFFLINE: [ farm-ljf1 ]

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started farm-ljf0
 Master/Slave Set: FS0_Clone [FS0]
     Masters: [ farm-ljf0 ]
     Stopped: [ FS0:0 ]
 Master/Slave Set: FS1_Clone [FS1]
     Masters: [ farm-ljf0 ]
     Stopped: [ FS1:0 ]

Failed actions:
    FS1_drbd_start_0 (node=farm-ljf0, call=23, rc=1, status=complete): unknown error
    FS0_drbd_start_0 (node=farm-ljf0, call=24, rc=1, status=complete): unknown error
#########
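
I haven't found the right way to recover from those failed actions
either; is the expected path just a manual cleanup once the underlying
Filesystem problem is fixed, something like:
#########
crm resource cleanup FS0_drbd    # clear the failed start so it gets retried
crm resource cleanup FS1_drbd
#########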

I eventually brought the node I had shut down (farm-ljf1) back up,
hoping that might at least bring things back into a good state, but
it's not working either, and it is showing up as OFFLINE:
##########
[root@farm-ljf1 ~]# crm status
============
Last updated: Thu Sep 27 15:06:54 2012
Last change: Thu Sep 27 14:49:06 2012 via cibadmin on farm-ljf1
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
7 Resources configured.
============

OFFLINE: [ farm-ljf0 farm-ljf1 ]
##########
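
Is there more to rejoining than just making sure the cluster stack is
running again on the rebooted node?  That is, roughly (assuming these
are the right service names for this Fedora 16 / corosync 1.4 setup):
#########
systemctl start corosync.service    # membership/messaging layer
systemctl start pacemaker.service   # cluster resource manager
#########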


So at this point, I've got two problems:
0) FS mount failover isn't working.  I'm hoping this is some silly
configuration issue that can be easily resolved.
1) bringing the "failed" farm-ljf1 node back online doesn't seem to
work automatically, and I can't figure out what kind of magic is
needed.
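
If logs would help, I can pull the relevant bits; I've been assuming
the interesting entries are the lrmd/Filesystem ones around the failed
start, i.e. whatever a grep like this turns up (if syslog is even the
right place to look):
#########
grep -E 'lrmd|Filesystem|FS[01]_drbd' /var/log/messages | tail -n 100
#########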


If this stuff is documented somewhere, I'll gladly read it if someone
can point me in the right direction.

thanks!
