Greetings,
I've just started playing with pacemaker/corosync on a two-node setup.
At this point I'm just experimenting and trying to get a feel for how
things work. Eventually I'd like to start using this in a production
environment. I'm running Fedora 16 (x86_64) with pacemaker-1.1.7 &
corosync-1.4.3. I have DRBD set up and working fine with two
resources. I've verified that pacemaker is doing the right thing when
initially configured. Specifically:
* the floating static IP is brought up
* DRBD is brought up correctly with a master & slave
* the local DRBD backed mount points are mounted correctly
Here's the configuration:
#########
node farm-ljf0 \
attributes standby="off"
node farm-ljf1
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip="10.31.97.100" cidr_netmask="22" nic="eth1" \
op monitor interval="10s"
primitive FS0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="10" role="Master" \
op monitor interval="30" role="Slave"
primitive FS0_drbd ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/mnt/sdb1" fstype="xfs"
primitive FS1 ocf:linbit:drbd \
params drbd_resource="r1" \
op monitor interval="10s" role="Master" \
op monitor interval="30s" role="Slave"
primitive FS1_drbd ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/mnt/sdb2" fstype="xfs"
ms FS0_Clone FS0 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
ms FS1_Clone FS1 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
location cli-prefer-ClusterIP ClusterIP \
rule $id="cli-prefer-rule-ClusterIP" inf: #uname eq farm-ljf1
colocation fs0_on_drbd inf: FS0_drbd FS0_Clone:Master
colocation fs1_on_drbd inf: FS1_drbd FS1_Clone:Master
order FS0_drbd-after-FS0 inf: FS0_Clone:promote FS0_drbd
order FS1_drbd-after-FS1 inf: FS1_Clone:promote FS1_drbd
property $id="cib-bootstrap-options" \
dc-version="1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
#########
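(For what it's worth, before loading changes I've been sanity-checking the configuration; this is roughly what I run -- `crm_verify` ships with pacemaker, and I'm assuming `crm configure verify` behaves the same way here as it does in the crm shell docs:)

```shell
# Check the live CIB for configuration errors and warnings
# (-L = use the live cluster, -V = be verbose). Run on either node.
crm_verify -L -V

# Alternatively, verify pending edits from within the crm shell
# before committing them:
crm configure verify
```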
However, when I attempted to simulate a failover (I shut down the
current master/primary node completely), not everything failed over
correctly. Specifically, the mount points did not get mounted, even
though the other two elements did fail over correctly. farm-ljf1 is
the node that I shut down; farm-ljf0 is the node that I expected to
inherit all of the resources. Here's the status:
#########
[root@farm-ljf0 ~]# crm status
============
Last updated: Thu Sep 27 15:00:19 2012
Last change: Thu Sep 27 13:59:42 2012 via cibadmin on farm-ljf1
Stack: openais
Current DC: farm-ljf0 - partition WITHOUT quorum
Version: 1.1.7-2.fc16-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
7 Resources configured.
============
Online: [ farm-ljf0 ]
OFFLINE: [ farm-ljf1 ]
ClusterIP (ocf::heartbeat:IPaddr2): Started farm-ljf0
Master/Slave Set: FS0_Clone [FS0]
Masters: [ farm-ljf0 ]
Stopped: [ FS0:0 ]
Master/Slave Set: FS1_Clone [FS1]
Masters: [ farm-ljf0 ]
Stopped: [ FS1:0 ]
Failed actions:
    FS1_drbd_start_0 (node=farm-ljf0, call=23, rc=1, status=complete): unknown error
    FS0_drbd_start_0 (node=farm-ljf0, call=24, rc=1, status=complete): unknown error
#########
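(I assume the right way to make pacemaker retry those failed starts is to clear the failure records; this is what I was planning to try, using the resource names from the status output above:)

```shell
# Clear the recorded failures so the cluster re-attempts the
# start actions for the Filesystem resources:
crm resource cleanup FS0_drbd
crm resource cleanup FS1_drbd

# Then take a one-shot look at the resulting cluster state:
crm_mon -1
```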
I eventually brought the shut-down node (farm-ljf1) back up, hoping
that might at least bring things back into a good state, but it's not
working either, and it shows up as OFFLINE:
##########
[root@farm-ljf1 ~]# crm status
============
Last updated: Thu Sep 27 15:06:54 2012
Last change: Thu Sep 27 14:49:06 2012 via cibadmin on farm-ljf1
Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
7 Resources configured.
============
OFFLINE: [ farm-ljf0 farm-ljf1 ]
##########
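(One thing I haven't yet ruled out is whether the cluster stack actually came back up on the rebooted node at all. Fedora 16 is systemd-based, so I'm assuming the unit names below are right; this is what I intend to check:)

```shell
# On the rebooted node: did corosync and pacemaker restart on boot?
systemctl status corosync.service pacemaker.service

# If not, start them by hand (corosync first, then pacemaker):
systemctl start corosync.service
systemctl start pacemaker.service

# And optionally enable them so they come up automatically next time:
systemctl enable corosync.service pacemaker.service
```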
So at this point, I've got two problems:
0) FS mount failover isn't working. I'm hoping this is some silly
configuration issue that can be easily resolved.
1) bringing the "failed" farm-ljf1 node back online doesn't seem to
work automatically, and I can't figure out what kind of magic is
needed.
If this stuff is documented somewhere, I'll gladly read it, if someone
can point me in the right direction.
thanks!
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org