Hi, I am having trouble with my test configuration. I built an Active/Active cluster (Ubuntu 11.10 + DRBD + CMAN + Pacemaker + GFS2 + Xen) for testing purposes, and now I am running some availability tests: I am trying to start the cluster on just one node.
The trouble is that the Filesystem primitive ClusterFS (fstype=gfs2) does not start when one of the two nodes is switched off. Here is my configuration:

node blaster \
        attributes standby="off"
node turrel \
        attributes standby="off"
primitive ClusterData ocf:linbit:drbd \
        params drbd_resource="clusterdata" \
        op monitor interval="60s"
primitive ClusterFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/clusterdata" directory="/mnt/cluster" fstype="gfs2" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s" \
        op monitor interval="60s" timeout="60s"
primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip="192.168.122.252" cidr_netmask="32" clusterip_hash="sourceip" \
        op monitor interval="30s"
primitive SSH-stonith stonith:ssh \
        params hostlist="turrel blaster" \
        op monitor interval="60s"
primitive XenDom ocf:heartbeat:Xen \
        params xmfile="/etc/xen/xen1.example.com.cfg" \
        meta allow-migrate="true" is-managed="true" target-role="Stopped" \
        utilization cores="1" mem="512" \
        op monitor interval="30s" timeout="30s" \
        op start interval="0" timeout="90s" \
        op stop interval="0" timeout="300s"
ms ClusterDataClone ClusterData \
        meta master-max="2" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone ClusterFSClone ClusterFS \
        meta target-role="Started" is-managed="true"
clone IP ClusterIP \
        meta globally-unique="true" clone-max="2" clone-node-max="2"
clone SSH-stonithClone SSH-stonith
location prefere-blaster XenDom 50: blaster
colocation XenDom-with-ClusterFS inf: XenDom ClusterFSClone
colocation fs_on_drbd inf: ClusterFSClone ClusterDataClone:Master
order ClusterFS-after-ClusterData inf: ClusterDataClone:promote ClusterFSClone:start
order XenDom-after-ClusterFS inf: ClusterFSClone XenDom
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="cman" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1329194925"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

Here is the output of "crm resource show":

 Master/Slave Set: ClusterDataClone [ClusterData]
     Masters: [ turrel ]
     Stopped: [ ClusterData:1 ]
 Clone Set: IP [ClusterIP] (unique)
     ClusterIP:0 (ocf::heartbeat:IPaddr2) Started
     ClusterIP:1 (ocf::heartbeat:IPaddr2) Started
 Clone Set: ClusterFSClone [ClusterFS]
     Stopped: [ ClusterFS:0 ClusterFS:1 ]
 Clone Set: SSH-stonithClone [SSH-stonith]
     Started: [ turrel ]
     Stopped: [ SSH-stonith:1 ]
 XenDom (ocf::heartbeat:Xen) Stopped

I tried a cleanup:

crm(live)resource# cleanup ClusterFSClone
Cleaning up ClusterFS:0 on turrel
Cleaning up ClusterFS:1 on turrel
Waiting for 3 replies from the CRMd... OK

The only warnings I can see in /var/log/cluster/corosync.log are:

Feb 14 16:25:56 turrel pengine: [1640]: WARN: unpack_rsc_op: Processing failed op ClusterFS:0_start_0 on turrel: unknown exec error (-2)

and

Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)
Feb 14 16:25:56 turrel pengine: [1640]: WARN: common_apply_stickiness: Forcing ClusterFSClone away from turrel after 1000000 failures (max=1000000)

Could you please point me to what I should check next?

Best regards,
Dmitriy Bogomolov
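P.S. To narrow things down, I was also planning to test the mount by hand, outside Pacemaker. This is only a rough sketch of what I had in mind; it assumes the DRBD resource "clusterdata" is already Primary on the surviving node:

# confirm that DRBD reports Primary for clusterdata on this node
drbdadm role clusterdata
# try the GFS2 mount manually, to see the raw mount error instead of
# the generic "unknown exec error (-2)" from the Filesystem agent
mount -t gfs2 /dev/drbd/by-res/clusterdata /mnt/cluster

And since the pengine log above shows ClusterFSClone being forced away after hitting the failure limit, I assume I should also clear the failcount before retrying, something like:

# reset the per-node failcount for ClusterFS so the clone may run on turrel again
crm resource failcount ClusterFS set turrel 0

Is that the right direction?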