Hi,
I'm testing Pacemaker resource failover in a very simple test environment with
two virtual machines.
There are three cloned resources: DRBD (dual-primary), controld, and clvm.
Fencing is done with external/ssh, and that's it.
I'm having trouble understanding why my clvm resource gets restarted when a
failed node comes back online.
When one node is powered off (failure test), the remaining node fences the
"failed" node and the clvm resource stays online.
But when the failed node is back online, the clvm resource clone instance on
the surviving node gets restarted for no visible reason (see the logs below).
I guess I'm doing something wrong, but what?
Can anyone point me in the right direction?
Thank you!
LOGS
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke: Query 228: Requesting
the current CIB: S_POLICY_ENGINE
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_config: On loss of CCM
Quorum: Ignore
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation
res_drbd_1:1_monitor_0 found resource res_drbd_1:1 active on tnode1
Sep 20 13:18:41 tnode2 crmd: [3121]: info: do_pe_invoke_callback: Invoking the
PE: query=228, ref=pe_calc-dc-1316517521-176, seq=1268, quorate=1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: unpack_rsc_op: Operation
res_drbd_1:0_monitor_0 found resource res_drbd_1:0 active on tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Master/Slave Set:
ms_drbd_1 [res_drbd_1]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Masters: [
tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Slaves: [
tnode1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set:
cl_controld_1 [res_controld_dlm]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [
tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [
res_controld_dlm:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print:
stonith_external_ssh_1#011(stonith:external/ssh):#011Started tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: native_print:
stonith_external_ssh_2#011(stonith:external/ssh):#011Started tnode2
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: clone_print: Clone Set:
cl_clvmd_1 [res_clvmd_clustervg]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Started: [
tnode2 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: short_print: Stopped: [
res_clvmd_clustervg:1 ]
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: RecurringOp: Start recurring
monitor (60s) for res_controld_dlm:1 on tnode1
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave
res_drbd_1:0#011(Master tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Promote
res_drbd_1:1#011(Slave -> Master tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave
res_controld_dlm:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start
res_controld_dlm:1#011(tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave
stonith_external_ssh_1#011(Started tnode1)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Leave
stonith_external_ssh_2#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Restart
res_clvmd_clustervg:0#011(Started tnode2)
Sep 20 13:18:41 tnode2 pengine: [3116]: notice: LogActions: Start
res_clvmd_clustervg:1#011(tnode1)
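In case it helps when reading the excerpt above, here is a minimal Python sketch that pulls the planned actions out of the pengine "LogActions" lines. It assumes the log looks like the excerpt: records start with a syslog timestamp, the mail client may have wrapped a record onto a second line, and tabs show up as "#011". The function names (unwrap, planned_actions) are my own and not part of Pacemaker.

```python
import re

# Syslog records in the excerpt begin with a timestamp such as "Sep 20 13:18:41".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d ")
# "#011" is the escaped tab character that appears in these messages.
ACTION = re.compile(r"LogActions: (\w+)\s+(\S+?)(?:#011|\t)\((.*)\)\s*$")


def unwrap(lines):
    """Rejoin records that were wrapped onto a continuation line."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append(line)          # a new record starts here
        else:
            records[-1] += " " + line     # continuation of the previous record
    return records


def planned_actions(lines, skip=("Leave",)):
    """Return (action, resource, detail) for each LogActions record not in skip."""
    result = []
    for record in unwrap(lines):
        match = ACTION.search(record)
        if match and match.group(1) not in skip:
            result.append(match.groups())
    return result
```

Fed the lines of the excerpt above, this should leave only the non-"Leave" actions, which makes the unexpected Restart of res_clvmd_clustervg:0 on tnode2 stand out next to the Start actions on tnode1.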
CONFIG
node tnode1 \
	attributes standby="off"
node tnode2 \
	attributes standby="off"
primitive res_clvmd_clustervg ocf:lvm2:clvmd \
	params daemon_timeout="30" \
	operations $id="res_clvmd_clustervg-operations" \
	op monitor interval="0" timeout="4min" start-delay="5"
primitive res_controld_dlm ocf:pacemaker:controld \
	operations $id="res_controld_dlm-operations" \
	op monitor interval="60" timeout="60" start-delay="0" \
	meta target-role="started"
primitive res_drbd_1 ocf:linbit:drbd \
	params drbd_resource="r0" \
	operations $id="res_drbd_1-operations" \
	op start interval="0" timeout="240" \
	op promote interval="0" timeout="90" \
	op demote interval="0" timeout="90" \
	op stop interval="0" timeout="100" \
	op monitor interval="10" timeout="20" start-delay="1min" \
	op notify interval="0" timeout="90" \
	meta target-role="started" is-managed="true"
primitive stonith_external_ssh_1 stonith:external/ssh \
	params hostlist="tnode2" \
	operations $id="stonith_external_ssh_1-operations" \
	op start interval="0" timeout="60" \
	op stop interval="0" timeout="60" \
	op monitor interval="60" timeout="60" start-delay="0" \
	meta failure-timeout="3"
primitive stonith_external_ssh_2 stonith:external/ssh \
	params hostlist="tnode1" \
	operations $id="stonith_external_ssh_2-operations" \
	op start interval="0" timeout="60" \
	op stop interval="0" timeout="60" \
	op monitor interval="60" timeout="60" start-delay="0" \
	meta target-role="started" failure-timeout="3"
ms ms_drbd_1 res_drbd_1 \
	meta master-max="2" clone-max="2" notify="true" ordered="true" interleave="true"
clone cl_clvmd_1 res_clvmd_clustervg \
	meta clone-max="2" notify="true"
clone cl_controld_1 res_controld_dlm \
	meta clone-max="2" notify="true" ordered="true" interleave="true"
location loc_ms_drbd_1-ping-prefer ms_drbd_1 \
	rule $id="loc_ms_drbd_1-ping-prefer-rule" pingd: defined pingd
location loc_stonith_external_ssh_1_tnode2 stonith_external_ssh_1 -inf: tnode2
location loc_stonith_external_ssh_2_tnode1 stonith_external_ssh_2 -inf: tnode1
colocation col_cl_controld_1_cl_clvmd_1 inf: cl_clvmd_1 cl_controld_1
colocation col_ms_drbd_1_cl_controld_1 inf: cl_controld_1 ms_drbd_1:Master
order ord_cl_controld_1_cl_clvmd_1 inf: cl_controld_1 cl_clvmd_1
order ord_ms_drbd_1_cl_controld_1 inf: ms_drbd_1:promote cl_controld_1:start
property $id="cib-bootstrap-options" \
	expected-quorum-votes="2" \
	stonith-timeout="30" \
	dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
	no-quorum-policy="ignore" \
	cluster-infrastructure="openais" \