Hi,

On Mon, Mar 07, 2011 at 10:55:01AM +0100, Sascha Hagedorn wrote:
> Hello everyone,
>
> I am evaluating a two-node cluster setup and I am running into some problems.
> The cluster runs a dual-master DRBD disk with an OCFS2 filesystem. Here are
> the software versions in use:
>
> - SLES11 + HAE Extension
SLE11 is not supported anymore; you'd need to upgrade to SLE11 SP1.

> - DRBD 8.3.7
>
> - OCFS2 1.4.2
>
> - libdlm 3.00.01
>
> - cluster-glue 1.0.5
>
> - Pacemaker 1.1.2
>
> - OpenAIS 1.1.2
>
> The problem occurs when the second node is powered off abruptly by
> pulling the power cable. Shortly after that the load average on the
> surviving system goes up at a very high rate, with no CPU utilization, until
> the server becomes unresponsive. Processes I see in the top list very
> frequently are cib, dlm_controld, corosync and ha_logd. Access to the DRBD
> partition is not possible, although crm_mon shows it as mounted and
> all services as running. An "ls" on the DRBD OCFS2 partition results in a
> hanging prompt (so does "df" or any other command accessing the partition).

You created a split-brain condition, but have no stonith resources (and
stonith is disabled). That won't work. A sketch of a possible stonith setup
is appended at the end of this message.

Thanks,

Dejan

>
> crm_mon after the power is cut on cluster-node2:
>
> ============
> Last updated: Mon Mar 7 10:32:10 2011
> Stack: openais
> Current DC: cluster-node1 - partition WITHOUT quorum
> Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
> 2 Nodes configured, 2 expected votes
> 4 Resources configured.
> ============
>
> Online: [ cluster-node1 ]
> OFFLINE: [ cluster-node2 ]
>
> Master/Slave Set: ms_drbd
>     Masters: [ cluster-node1 ]
>     Stopped: [ p_drbd:1 ]
> Clone Set: cl_dlm
>     Started: [ cluster-node1 ]
>     Stopped: [ p_dlm:1 ]
> Clone Set: cl_o2cb
>     Started: [ cluster-node1 ]
>     Stopped: [ p_o2cb:1 ]
> Clone Set: cl_fs
>     Started: [ cluster-node1 ]
>     Stopped: [ p_fs:1 ]
>
> The configuration is as follows:
>
> node cluster-node1
> node cluster-node2
> primitive p_dlm ocf:pacemaker:controld \
>     op monitor interval="120s"
> primitive p_drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     operations $id="p_drbd-operations" \
>     op monitor interval="20" role="Master" timeout="20" \
>     op monitor interval="30" role="Slave" timeout="20"
> primitive p_fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/data" fstype="ocfs2" \
>     op monitor interval="120s"
> primitive p_o2cb ocf:ocfs2:o2cb \
>     op monitor interval="120s"
> ms ms_drbd p_drbd \
>     meta resource-stickines="100" notify="true" master-max="2"
> interleave="true"
> clone cl_dlm p_dlm \
>     meta globally-unique="false" interleave="true"
> clone cl_fs p_fs \
>     meta interleave="true" ordered="true"
> clone cl_o2cb p_o2cb \
>     meta globally-unique="false" interleave="true"
> colocation co_dlm-drbd inf: cl_dlm ms_drbd:Master
> colocation co_fs-o2cb inf: cl_fs cl_o2cb
> colocation co_o2cb-dlm inf: cl_o2cb cl_dlm
> order o_dlm-o2cb 0: cl_dlm cl_o2cb
> order o_drbd-dlm 0: ms_drbd:promote cl_dlm
> order o_o2cb-fs 0: cl_o2cb cl_fs
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
>
> Here is a snippet from /var/log/messages (power cut at 10:32:02):
>
> Mar 7 10:32:03 cluster-node1 kernel: [ 4714.838629] r8169: eth0: link down
> Mar 7 10:32:06 cluster-node1 corosync[4300]: [TOTEM ] A processor failed,
> forming new configuration.
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748011] block drbd0: PingAck did
> not arrive in time.
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748020] block drbd0: peer(
> Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate ->
> DUnknown )
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748031] block drbd0: asender
> terminated
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748035] block drbd0: short read
> expecting header on sock: r=-512
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748037] block drbd0: Terminating
> asender thread
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.748068] block drbd0: Creating
> new current UUID
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763424] block drbd0: Connection
> closed
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763429] block drbd0: conn(
> NetworkFailure -> Unconnected )
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763434] block drbd0: receiver
> terminated
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763436] block drbd0: Restarting
> receiver thread
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763439] block drbd0: receiver
> (re)started
> Mar 7 10:32:06 cluster-node1 kernel: [ 4717.763443] block drbd0: conn(
> Unconnected -> WFConnection )
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION
> CHANGE
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0)
> ip(10.140.1.1)
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0)
> ip(10.140.1.2)
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice:
> pcmk_peer_update: Transitional membership event on ring 11840: memb=1, new=0,
> lost=1
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
> pcmk_peer_update: memb: cluster-node1 1
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
> pcmk_peer_update: lost: cluster-node2 2
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] CLM CONFIGURATION
> CHANGE
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] New Configuration:
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] r(0)
> ip(10.140.1.1)
> Mar 7 10:32:10 cluster-node1 cib: [4309]: notice: ais_dispatch: Membership
> 11840: quorum lost
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: notice: ais_dispatch: Membership
> 11840: quorum lost
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: update_cluster: Processing
> membership 11840
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Left:
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: crm_update_peer: Node
> cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1
> born=11836 seen=11836 proc=00000000000000000000000000151312
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: ais_status_callback:
> status: cluster-node2 is now lost (was member)
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Skipped
> active node 1: born-on=11780, last-seen=11840, this-event=11840,
> last-event=11836
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [CLM ] Members Joined:
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_peer: Node
> cluster-node2: id=2 state=lost (new) addr=r(0) ip(10.140.1.2) votes=1
> born=11836 seen=11836 proc=00000000000000000000000000151312
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: del_configfs_node:
> del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/2"
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] notice:
> pcmk_peer_update: Stable membership event on ring 11840: memb=1, new=0, lost=0
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: erase_node_from_join:
> Removed node cluster-node2 from join calculations: welcomed=0 itegrated=0
> finalized=0 confirmed=1
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: ocfs2_controld (group
> "ocfs2:controld") confchg: members 1, left 1, joined 0
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: dlm_process_node: Removed
> inactive node 2: born-on=11836, last-seen=11836, this-event=11840,
> last-event=11836
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
> pcmk_peer_update: MEMB: cluster-node1 1
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_update_quorum: Updating
> quorum status to false (call=634)
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: node daemon left 2
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: node down 2
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config: dlm:controld
> conf 1 0 1 memb 1 join left 2
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
> ais_mark_unseen_peer_dead: Node cluster-node2 was not seen in the previous
> transition
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Node 2 has left
> mountgroup 17633D496670435F99A9C3A12F3FFFF0
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: log_config:
> dlm:ls:17633D496670435F99A9C3A12F3FFFF0 conf 1 0 1 memb 1 join left 2
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: node_history_fail:
> 17633D496670435F99A9C3A12F3FFFF0 check_fs nodeid 2 set
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info: update_member:
> Node 2/cluster-node2 is now: lost
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change:
> 17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 remove nodeid 2 reason 3
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [pcmk ] info:
> send_member_notification: Sending membership update 11840 to 4 children
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [TOTEM ] A processor joined
> or left the membership and a new membership was formed.
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: add_change:
> 17633D496670435F99A9C3A12F3FFFF0 add_change cg 19 counts member 1 joined 0
> remove 1 failed 1
> Mar 7 10:32:10 cluster-node1 corosync[4300]: [MAIN ] Completed service
> synchronization, ready to provide service.
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: stop_kernel:
> 17633D496670435F99A9C3A12F3FFFF0 stop_kernel cg 19
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: do_sysfs: write "0" to
> "/sys/kernel/dlm/17633D496670435F99A9C3A12F3FFFF0/control"
> Mar 7 10:32:10 cluster-node1 kernel: [ 4721.450691] dlm: closing connection
> to node 2
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Sending notification of
> node 2 for "17633D496670435F99A9C3A12F3FFFF0"
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: confchg called
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: group
> "ocfs2:17633D496670435F99A9C3A12F3FFFF0" confchg: members 1, left 1, joined 0
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request:
> Operation complete: op cib_modify for section nodes (origin=local/crmd/632,
> version=0.30.30): ok (rc=0)
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff:
> - <cib have-quorum="1" admin_epoch="0" epoch="30" num_updates="31" />
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: log_data_element: cib:diff:
> + <cib have-quorum="0" admin_epoch="0" epoch="31" num_updates="1" />
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request:
> Operation complete: op cib_modify for section cib (origin=local/crmd/634,
> version=0.31.1): ok (rc=0)
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: crm_ais_dispatch: Setting
> expected votes to 2
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: WARN: match_down_event: No match
> for shutdown action on cluster-node2
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: te_update_diff:
> Stonith/shutdown of cluster-node2 not matched
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph:
> te_update_diff:194 - Triggered transition abort (complete=1, tag=node_state,
> id=cluster-node2, magic=NA, cib=0.30.31) : Node failure
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: abort_transition_graph:
> need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: fence_node_time: Node
> 2/cluster-node2 has not been shot yet
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: need_abort: Aborting on
> change to have-quorum
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: check_fencing_done:
> 17633D496670435F99A9C3A12F3FFFF0 check_fencing 2 not fenced add 1299490145
> fence 0
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
> origin=abort_transition_graph ]
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1
> cluster nodes are eligible to run resources.
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 637:
> Requesting the current CIB: S_POLICY_ENGINE
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 638:
> Requesting the current CIB: S_POLICY_ENGINE
> Mar 7 10:32:10 cluster-node1 cib: [4309]: info: cib_process_request:
> Operation complete: op cib_modify for section crm_config
> (origin=local/crmd/636, version=0.31.1): ok (rc=0)
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback:
> Invoking the PE: query=638, ref=pe_calc-dc-1299490330-545, seq=11840,
> quorate=0
> Mar 7 10:32:10 cluster-node1 cluster-dlm[5081]: set_fs_notified:
> 17633D496670435F99A9C3A12F3FFFF0 set_fs_notified nodeid 2
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: message from dlmcontrol
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup
> probes: enabled
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Notified for
> "17633D496670435F99A9C3A12F3FFFF0", node 2, status 0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Mar 7 10:32:10 cluster-node1 ocfs2_controld[5544]: Completing notification
> on "17633D496670435F99A9C3A12F3FFFF0" for node 2
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_config: Node
> scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: unpack_domains:
> Unpacking domains
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: determine_online_status:
> Node cluster-node1 is online
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print:
> Master/Slave Set: ms_drbd
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Masters: [ cluster-node1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_drbd:1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_dlm
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_dlm:1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_o2cb
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_o2cb:1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_fs
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_fs:1 ]
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_drbd:1 cannot run anywhere
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:32:10 cluster-node1 pengine:
> [4312]: info: native_color: Resource
> p_dlm:1 cannot run anywhere
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_o2cb:1 cannot run anywhere
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_fs:1 cannot run anywhere
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_dlm:0 with p_drbd:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_drbd:0 with p_dlm:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_drbd:0 with p_dlm:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_dlm:0 with p_drbd:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_dlm:0 with p_o2cb:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_o2cb:0 with p_dlm:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_fs:0 with p_o2cb:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_o2cb:0 with p_fs:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_o2cb:0 with p_fs:0 on cluster-node1
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_fs:0 with p_o2cb:0
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_drbd:0 (Master cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_drbd:1 (Stopped)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_dlm:0 (Started cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_dlm:1 (Stopped)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_o2cb:0 (Started cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_o2cb:1 (Stopped)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_fs:0 (Started cluster-node1)
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_fs:1 (Stopped)
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked
> transition 62: 0 actions in 0 synapses
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing
> graph 62 (ref=pe_calc-dc-1299490330-545) derived from
> /var/lib/pengine/pe-input-4730.bz2
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: run_graph:
> ====================================================
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 62
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-4730.bz2): Complete
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: te_graph_trigger:
> Transition 62 is now complete
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info:
> notify_crmd: Transition 62
> status: done - <null>
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 7 10:32:10 cluster-node1 crmd: [4313]: info: do_state_transition:
> Starting PEngine Recheck Timer
> Mar 7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents:
> Archived previous version as /var/lib/heartbeat/crm/cib-23.raw
> Mar 7 10:32:10 cluster-node1 pengine: [4312]: info: process_pe_message:
> Transition 62: PEngine Input stored in: /var/lib/pengine/pe-input-4730.bz2
> Mar 7 10:32:10 cluster-node1 cib: [10261]: info: write_cib_contents: Wrote
> version 0.31.0 of the CIB to disk (digest: 1360ea4c1e6d061a115b8efa6794189a)
> Mar 7 10:32:10 cluster-node1 cib: [10261]: info: retrieveCib: Reading
> cluster configuration from: /var/lib/heartbeat/crm/cib.ljbIv1 (digest:
> /var/lib/heartbeat/crm/cib.HFfG5H)
> Mar 7 10:35:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 660
> operations (2333.00us average, 0% utilization) in the last 10min
> Mar 7 10:45:24 cluster-node1 cib: [4309]: info: cib_stats: Processed 629
> operations (651.00us average, 0% utilization) in the last 10min
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: crm_timer_popped: PEngine
> Recheck Timer (I_PE_CALC) just popped!
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
> origin=crm_timer_popped ]
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition:
> Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: All 1
> cluster nodes are eligible to run resources.
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke: Query 639:
> Requesting the current CIB: S_POLICY_ENGINE
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_pe_invoke_callback:
> Invoking the PE: query=639, ref=pe_calc-dc-1299491230-546, seq=11840,
> quorate=0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Startup
> probes: enabled
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: unpack_config: On loss
> of CCM Quorum: Ignore
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_config: Node
> scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: unpack_domains:
> Unpacking domains
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: determine_online_status:
> Node cluster-node1 is online
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print:
> Master/Slave Set: ms_drbd
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Masters: [ cluster-node1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_drbd:1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_dlm
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_dlm:1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_o2cb
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_o2cb:1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_print: Clone
> Set: cl_fs
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Started: [ cluster-node1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: short_print:
> Stopped: [ p_fs:1 ]
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_drbd:1 cannot run anywhere
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: Promoting
> p_drbd:0 (Master cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: master_color: ms_drbd:
> Promoted 1 instances of a possible 2 to master
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_dlm:1 cannot run anywhere
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_o2cb:1 cannot run anywhere
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: native_color: Resource
> p_fs:1 cannot run anywhere
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_dlm:0 with p_drbd:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_drbd:0 with p_dlm:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_drbd:0 with p_dlm:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_dlm:0 with
> p_drbd:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_o2cb:0 with p_dlm:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_dlm:0 with p_o2cb:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_dlm:0 with p_o2cb:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_o2cb:0 with p_dlm:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_fs:0 with p_o2cb:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_o2cb:0 with p_fs:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: find_compatible_child:
> Colocating p_o2cb:0 with p_fs:0 on cluster-node1
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: clone_rsc_order_lh:
> Interleaving p_fs:0 with p_o2cb:0
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_drbd:0 (Master cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_drbd:1 (Stopped)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_dlm:0 (Started cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_dlm:1 (Stopped)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_o2cb:0 (Started cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_o2cb:1 (Stopped)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_fs:0 (Started cluster-node1)
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: notice: LogActions: Leave
> resource p_fs:1 (Stopped)
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=handle_response ]
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: unpack_graph: Unpacked
> transition 63: 0 actions in 0 synapses
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_te_invoke: Processing
> graph 63 (ref=pe_calc-dc-1299491230-546) derived from
> /var/lib/pengine/pe-input-4731.bz2
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: run_graph:
> ====================================================
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: notice: run_graph: Transition 63
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0,
> Source=/var/lib/pengine/pe-input-4731.bz2): Complete
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: te_graph_trigger:
> Transition 63 is now complete
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: notify_crmd: Transition 63
> status: done - <null>
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition: State
> transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Mar 7 10:47:10 cluster-node1 crmd: [4313]: info: do_state_transition:
> Starting PEngine Recheck Timer
> Mar 7 10:47:10 cluster-node1 pengine: [4312]: info: process_pe_message:
> Transition 63: PEngine Input stored in: /var/lib/pengine/pe-input-4731.bz2
>
> Any help is appreciated.
>
> Thank you and kind regards,
> Sascha Hagedorn

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
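
The sketch referred to above: a minimal stonith configuration for a two-node
cluster like this one, assuming each node has an IPMI-reachable management
board. The BMC addresses and credentials below are placeholders, not values
from this cluster, and external/ipmi is only one of several usable plugins:

    # crm configure
    # One stonith resource per node; the location constraints keep each
    # resource away from the node it is responsible for shooting.
    primitive st_node1 stonith:external/ipmi \
        params hostname="cluster-node1" ipaddr="192.168.100.1" \
            userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
    primitive st_node2 stonith:external/ipmi \
        params hostname="cluster-node2" ipaddr="192.168.100.2" \
            userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
    location l_st_node1 st_node1 -inf: cluster-node1
    location l_st_node2 st_node2 -inf: cluster-node2
    property stonith-enabled="true"

With dual-primary DRBD the resource should additionally escalate replication
link failures to the cluster's fencing, roughly like this (DRBD 8.3 syntax;
the handler scripts ship with DRBD 8.3.2 and later):

    # /etc/drbd.conf, inside resource r0
    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

With fencing in place, dlm_controld can resolve the "has not been shot yet"
wait visible in the log above instead of blocking all filesystem access
indefinitely.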