On Thu, 2014-01-16 at 14:49 +1100, Andrew Beekhof wrote:
>
> What crm_mon are you looking at?
> I see stuff like:
>
> virt-fencing (stonith:fence_xvm): Started rhos4-node3
> Resource Group: mysql-group
>     mysql-vip (ocf::heartbeat:IPaddr2): Started rhos4-node3
>     mysql-fs (ocf::heartbeat:Filesystem): Started rhos4-node3
>     mysql-db (ocf::heartbeat:mysql): Started rhos4-node3
Yes, you are right. I couldn't see the forest for the trees. I was initially optimistic that crm_mon would be more truthful than crm_resource, but it turns out it is not. Take for example these commands to set location constraints and start a resource (which has already been defined at this point):

[21/Jan/2014:13:46:40] cibadmin -o constraints -C -X '<rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>'
[21/Jan/2014:13:46:41] cibadmin -o constraints -C -X '<rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>'
[21/Jan/2014:13:46:42] crm_resource -r 'res1' -p target-role -m -v 'Started'

and then these repeated calls to crm_mon -1 on node5:

[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured

Online: [ node5 node6 ]

st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node6

[21/Jan/2014:13:46:42] crm_mon -1
Last updated: Tue Jan 21 13:46:42 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured

Online: [ node5 node6 ]

st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node6

[21/Jan/2014:13:46:49] crm_mon -1 -r
Last updated: Tue Jan 21 13:46:49 2014
Last change: Tue Jan 21 13:46:42 2014 via crm_resource on node5
Stack: openais
Current DC: node5 - partition with quorum
Version: 1.1.10-14.el6_5.1-368c726
2 Nodes configured
2 Resources configured

Online: [ node5 node6 ]

Full list of resources:

st-fencing (stonith:fence_product): Started node5
res1 (ocf::product:Target): Started node5

The first two are not correct: they show the resource started on node6 when it was actually started on node5.
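For scripting purposes, one way to cope with this lag is to poll until the reported location settles, rather than trusting the first snapshot. A minimal sketch — the `wait_for_output` helper, its names, and the timeout values are my own, not part of Pacemaker's tooling:

```shell
#!/bin/sh
# wait_for_output: poll a command once per second until its output matches
# a pattern, or give up after a timeout.
# Usage: wait_for_output PATTERN TIMEOUT_SECONDS COMMAND [ARGS...]
wait_for_output() {
    pattern=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Re-run the command each iteration; grep -q gives us the match status.
        if "$@" 2>/dev/null | grep -q "$pattern"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Against the cluster this might look like (hypothetical resource/node names):
#   wait_for_output 'res1.*Started node5' 30 crm_mon -1
```

This doesn't fix crm_mon's staleness, of course; it just keeps a script from acting on the first (possibly wrong) answer.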
Finally, 7 seconds later, it is reporting correctly. The logs on node{5,6} bear this out. The resource was actually only ever started on node5 and never on node6. Here's the log for node5:

Jan 21 13:42:00 node5 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:42:00 node5 pacemakerd[8684]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.1 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Jan 21 13:42:00 node5 pacemakerd[8684]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]: notice: main: Using new config location: /var/lib/pacemaker/cib
Jan 21 13:42:00 node5 cib[8690]: warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:42:00 node5 cib[8690]: warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backups
Jan 21 13:42:00 node5 cib[8690]: warning: readCibXmlFile: Continuing with an empty configuration.
Jan 21 13:42:00 node5 attrd[8693]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 crmd[8695]: notice: main: CRM Git Version: 368c726
Jan 21 13:42:00 node5 attrd[8693]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1cbc3c0 for attrd/0
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1cb8040 for stonith-ng/0
Jan 21 13:42:00 node5 attrd[8693]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 stonith-ng[8691]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 attrd[8693]: notice: main: Starting mainloop...
Jan 21 13:42:00 node5 cib[8690]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:00 node5 cib[8690]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1cc0740 for cib/0
Jan 21 13:42:00 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Sending membership update 4 to cib
Jan 21 13:42:00 node5 cib[8690]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:00 node5 cib[8690]: notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:42:01 node5 crmd[8695]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:42:01 node5 crmd[8695]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1cc4c00 for crmd/0
Jan 21 13:42:01 node5 corosync[8646]: [pcmk ] info: pcmk_ipc: Sending membership update 4 to crmd
Jan 21 13:42:01 node5 crmd[8695]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:42:01 node5 stonith-ng[8691]: notice: setup_cib: Watching for stonith topology changes
Jan 21 13:42:01 node5 crmd[8695]: notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null))
Jan 21 13:42:01 node5 crmd[8695]: notice: do_started: The local CRM is operational
Jan 21 13:42:01 node5 crmd[8695]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
Jan 21 13:42:22 node5 crmd[8695]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Jan 21 13:42:22 node5 crmd[8695]: notice: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 21 13:42:22 node5 cib[8690]: notice: cib_perform_op: Configuration ordering change detected
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: log_cib_diff: cib_process_diff: Local-only Change: 0.1.1
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: -- <cib num_updates="1"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: ++ <cib epoch="1" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jan 21 13:42:01 2014" update-origin="node5" update-client="crmd"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: cib_process_diff: Diff 0.1.1 -> 0.1.1 from local not applied to 0.1.1: + and - versions in the diff did not change
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: Diff: --- 0.1.2
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.2.1 5ea115fd649f5fa3419b7f1c9abf8992
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="1" num_updates="2"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <cluster_property_set id="cib-bootstrap-options">
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6_5.1-368c726"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ </cluster_property_set>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.3.1
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="2" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
Jan 21 13:42:22 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.4.1
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="3" num_updates="1"/>
Jan 21 13:42:22 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed, requesting a full refresh (-207)
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 21 13:42:22 node5 pengine[8694]: warning: cluster_status: We do not have quorum - fencing and resource management disabled
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
Jan 21 13:42:22 node5 pengine[8694]: error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Jan 21 13:42:22 node5 pengine[8694]: notice: stage6: Delaying fencing operations until there are resources to manage
Jan 21 13:42:22 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-71.bz2
Jan 21 13:42:22 node5 pengine[8694]: notice: process_pe_message: Configuration ERRORs found during PE processing. Please run "crm_verify -L" to identify issues.
Jan 21 13:42:22 node5 crmd[8695]: notice: te_rsc_command: Initiating action 2: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:22 node5 crmd[8695]: notice: run_graph: Transition 0 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-71.bz2): Complete
Jan 21 13:42:22 node5 crmd[8695]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 21 13:42:22 node5 stonith-ng[8691]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206)
Jan 21 13:42:22 node5 attrd[8693]: notice: attrd_perform_update: Sent update 4: probe_complete=true
Jan 21 13:42:36 node5 cibadmin[8745]: notice: crm_log_args: Invoked: cibadmin --delete --obj_type resources -X <primitive class="stonith" id="st-fencing" type="fence_product"/>
Jan 21 13:42:36 node5 cibadmin[8746]: notice: crm_log_args: Invoked: cibadmin --create --obj_type resources -X <primitive class="stonith" id="st-fencing" type="fence_product"/>
Jan 21 13:42:36 node5 crmd[8695]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: Diff: --- 0.4.4
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.5.1 e1637d693af73b0a1ff6beeabaa16dbe
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="4" num_updates="4"/>
Jan 21 13:42:36 node5 cib[8690]: notice: cib:diff: ++ <primitive class="stonith" id="st-fencing" type="fence_product"/>
Jan 21 13:42:36 node5 pengine[8694]: warning: cluster_status: We do not have quorum - fencing and resource management disabled
Jan 21 13:42:36 node5 pengine[8694]: notice: LogActions: Start st-fencing#011(node5 - blocked)
Jan 21 13:42:36 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 1: /var/lib/pacemaker/pengine/pe-input-72.bz2
Jan 21 13:42:36 node5 crmd[8695]: notice: te_rsc_command: Initiating action 4: monitor st-fencing_monitor_0 on node5 (local)
Jan 21 13:42:37 node5 cibadmin[8748]: notice: crm_log_args: Invoked: cibadmin --modify --allow-create -o crm_config -X <cluster_property_set id="cib-bootstrap-options">#012<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>#012<nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>#012<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>#012<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.6.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <nvpair value="classic openais (with plugin)" id="cib-bootstrap-options-cluster-infrastructure"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="openais"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.7.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="6" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <meta_attributes id="rsc_defaults-options">
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="rsc_defaults-options-resource-stickiness" name="resource-stickiness" value="1000"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ </meta_attributes>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ </rsc_defaults>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.8.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="7" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="rsc_defaults-options-failure-timeout" name="failure-timeout" value="20m"/>
Jan 21 13:42:37 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.9.1
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="8" num_updates="1"/>
Jan 21 13:42:37 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="rsc_defaults-options-migration-threshold" name="migration-threshold" value="3"/>
Jan 21 13:42:37 node5 stonith-ng[8691]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices)
Jan 21 13:42:39 node5 crmd[8695]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:42:39 node5 crmd[8695]: notice: run_graph: Transition 1 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-72.bz2): Stopped
Jan 21 13:42:39 node5 pengine[8694]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:42:39 node5 pengine[8694]: notice: LogActions: Start st-fencing#011(node5)
Jan 21 13:42:39 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 2: /var/lib/pacemaker/pengine/pe-input-73.bz2
Jan 21 13:42:39 node5 crmd[8695]: notice: te_rsc_command: Initiating action 4: start st-fencing_start_0 on node5 (local)
Jan 21 13:42:40 node5 stonith-ng[8691]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:42:40 node5 crmd[8695]: notice: process_lrm_event: LRM operation st-fencing_start_0 (call=8, rc=0, cib-update=42, confirmed=true) ok
Jan 21 13:42:40 node5 crmd[8695]: notice: run_graph: Transition 2 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-73.bz2): Complete
Jan 21 13:42:40 node5 crmd[8695]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:45:02 node5 ntpd[8470]: 0.0.0.0 c614 04 freq_mode
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 8: memb=1, new=0, lost=0
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: memb: node5 307367434
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 8: memb=2, new=1, lost=0
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Creating entry for node 290590218 born on 8
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Node 290590218/unknown is now: member
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: NEW: .pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 290590218
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: pcmk_peer_update: MEMB: node5 307367434
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1ca5c80 Node 307367434 ((null)) born on: 4
Jan 21 13:46:23 node5 corosync[8646]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 21 13:46:23 node5 cib[8690]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[290590218] - state is now member (was (null))
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1cc9e50 Node 290590218 (node6) born on: 8
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: 0x1cc9e50 Node 290590218 now known as node6 (was: (null))
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: update_member: Node node6 now has 1 quorum votes (was 0)
Jan 21 13:46:23 node5 corosync[8646]: [pcmk ] info: send_member_notification: Sending membership update 8 to 2 children
Jan 21 13:46:23 node5 crmd[8695]: notice: crm_update_peer_state: plugin_handle_membership: Node (null)[290590218] - state is now member (was (null))
Jan 21 13:46:23 node5 cib[8690]: notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:23 node5 corosync[8646]: [CPG ] chosen downlist: sender r(0) ip(10.14.82.17) r(1) ip(10.128.2.17) ; members(old:1 left:0)
Jan 21 13:46:23 node5 corosync[8646]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 21 13:46:23 node5 crmd[8695]: notice: plugin_handle_membership: Membership 8: quorum acquired
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: Diff: --- 0.9.3
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.10.1 7be5e06f43347759896f381ff0e09476
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="9" num_updates="3"/>
Jan 21 13:46:23 node5 cib[8690]: notice: cib:diff: ++ <node id="node6" uname="node6"/>
Jan 21 13:46:23 node5 crmd[8695]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:46:30 node5 attrd[8693]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
Jan 21 13:46:30 node5 attrd[8693]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
Jan 21 13:46:31 node5 pengine[8694]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:31 node5 crmd[8695]: notice: te_rsc_command: Initiating action 5: monitor st-fencing_monitor_0 on node6
Jan 21 13:46:31 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 3: /var/lib/pacemaker/pengine/pe-input-74.bz2
Jan 21 13:46:32 node5 crmd[8695]: notice: te_rsc_command: Initiating action 4: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:32 node5 crmd[8695]: notice: run_graph: Transition 3 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-74.bz2): Complete
Jan 21 13:46:32 node5 crmd[8695]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 21 13:46:40 node5 cibadmin[9025]: notice: crm_log_args: Invoked: cibadmin -o resources -C -x /tmp/tmp80fPna
Jan 21 13:46:40 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:40 node5 crmd[8695]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: Diff: --- 0.10.10
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.11.1 50c098c8e9b5f371c281ae77c05120e2
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="10" num_updates="10"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <primitive class="ocf" provider="product" type="Target" id="res1">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <meta_attributes id="res1-meta_attributes">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <nvpair name="target-role" id="res1-meta_attributes-target-role" value="Stopped"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </meta_attributes>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <operations id="res1-operations">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op id="res1-monitor-5" interval="5" name="monitor" timeout="60"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op id="res1-start-0" interval="0" name="start" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <op id="res1-stop-0" interval="0" name="stop" timeout="300"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </operations>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <instance_attributes id="res1-instance_attributes">
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <nvpair id="res1-instance_attributes-target" name="target" value="63e7a581-08fd-4a6c-bcb8-153469bd36a2"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </instance_attributes>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ </primitive>
Jan 21 13:46:40 node5 pengine[8694]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:40 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 4: /var/lib/pacemaker/pengine/pe-input-75.bz2
Jan 21 13:46:40 node5 crmd[8695]: notice: te_rsc_command: Initiating action 6: monitor res1_monitor_0 on node6
Jan 21 13:46:40 node5 crmd[8695]: notice: te_rsc_command: Initiating action 4: monitor res1_monitor_0 on node5 (local)
Jan 21 13:46:40 node5 cibadmin[9028]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>
Jan 21 13:46:40 node5 cib[8690]: notice: log_cib_diff: cib:diff: Local-only Change: 0.12.1
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="11" num_updates="1"/>
Jan 21 13:46:40 node5 cib[8690]: notice: cib:diff: ++ <rsc_location id="res1-primary" node="node5" rsc="res1" score="20"/>
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. quota=on. Opts:
Jan 21 13:46:41 node5 crmd[8695]: warning: status_from_rc: Action 6 (res1_monitor_0) on node6 failed (target: 7 vs. rc: 0): Error
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action 5: probe_complete probe_complete on node6 - no waiting
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: Diff: --- 0.12.2
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.13.1 48a53642acc94d5f541c3297baf08880
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: -- <cib admin_epoch="0" epoch="12" num_updates="2"/>
Jan 21 13:46:41 node5 cib[8690]: notice: cib:diff: ++ <rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/>
Jan 21 13:46:41 node5 stonith-ng[8691]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:41 node5 crmd[8695]: notice: process_lrm_event: LRM operation res1_monitor_0 (call=13, rc=7, cib-update=68, confirmed=true) not running
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action 3: probe_complete probe_complete on node5 (local) - no waiting
Jan 21 13:46:41 node5 crmd[8695]: notice: run_graph: Transition 4 (Complete=4, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-75.bz2): Stopped
Jan 21 13:46:41 node5 pengine[8694]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:41 node5 pengine[8694]: notice: LogActions: Stop res1#011(node6)
Jan 21 13:46:41 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 5: /var/lib/pacemaker/pengine/pe-input-76.bz2
Jan 21 13:46:41 node5 crmd[8695]: notice: te_rsc_command: Initiating action 7: stop res1_stop_0 on node6
Jan 21 13:46:42 node5 stonith-ng[8691]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: Diff: --- 0.13.2
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: Diff: +++ 0.14.1 36575d506c0ed5769e4655c88ef3510d
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: -- <nvpair value="Stopped" id="res1-meta_attributes-target-role"/>
Jan 21 13:46:42 node5 cib[8690]: notice: cib:diff: ++ <nvpair name="target-role" id="res1-meta_attributes-target-role" value="Started"/>
Jan 21 13:46:42 node5 crmd[8695]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-76.bz2): Stopped
Jan 21 13:46:42 node5 pengine[8694]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 21 13:46:42 node5 pengine[8694]: notice: LogActions: Start res1#011(node5)
Jan 21 13:46:42 node5 pengine[8694]: notice: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-input-77.bz2
Jan 21 13:46:42 node5 crmd[8695]: notice: te_rsc_command: Initiating action 7: start res1_start_0 on node5 (local)
Jan 21 13:46:42 node5 stonith-ng[8691]: notice: stonith_device_register: Device 'st-fencing' already existed in device list (1 active devices)
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ [ ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ { ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "args": [ ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "mount", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "-t", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "ext4", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_disk1", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "/mnt/res1" ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ ], ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "rc": 0, ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "stderr": "", ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ "stdout": "" ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ } ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ ] ]
Jan 21 13:46:43 node5 lrmd[8692]: notice: operation_finished: res1_start_0:9059:stderr [ ]
Jan 21 13:46:43 node5 crmd[8695]: notice: process_lrm_event: LRM operation res1_start_0 (call=16, rc=0, cib-update=71, confirmed=true) ok
Jan 21 13:46:43 node5 crmd[8695]: notice: te_rsc_command: Initiating action 8: monitor res1_monitor_5000 on node5 (local)
Jan 21 13:46:43 node5 crmd[8695]: notice: process_lrm_event: LRM operation res1_monitor_5000 (call=19, rc=0, cib-update=72, confirmed=false) ok
Jan 21 13:46:43 node5 crmd[8695]: notice: run_graph: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-77.bz2): Complete
Jan 21 13:46:43 node5 crmd[8695]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

and here's the log for node6:

Jan 21 13:46:26 node6 pacemaker: Starting Pacemaker Cluster Manager
Jan 21 13:46:26 node6 pacemakerd[9322]: notice: main: Starting Pacemaker 1.1.10-14.el6_5.1 (Build: 368c726): generated-manpages agent-manpages ascii-docs publican-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman
Jan 21 13:46:26 node6 pacemakerd[9322]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.stonith-ng failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: ipc delivery failed (rc=-2)
Jan 21 13:46:26 node6 stonith-ng[9329]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 attrd[9331]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
Jan 21 13:46:26 node6 crmd[9333]: notice: main: CRM Git Version: 368c726
Jan 21 13:46:26 node6 cib[9328]: notice: main: Using new config location: /var/lib/pacemaker/cib
Jan 21 13:46:26 node6 cib[9328]: warning: retrieveCib: Cluster configuration not found: /var/lib/pacemaker/cib/cib.xml
Jan 21 13:46:26 node6 cib[9328]: warning: readCibXmlFile: Primary configuration corrupt or unusable, trying backups
Jan 21 13:46:26 node6 cib[9328]: warning: readCibXmlFile: Continuing with an empty configuration.
Jan 21 13:46:26 node6 attrd[9331]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1b1fe50 for attrd/0 Jan 21 13:46:26 node6 stonith-ng[9329]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1b1bad0 for stonith-ng/0 Jan 21 13:46:26 node6 attrd[9331]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 stonith-ng[9329]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 attrd[9331]: notice: main: Starting mainloop... Jan 21 13:46:26 node6 cib[9328]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Jan 21 13:46:26 node6 cib[9328]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1b241d0 for cib/0 Jan 21 13:46:26 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Sending membership update 8 to cib Jan 21 13:46:26 node6 cib[9328]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:26 node6 cib[9328]: notice: plugin_handle_membership: Membership 8: quorum acquired Jan 21 13:46:26 node6 cib[9328]: notice: crm_update_peer_state: plugin_handle_membership: Node node6[290590218] - state is now member (was (null)) Jan 21 13:46:26 node6 cib[9328]: notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null)) Jan 21 13:46:27 node6 crmd[9333]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin) Jan 21 13:46:27 node6 crmd[9333]: notice: 
get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:27 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Recorded connection 0x1b28550 for crmd/0 Jan 21 13:46:27 node6 corosync[9288]: [pcmk ] info: pcmk_ipc: Sending membership update 8 to crmd Jan 21 13:46:27 node6 crmd[9333]: notice: get_node_name: Defaulting to uname -n for the local classic openais (with plugin) node name Jan 21 13:46:27 node6 crmd[9333]: notice: plugin_handle_membership: Membership 8: quorum acquired Jan 21 13:46:27 node6 crmd[9333]: notice: crm_update_peer_state: plugin_handle_membership: Node node6[290590218] - state is now member (was (null)) Jan 21 13:46:27 node6 crmd[9333]: notice: crm_update_peer_state: plugin_handle_membership: Node node5[307367434] - state is now member (was (null)) Jan 21 13:46:27 node6 stonith-ng[9329]: notice: setup_cib: Watching for stonith topology changes Jan 21 13:46:27 node6 crmd[9333]: notice: do_started: The local CRM is operational Jan 21 13:46:27 node6 crmd[9333]: notice: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ] Jan 21 13:46:29 node6 stonith-ng[9329]: warning: cib_process_diff: Diff 0.1.1 -> 0.10.4 from local not applied to 0.1.1: Failed application of an update diff Jan 21 13:46:29 node6 stonith-ng[9329]: notice: update_cib_cache_cb: [cib_diff_notify] Patch aborted: Application of an update diff failed (-206) Jan 21 13:46:29 node6 crmd[9333]: notice: do_state_transition: State transition S_PENDING -> S_NOT_DC [ input=I_NOT_DC cause=C_HA_MESSAGE origin=do_cl_join_finalize_respond ] Jan 21 13:46:29 node6 attrd[9331]: notice: attrd_local_callback: Sending full refresh (origin=crmd) Jan 21 13:46:29 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 21 13:46:30 node6 stonith-ng[9329]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices) Jan 21 13:46:32 node6 
attrd[9331]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true) Jan 21 13:46:32 node6 attrd[9331]: notice: attrd_perform_update: Sent update 7: probe_complete=true Jan 21 13:46:40 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 21 13:46:41 node6 crmd[9333]: notice: process_lrm_event: LRM operation res1_monitor_0 (call=10, rc=0, cib-update=10, confirmed=true) ok Jan 21 13:46:41 node6 cibadmin[9408]: notice: crm_log_args: Invoked: cibadmin -o constraints -C -X <rsc_location id="res1-secondary" node="node6" rsc="res1" score="10"/> Jan 21 13:46:41 node6 stonith-ng[9329]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices) Jan 21 13:46:41 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 21 13:46:42 node6 crmd[9333]: notice: process_lrm_event: LRM operation res1_stop_0 (call=13, rc=0, cib-update=11, confirmed=true) ok Jan 21 13:46:42 node6 stonith-ng[9329]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices) Jan 21 13:46:42 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 21 13:46:43 node6 stonith-ng[9329]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices) Jan 21 13:46:43 node6 stonith-ng[9329]: notice: unpack_config: On loss of CCM Quorum: Ignore Jan 21 13:46:44 node6 stonith-ng[9329]: notice: stonith_device_register: Added 'st-fencing' to the device list (1 active devices) Thoughts? b. _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org