Ok Sunday morning it's not good to collect information logs are in /var/log/cluster directory on both nodes these are for secondary node ga2-ext
Sep 22 00:45:39 corosync [TOTEM ] Process pause detected for 20442 ms, flushing membership messages. Sep 22 00:45:45 corosync [CMAN ] quorum lost, blocking activity Sep 22 00:45:45 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services. Sep 22 00:45:45 corosync [QUORUM] Members[1]: 2 Sep 22 00:45:45 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Sep 22 00:45:45 corosync [CMAN ] quorum regained, resuming activity Sep 22 00:45:45 corosync [QUORUM] This node is within the primary component and will provide service. Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2 Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2 Sep 22 00:45:46 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.2) ; members(old:2 left:1) Sep 22 00:45:46 corosync [MAIN ] Completed service synchronization, ready to provide service. Sep 22 00:45:46 [11432] ga2-ext cib: info: pcmk_cpg_membership: Left[1.0] cib.1 Sep 22 00:45:46 [11433] ga2-ext stonith-ng: info: pcmk_cpg_membership: Left[1.0] stonith-ng.1 Sep 22 00:45:47 [11432] ga2-ext cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline Sep 22 00:45:47 [11432] ga2-ext cib: info: pcmk_cpg_membership: Member[1.0] cib.2 Sep 22 00:45:47 [11432] ga2-ext cib: info: pcmk_cpg_membership: Joined[2.0] cib.1 Sep 22 00:45:47 [11432] ga2-ext cib: info: pcmk_cpg_membership: Member[2.0] cib.1 Sep 22 00:45:47 [11432] ga2-ext cib: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online Sep 22 00:45:47 [11432] ga2-ext cib: info: pcmk_cpg_membership: Member[2.1] cib.2 Sep 22 00:45:47 [11433] ga2-ext stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline Sep 22 00:45:48 [11432] ga2-ext cib: info: cib_process_diff: Diff 0.143.13 -> 0.143.14 from ga1-ext not applied to 0.143.9: current "num_updates" is less than required Sep 22 00:45:48 [11432] ga2-ext cib: info: cib_server_process_diff: Requesting re-sync from peer Sep 22 00:45:48 [11432] ga2-ext cib: notice: cib_server_process_diff: Not applying diff 0.143.14 -> 0.143.15 (sync in progress) Sep 22 00:45:48 [11432] ga2-ext cib: notice: cib_server_process_diff: Not applying diff 0.143.15 -> 0.143.16 (sync in progress) Sep 22 00:45:48 [11432] ga2-ext cib: notice: cib_server_process_diff: Not applying diff 0.143.16 -> 0.143.17 (sync in progress) Sep 22 00:45:48 [11433] ga2-ext stonith-ng: info: pcmk_cpg_membership: Member[1.0] stonith-ng.2 Sep 22 00:45:48 [11433] ga2-ext stonith-ng: info: pcmk_cpg_membership: Joined[2.0] stonith-ng.1 Sep 22 00:45:48 [11433] ga2-ext stonith-ng: info: pcmk_cpg_membership: Member[2.0] stonith-ng.1 Sep 22 00:45:48 [11433] ga2-ext stonith-ng: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online Sep 22 00:45:48 [11433] ga2-ext stonith-ng: info: pcmk_cpg_membership: Member[2.1] stonith-ng.2 Sep 22 00:45:48 [11432] ga2-ext cib: info: cib_process_replace: Digest matched on replace from ga1-ext: 00be365e16e96092747ee3d8acc74e7b Sep 22 00:45:48 [11432] ga2-ext cib: info: cib_process_replace: Replaced 0.143.9 with 0.143.17 from ga1-ext Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_replace_notify: Replaced: 0.143.9 -> 0.143.17 from ga1-ext Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_replace operation for section 'all': OK (rc=0, origin=ga1-ext/ga2-ext/(null), version=0.143.17) Sep 22 00:45:49 [11432] ga2-ext cib: info: crm_client_new: Connecting 0x23c9cd0 for uid=0 gid=0 pid=24772 id=3e16081f-1b7b-4fb7-ab3a-e47dffb68615 Sep 22 00:45:48 [11437] ga2-ext crmd: notice: cman_event_callback: Membership 904: quorum lost Sep 22 00:45:49 [11437] ga2-ext crmd: notice: crm_update_peer_state: cman_event_callback: Node ga1-ext[1] - state is now lost (was member) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_query operation for section nodes: OK (rc=0, origin=local/crm_attribute/2, version=0.143.17) Sep 22 00:45:49 [11437] ga2-ext crmd: info: peer_update_callback: ga1-ext is now lost (was member) Sep 22 00:45:49 [11437] ga2-ext crmd: warning: reap_dead_nodes: Our DC node (ga1-ext) left the cluster Sep 22 00:45:49 [11437] ga2-ext crmd: notice: cman_event_callback: Membership 904: quorum acquired Sep 22 00:45:49 [11437] ga2-ext crmd: warning: reap_dead_nodes: Our DC node (ga1-ext) left the cluster Sep 22 00:45:49 [11437] ga2-ext crmd: info: cman_event_callback: Membership 904: quorum retained Sep 22 00:45:49 [11437] ga2-ext crmd: warning: reap_dead_nodes: Our DC node (ga1-ext) left the cluster Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='shutdown']: No such device or address (rc=-6, origin=local/attrd/18, version=0.143.17) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='terminate']: No such device or address (rc=-6, origin=local/attrd/19, version=0.143.17) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='probe_complete']: OK (rc=0, origin=local/attrd/20, version=0.143.17) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/21) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='master-drbd0']: OK (rc=0, origin=local/attrd/22, version=0.143.17) Sep 22 00:45:49 [11432] ga2-ext cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/attrd/23) Sep 22 00:45:49 [11432] ga2-ext cib: info: crm_client_destroy: Destroying 0 events Sep 22 00:45:49 [11437] ga2-ext crmd: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=reap_dead_nodes ] Sep 22 00:45:49 [11437] ga2-ext crmd: info: update_dc: Unset DC. Was ga1-ext Sep 22 00:45:50 [11437] ga2-ext crmd: info: pcmk_cpg_membership: Left[1.0] crmd.1 Sep 22 00:45:50 [11437] ga2-ext crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline Sep 22 00:45:50 [11437] ga2-ext crmd: info: peer_update_callback: Client ga1-ext/peer now has status [offline] (DC=<null>) Sep 22 00:45:50 [11437] ga2-ext crmd: info: pcmk_cpg_membership: Member[1.0] crmd.2 Sep 22 00:45:50 [11432] ga2-ext cib: info: write_cib_contents: Archived previous version as /var/lib/heartbeat/crm/cib-48.raw Sep 22 00:45:50 [11437] ga2-ext crmd: info: pcmk_cpg_membership: Joined[2.0] crmd.1 Sep 22 00:45:50 [11437] ga2-ext crmd: info: pcmk_cpg_membership: Member[2.0] crmd.1 Sep 22 00:45:50 [11437] ga2-ext crmd: info: crm_update_peer_proc: pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online Sep 22 00:45:50 [11437] ga2-ext crmd: info: peer_update_callback: Client ga1-ext/peer now has status [online] (DC=<null>) Sep 22 00:45:50 [11437] ga2-ext crmd: info: pcmk_cpg_membership: Member[2.1] crmd.2 Sep 22 00:45:50 [11437] ga2-ext crmd: warning: do_log: FSA: Input I_JOIN_OFFER from route_message() received in state S_ELECTION Sep 22 00:45:50 [11432] ga2-ext cib: info: write_cib_contents: Wrote version 0.143.0 of the CIB to disk (digest: 1d6de77608199fb31f552e22d5d0f708) Sep 22 00:45:50 [11432] ga2-ext cib: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.EqQlyt (digest: /var/lib/heartbeat/crm/cib.kNjefQ) Sep 22 00:45:58 corosync [TOTEM ] A processor failed, forming new configuration. Sep 22 00:46:00 corosync [CMAN ] quorum lost, blocking activity Sep 22 00:46:00 corosync [QUORUM] This node is within the non-primary component and will NOT provide any services. Sep 22 00:46:00 corosync [QUORUM] Members[1]: 2 Sep 22 00:46:00 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Sep 22 00:46:00 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.2) ; members(old:2 left:1) Sep 22 00:46:00 corosync [MAIN ] Completed service synchronization, ready to provide service. /var/log/cluster/fenced.log on primary Sep 22 00:45:48 fenced cluster is down, exiting Sep 22 00:45:48 fenced daemon cpg_dispatch error 2 Sep 22 00:45:48 fenced cpg_dispatch error 2 /var/log/cluster/dlm_controld.log on primary Sep 22 00:45:48 dlm_controld cluster is down, exiting Sep 22 00:45:48 dlm_controld daemon cpg_dispatch error 2 On Sun, 22 Sep 2013 08:21:18 +0000, Alessandro Bono wrote: > Found logs in corosync(!?) log directory > > these are for primary node ga1-ext > > Sep 22 00:45:29 corosync [TOTEM ] A processor failed, forming new > configuration. > Sep 22 00:45:31 corosync [CMAN ] quorum lost, blocking activity > Sep 22 00:45:31 corosync [QUORUM] This node is within the non-primary > component and will NOT provide any services. > Sep 22 00:45:31 corosync [QUORUM] Members[1]: 1 > Sep 22 00:45:31 corosync [TOTEM ] A processor joined or left the membership > and a new membership was formed. > Sep 22 00:45:31 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.1) > ; members(old:2 left:1) > Sep 22 00:45:31 corosync [MAIN ] Completed service synchronization, ready to > provide service. > Sep 22 00:45:31 [4418] ga1-ext cib: info: pcmk_cpg_membership: > Left[4.0] cib.2 > Sep 22 00:45:31 [4418] ga1-ext cib: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline > Sep 22 00:45:31 [4418] ga1-ext cib: info: pcmk_cpg_membership: > Member[4.0] cib.1 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: pcmk_cpg_membership: > Left[4.0] crmd.2 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline > Sep 22 00:45:31 [4423] ga1-ext crmd: info: peer_update_callback: > Client ga2-ext/peer now has status [offline] (DC=true) > Sep 22 00:45:31 [4423] ga1-ext crmd: warning: match_down_event: No > match for shutdown action on ga2-ext > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: peer_update_callback: > Stonith/shutdown of ga2-ext not matched > Sep 22 00:45:31 [4423] ga1-ext crmd: info: crm_update_peer_join: > peer_update_callback: Node ga2-ext[2] - join-2 phase 4 -> 0 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: > peer_update_callback:214 - Triggered transition abort (complete=1) : Node > failure > Sep 22 00:45:31 [4423] ga1-ext crmd: info: pcmk_cpg_membership: > Member[4.0] crmd.1 > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: cman_event_callback: > Membership 900: quorum lost > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: crm_update_peer_state: > cman_event_callback: Node ga2-ext[2] - state is now lost (was member) > Sep 22 00:45:31 [4423] ga1-ext crmd: info: peer_update_callback: > ga2-ext is now lost (was member) > Sep 22 00:45:31 [4423] ga1-ext crmd: warning: match_down_event: No > match for shutdown action on ga2-ext > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: peer_update_callback: > Stonith/shutdown of ga2-ext not matched > Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: > peer_update_callback:214 - Triggered transition abort (complete=1) : Node > failure > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: do_state_transition: > State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN > cause=C_FSA_INTERNAL origin=check_join_state ] > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_dc_join_offer_one: > An unknown node joined - (re-)offer to any unconfirmed nodes > Sep 22 00:45:31 [4423] ga1-ext crmd: info: join_make_offer: > Making join offers based on membership 900 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: join_make_offer: > Skipping ga1-ext: already known 4 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: > do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: > State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED > cause=C_FSA_INTERNAL origin=check_join_state ] > Sep 22 00:45:31 [4423] ga1-ext crmd: info: crmd_join_phase_log: > join-2: ga2-ext=none > Sep 22 00:45:31 [4423] ga1-ext crmd: info: crmd_join_phase_log: > join-2: ga1-ext=confirmed > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: > State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED > cause=C_FSA_INTERNAL origin=check_join_state ] > Sep 22 00:45:31 [4423] ga1-ext crmd: info: abort_transition_graph: > do_te_invoke:151 - Triggered transition abort (complete=1) : Peer Cancelled > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/703, version=0.143.10) > Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: > Left[3.0] stonith-ng.2 > Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/704, version=0.143.10) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section cib: OK (rc=0, > origin=local/crmd/705, version=0.143.11) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section nodes: OK (rc=0, > origin=local/crmd/706, version=0.143.11) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/707, version=0.143.12) > Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_local_callback: > Sending full refresh (origin=crmd) > Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_trigger_update: > Sending flush op to all hosts for: probe_complete (true) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section nodes: OK (rc=0, > origin=local/crmd/708, version=0.143.12) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/709, version=0.143.13) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section cib: OK (rc=0, > origin=local/crmd/710, version=0.143.13) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_query operation for section 'all': OK (rc=0, > origin=local/crmd/711, version=0.143.13) > Sep 22 00:45:31 [4422] ga1-ext pengine: notice: unpack_config: On > loss of CCM Quorum: Ignore > Sep 22 00:45:31 [4422] ga1-ext pengine: info: determine_online_status: > Node ga1-ext is online > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource dovecot active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: notice: unpack_rsc_op: > Operation monitor found resource drbd0:0 active in master mode on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource ClusterIP active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource mail active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource mysql active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource drbdlinks active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: unpack_rsc_op: > Operation monitor found resource SharedFS active on ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: clone_print: > Master/Slave Set: ms_drbd0 [drbd0] > Sep 22 00:45:31 [4422] ga1-ext pengine: info: short_print: > Masters: [ ga1-ext ] > Sep 22 00:45:31 [4422] ga1-ext pengine: info: short_print: > Stopped: [ ga2-ext ] > Sep 22 00:45:31 [4422] ga1-ext pengine: info: group_print: > Resource Group: service_group > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > SharedFS (ocf::heartbeat:Filesystem): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > drbdlinks (ocf::tummy:drbdlinks): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > ClusterIP (ocf::heartbeat:IPaddr): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > mail (ocf::heartbeat:MailTo): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > mysql (lsb:mysqld): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_print: > dovecot (lsb:dovecot): Started ga1-ext > Sep 22 00:45:31 [4422] ga1-ext pengine: info: native_color: > Resource drbd0:1 cannot run anywhere > Sep 22 00:45:31 [4422] ga1-ext pengine: info: master_color: > Promoting drbd0:0 (Master ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: master_color: > ms_drbd0: Promoted 1 instances of a possible 1 to master > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > drbd0:0 (Master ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > drbd0:1 (Stopped) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > SharedFS (Started ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > drbdlinks (Started ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > ClusterIP (Started ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > mail (Started ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > mysql (Started ga1-ext) > Sep 22 00:45:31 [4422] ga1-ext pengine: info: LogActions: Leave > dovecot (Started ga1-ext) > Sep 22 00:45:31 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: > Member[3.0] stonith-ng.1 > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_query operation for section > //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='probe_complete']: > OK (rc=0, origin=local/attrd/51, version=0.143.13) > Sep 22 00:45:31 [4421] ga1-ext attrd: notice: attrd_trigger_update: > Sending flush op to all hosts for: master-drbd0 (10000) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/attrd/52, version=0.143.13) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_query operation for section > //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='master-drbd0']: > OK (rc=0, origin=local/attrd/53, version=0.143.13) > Sep 22 00:45:31 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/attrd/54, version=0.143.13) > Sep 22 00:45:31 [4422] ga1-ext pengine: notice: process_pe_message: > Calculated Transition 621: /var/lib/pacemaker/pengine/pe-input-288.bz2 > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_state_transition: > State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ > input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ] > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_te_invoke: > Processing graph 621 (ref=pe_calc-dc-1379803531-659) derived from > /var/lib/pacemaker/pengine/pe-input-288.bz2 > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: run_graph: > Transition 621 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, > Source=/var/lib/pacemaker/pengine/pe-input-288.bz2): Complete > Sep 22 00:45:31 [4423] ga1-ext crmd: info: do_log: FSA: Input > I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE > Sep 22 00:45:31 [4423] ga1-ext crmd: notice: do_state_transition: > State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS > cause=C_FSA_INTERNAL origin=notify_crmd ] > Sep 22 00:45:45 corosync [TOTEM ] A processor joined or left the membership > and a new membership was formed. > Sep 22 00:45:45 corosync [CMAN ] quorum regained, resuming activity > Sep 22 00:45:45 corosync [QUORUM] This node is within the primary component > and will provide service. > Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2 > Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2 > Sep 22 00:45:45 [4423] ga1-ext crmd: notice: cman_event_callback: > Membership 904: quorum acquired > Sep 22 00:45:45 [4423] ga1-ext crmd: notice: crm_update_peer_state: > cman_event_callback: Node ga2-ext[2] - state is now member (was lost) > Sep 22 00:45:45 [4423] ga1-ext crmd: info: peer_update_callback: > ga2-ext is now member (was lost) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/712, version=0.143.14) > Sep 22 00:45:45 [4423] ga1-ext crmd: info: crm_cs_flush: Sent > 0 CPG messages (1 remaining, last=37): Try again (6) > Sep 22 00:45:45 [4423] ga1-ext crmd: info: cman_event_callback: > Membership 904: quorum retained > Sep 22 00:45:45 [4418] ga1-ext cib: info: crm_cs_flush: Sent > 0 CPG messages (1 remaining, last=64): Try again (6) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section cib: OK (rc=0, > origin=local/crmd/713, version=0.143.15) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section nodes: OK (rc=0, > origin=local/crmd/714, version=0.143.15) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/715, version=0.143.16) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section nodes: OK (rc=0, > origin=local/crmd/716, version=0.143.16) > Sep 22 00:45:45 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/717, version=0.143.16) > Sep 22 00:45:45 [4423] ga1-ext crmd: info: crm_cs_flush: Sent > 0 CPG messages (2 remaining, last=37): Try again (6) > Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_cs_flush: Sent > 0 CPG messages (3 remaining, last=64): Try again (6) > Sep 22 00:45:46 corosync [CPG ] chosen downlist: sender r(0) ip(10.12.23.1) > ; members(old:1 left:0) > Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_cs_flush: Sent > 0 CPG messages (2 remaining, last=37): Try again (6) > Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: > Joined[4.0] stonith-ng.2 > Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: > Member[4.0] stonith-ng.1 > Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: pcmk_cpg_membership: > Member[4.1] stonith-ng.2 > Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online > Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: > Joined[5.0] crmd.2 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: > Member[5.0] crmd.1 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: pcmk_cpg_membership: > Member[5.1] crmd.2 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online > Sep 22 00:45:46 [4423] ga1-ext crmd: info: peer_update_callback: > Client ga2-ext/peer now has status [online] (DC=true) > Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: > Joined[5.0] cib.2 > Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: > Member[5.0] cib.1 > Sep 22 00:45:46 [4418] ga1-ext cib: info: pcmk_cpg_membership: > Member[5.1] cib.2 > Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_update_peer_proc: > pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online > Sep 22 00:45:46 [4423] ga1-ext crmd: notice: do_state_transition: > State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN > cause=C_FSA_INTERNAL origin=peer_update_callback ] > Sep 22 00:45:46 [4423] ga1-ext crmd: info: do_dc_join_offer_one: > An unknown node joined - (re-)offer to any unconfirmed nodes > Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: > Making join offers based on membership 904 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: > join-2: Sending offer to ga2-ext > Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_update_peer_join: > join_make_offer: Node ga2-ext[2] - join-2 phase 0 -> 1 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: join_make_offer: > Skipping ga1-ext: already known 4 > Sep 22 00:45:46 [4423] ga1-ext crmd: info: abort_transition_graph: > do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt > Sep 22 00:45:46 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_modify operation for section status: OK (rc=0, > origin=local/crmd/718, version=0.143.17) > Sep 22 00:45:46 [4419] ga1-ext stonith-ng: info: crm_cs_flush: Sent > 0 CPG messages (1 remaining, last=5): Try again (6) > Sep 22 00:45:46 corosync [MAIN ] Completed service synchronization, ready to > provide service. > Sep 22 00:45:46 [4412] ga1-ext pacemakerd: info: crm_cs_flush: Sent > 0 CPG messages (1 remaining, last=9): Try again (6) > Sep 22 00:45:46 [4418] ga1-ext cib: info: crm_cs_flush: Sent > 4 CPG messages (0 remaining, last=68): OK (1) > Sep 22 00:45:46 [4423] ga1-ext crmd: info: crm_cs_flush: Sent > 3 CPG messages (0 remaining, last=40): OK (1) > Sep 22 00:45:48 [4418] ga1-ext cib: info: cib_process_request: > Completed cib_sync_one operation for section 'all': OK (rc=0, > origin=ga2-ext/ga2-ext/(null), version=0.143.17) > Sep 22 00:45:48 [4412] ga1-ext pacemakerd: error: pcmk_cpg_dispatch: > Connection to the CPG API failed: Library error (2) > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: error: pcmk_cpg_dispatch: > Connection to the CPG API failed: Library error (2) > Sep 22 00:45:48 [4421] ga1-ext attrd: error: pcmk_cpg_dispatch: > Connection to the CPG API failed: Library error (2) > Sep 22 00:45:48 [4418] ga1-ext cib: error: pcmk_cpg_dispatch: > Connection to the CPG API failed: Library error (2) > Sep 22 00:45:48 [4418] ga1-ext cib: error: cib_cs_destroy: > Corosync connection lost! Exiting. > Sep 22 00:45:48 [4418] ga1-ext cib: info: terminate_cib: > cib_cs_destroy: Exiting fast... > Sep 22 00:45:48 [4423] ga1-ext crmd: error: pcmk_cpg_dispatch: > Connection to the CPG API failed: Library error (2) > Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: > withdrawing server sockets > Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: > Destroying 0 events > Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: > Destroying 0 events > Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: > withdrawing server sockets > Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_client_destroy: > Destroying 0 events > Sep 22 00:45:48 [4418] ga1-ext cib: info: qb_ipcs_us_withdraw: > withdrawing server sockets > Sep 22 00:45:48 [4423] ga1-ext crmd: error: crmd_cs_destroy: > connection terminated > Sep 22 00:45:48 [4412] ga1-ext pacemakerd: error: mcp_cpg_destroy: > Connection destroyed > Sep 22 00:45:48 [4418] ga1-ext cib: info: crm_xml_cleanup: > Cleaning up memory from libxml2 > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: error: stonith_peer_cs_destroy: > Corosync connection terminated > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: stonith_shutdown: > Terminating with 1 clients > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: cib_connection_destroy: > Connection to the CIB closed. > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: crm_client_destroy: > Destroying 0 events > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: qb_ipcs_us_withdraw: > withdrawing server sockets > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: main: Done > Sep 22 00:45:48 [4419] ga1-ext stonith-ng: info: crm_xml_cleanup: > Cleaning up memory from libxml2 > Sep 22 00:45:48 [4412] ga1-ext pacemakerd: info: crm_xml_cleanup: > Cleaning up memory from libxml2 > Sep 22 00:45:48 [4421] ga1-ext attrd: crit: attrd_cs_destroy: Lost > connection to Corosync service! > Sep 22 00:45:48 [4421] ga1-ext attrd: notice: main: Exiting... > Sep 22 00:45:48 [4421] ga1-ext attrd: notice: main: Disconnecting > client 0x1987990, pid=4423... > Sep 22 00:45:48 [4421] ga1-ext attrd: error: > attrd_cib_connection_destroy: Connection to the CIB terminated... > Sep 22 00:45:48 [4423] ga1-ext crmd: info: qb_ipcs_us_withdraw: > withdrawing server sockets > Sep 22 00:45:48 [4423] ga1-ext crmd: info: > tengine_stonith_connection_destroy: Fencing daemon disconnected > Sep 22 00:45:48 [4423] ga1-ext crmd: notice: crmd_exit: > Forcing immediate exit: Link has been severed (67) > Sep 22 00:45:48 [4423] ga1-ext crmd: info: crm_xml_cleanup: > Cleaning up memory from libxml2 > Sep 22 00:45:48 [4420] ga1-ext lrmd: info: cancel_recurring_action: > Cancelling operation ClusterIP_monitor_30000 > Sep 22 00:45:48 [4420] ga1-ext lrmd: warning: qb_ipcs_event_sendv: > new_event_notification (4420-4423-6): Bad file descriptor (9) > Sep 22 00:45:48 [4420] ga1-ext lrmd: warning: send_client_notify: > Notification of client crmd/84c7e6b7-398c-40da-bec9-48b5e36dce2b failed > Sep 22 00:45:48 [4420] ga1-ext lrmd: info: crm_client_destroy: > Destroying 1 events > Sep 22 00:45:48 [4422] ga1-ext pengine: info: crm_client_destroy: > Destroying 0 events > > > no logs on ga2-ext, seems strange > no corosync configuration on cluster nodes > > [root@ga2-ext ~]# find /etc/corosync/ > /etc/corosync/ > /etc/corosync/uidgid.d > /etc/corosync/amf.conf.example > /etc/corosync/corosync.conf.old > /etc/corosync/corosync.conf.example > /etc/corosync/corosync.conf.example.udpu > /etc/corosync/service.d > > [root@ga1-ext ~]# find /etc/corosync/ > /etc/corosync/ > /etc/corosync/corosync.conf.example > /etc/corosync/service.d > /etc/corosync/corosync.conf.example.udpu > /etc/corosync/uidgid.d > /etc/corosync/corosync.conf.old > /etc/corosync/amf.conf.example > > same packages on both nodes > > corosync-1.4.1-15.el6_4.1.x86_64 > corosynclib-1.4.1-15.el6_4.1.x86_64 > drbd-bash-completion-8.3.15-1.el6.x86_64 > drbdlinks-1.23-1.el6.noarch > drbd-pacemaker-8.3.15-1.el6.x86_64 > drbd-udev-8.3.15-1.el6.x86_64 > drbd-utils-8.3.15-1.el6.x86_64 > pacemaker-1.1.10-1.el6.x86_64 > pacemaker-cli-1.1.10-1.el6.x86_64 > pacemaker-cluster-libs-1.1.10-1.el6.x86_64 > pacemaker-debuginfo-1.1.10-1.el6.x86_64 > pacemaker-libs-1.1.10-1.el6.x86_64 > > > On Sun, 22 Sep 2013 07:14:27 +0000, Alessandro Bono wrote: > >> Hi >> >> I have a problem with a cluster where pacemaker dies without logs or >> something >> Problem started when I switched to centos 6.4 and converted cluster from >> corosync to cman >> this happen typically when system is under high load >> tonight I received notification of drbd split brian and found on primary >> machine only these programs running >> >> 4420 ? Ss 1:29 /usr/libexec/pacemaker/lrmd >> 4422 ? Ss 0:42 /usr/libexec/pacemaker/pengine >> >> on secondary machine pacemaker is ok >> on logs only drbd disconnect and split brain notification >> I tried pacemaker 1.1.8 from centos and 1.1.9 and 1.1.10 from clusterlabs >> with same result >> >> howto debug this problem? >> /etc/sysconfig/pacemaker has lots configuration but not sure which one to use >> >> >> pacemaker configuration is: >> >> node ga1-ext \ >> attributes standby="off" >> node ga2-ext \ >> attributes standby="off" >> primitive ClusterIP ocf:heartbeat:IPaddr \ >> params ip="10.12.23.3" cidr_netmask="24" \ >> op monitor interval="30s" >> primitive SharedFS ocf:heartbeat:Filesystem \ >> params device="/dev/drbd/by-res/r0" directory="/shared" >> fstype="ext4" options="noatime,nobarrier" >> primitive dovecot lsb:dovecot >> primitive drbd0 ocf:linbit:drbd \ >> params drbd_resource="r0" \ >> op monitor interval="15s" >> primitive drbdlinks ocf:tummy:drbdlinks >> primitive mail ocf:heartbeat:MailTo \ >> params email="r...@company.com" subject="ga-ext cluster - " >> primitive mysql lsb:mysqld >> group service_group SharedFS drbdlinks ClusterIP mail mysql dovecot \ >> meta target-role="Started" >> ms ms_drbd0 drbd0 \ >> meta master-max="1" master-node-max="1" clone-max="2" >> clone-node-max="1" notify="true" >> colocation service_on_drbd inf: service_group ms_drbd0:Master >> order service_after_drbd inf: ms_drbd0:promote service_group:start >> property $id="cib-bootstrap-options" \ >> dc-version="1.1.10-1.el6-368c726" \ >> cluster-infrastructure="cman" \ >> expected-quorum-votes="2" \ >> stonith-enabled="false" \ >> no-quorum-policy="ignore" \ >> last-lrm-refresh="1379831462" \ >> maintenance-mode="false" >> rsc_defaults $id="rsc-options" \ >> resource-stickiness="100" >> >> >> cman configuration >> >> cat /etc/cluster/cluster.conf >> >> <cluster config_version="6" name="ga-ext_cluster"> >> <logging debug="off"/> >> <clusternodes> >> <clusternode name="ga1-ext" nodeid="1"> >> <fence> >> <method name="pcmk-redirect"> >> <device name="pcmk" port="ga1-ext"/> >> </method> >> </fence> >> </clusternode> >> <clusternode name="ga2-ext" nodeid="2"> >> <fence> >> <method name="pcmk-redirect"> >> <device name="pcmk" port="ga2-ext"/> >> </method> >> </fence> >> </clusternode> >> </clusternodes> >> <fencedevices> >> <fencedevice agent="fence_pcmk" name="pcmk"/> >> </fencedevices> >> </cluster> >> >> tell me you need other information >> >> thank you >> >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org > > > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org