Ok Sunday morning it's not good to collect information
logs are in /var/log/cluster directory on both nodes
these are for secondary node ga2-ext

Sep 22 00:45:39 corosync [TOTEM ] Process pause detected for 20442 ms, flushing 
membership messages.
Sep 22 00:45:45 corosync [CMAN  ] quorum lost, blocking activity
Sep 22 00:45:45 corosync [QUORUM] This node is within the non-primary component 
and will NOT provide any services.
Sep 22 00:45:45 corosync [QUORUM] Members[1]: 2
Sep 22 00:45:45 corosync [TOTEM ] A processor joined or left the membership and 
a new membership was formed.
Sep 22 00:45:45 corosync [CMAN  ] quorum regained, resuming activity
Sep 22 00:45:45 corosync [QUORUM] This node is within the primary component and 
will provide service.
Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
Sep 22 00:45:46 corosync [CPG   ] chosen downlist: sender r(0) ip(10.12.23.2) ; 
members(old:2 left:1)
Sep 22 00:45:46 corosync [MAIN  ] Completed service synchronization, ready to 
provide service.
Sep 22 00:45:46 [11432] ga2-ext        cib:     info: pcmk_cpg_membership:      
Left[1.0] cib.1 
Sep 22 00:45:46 [11433] ga2-ext stonith-ng:     info: pcmk_cpg_membership:      
Left[1.0] stonith-ng.1 
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: pcmk_cpg_membership:      
Member[1.0] cib.2 
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: pcmk_cpg_membership:      
Joined[2.0] cib.1 
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: pcmk_cpg_membership:      
Member[2.0] cib.1 
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online
Sep 22 00:45:47 [11432] ga2-ext        cib:     info: pcmk_cpg_membership:      
Member[2.1] cib.2 
Sep 22 00:45:47 [11433] ga2-ext stonith-ng:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline
Sep 22 00:45:48 [11432] ga2-ext        cib:     info: cib_process_diff:         
Diff 0.143.13 -> 0.143.14 from ga1-ext not applied to 0.143.9: current 
"num_updates" is less than required
Sep 22 00:45:48 [11432] ga2-ext        cib:     info: cib_server_process_diff:  
Requesting re-sync from peer
Sep 22 00:45:48 [11432] ga2-ext        cib:   notice: cib_server_process_diff:  
Not applying diff 0.143.14 -> 0.143.15 (sync in progress)
Sep 22 00:45:48 [11432] ga2-ext        cib:   notice: cib_server_process_diff:  
Not applying diff 0.143.15 -> 0.143.16 (sync in progress)
Sep 22 00:45:48 [11432] ga2-ext        cib:   notice: cib_server_process_diff:  
Not applying diff 0.143.16 -> 0.143.17 (sync in progress)
Sep 22 00:45:48 [11433] ga2-ext stonith-ng:     info: pcmk_cpg_membership:      
Member[1.0] stonith-ng.2 
Sep 22 00:45:48 [11433] ga2-ext stonith-ng:     info: pcmk_cpg_membership:      
Joined[2.0] stonith-ng.1 
Sep 22 00:45:48 [11433] ga2-ext stonith-ng:     info: pcmk_cpg_membership:      
Member[2.0] stonith-ng.1 
Sep 22 00:45:48 [11433] ga2-ext stonith-ng:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online
Sep 22 00:45:48 [11433] ga2-ext stonith-ng:     info: pcmk_cpg_membership:      
Member[2.1] stonith-ng.2 
Sep 22 00:45:48 [11432] ga2-ext        cib:     info: cib_process_replace:      
Digest matched on replace from ga1-ext: 00be365e16e96092747ee3d8acc74e7b
Sep 22 00:45:48 [11432] ga2-ext        cib:     info: cib_process_replace:      
Replaced 0.143.9 with 0.143.17 from ga1-ext
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_replace_notify:       
Replaced: 0.143.9 -> 0.143.17 from ga1-ext
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_replace operation for section 'all': OK (rc=0, 
origin=ga1-ext/ga2-ext/(null), version=0.143.17)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: crm_client_new:   
Connecting 0x23c9cd0 for uid=0 gid=0 pid=24772 
id=3e16081f-1b7b-4fb7-ab3a-e47dffb68615
Sep 22 00:45:48 [11437] ga2-ext       crmd:   notice: cman_event_callback:      
Membership 904: quorum lost
Sep 22 00:45:49 [11437] ga2-ext       crmd:   notice: crm_update_peer_state:    
cman_event_callback: Node ga1-ext[1] - state is now lost (was member)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_query operation for section nodes: OK (rc=0, 
origin=local/crm_attribute/2, version=0.143.17)
Sep 22 00:45:49 [11437] ga2-ext       crmd:     info: peer_update_callback:     
ga1-ext is now lost (was member)
Sep 22 00:45:49 [11437] ga2-ext       crmd:  warning: reap_dead_nodes:  Our DC 
node (ga1-ext) left the cluster
Sep 22 00:45:49 [11437] ga2-ext       crmd:   notice: cman_event_callback:      
Membership 904: quorum acquired
Sep 22 00:45:49 [11437] ga2-ext       crmd:  warning: reap_dead_nodes:  Our DC 
node (ga1-ext) left the cluster
Sep 22 00:45:49 [11437] ga2-ext       crmd:     info: cman_event_callback:      
Membership 904: quorum retained
Sep 22 00:45:49 [11437] ga2-ext       crmd:  warning: reap_dead_nodes:  Our DC 
node (ga1-ext) left the cluster
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_query operation for section 
//cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='shutdown']:
 No such device or address (rc=-6, origin=local/attrd/18, version=0.143.17)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_query operation for section 
//cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='terminate']:
 No such device or address (rc=-6, origin=local/attrd/19, version=0.143.17)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_query operation for section 
//cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='probe_complete']:
 OK (rc=0, origin=local/attrd/20, version=0.143.17)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Forwarding cib_modify operation for section status to master 
(origin=local/attrd/21)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Completed cib_query operation for section 
//cib/status//node_state[@id='ga2-ext']//transient_attributes//nvpair[@name='master-drbd0']:
 OK (rc=0, origin=local/attrd/22, version=0.143.17)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: cib_process_request:      
Forwarding cib_modify operation for section status to master 
(origin=local/attrd/23)
Sep 22 00:45:49 [11432] ga2-ext        cib:     info: crm_client_destroy:       
Destroying 0 events
Sep 22 00:45:49 [11437] ga2-ext       crmd:   notice: do_state_transition:      
State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL 
origin=reap_dead_nodes ]
Sep 22 00:45:49 [11437] ga2-ext       crmd:     info: update_dc:        Unset 
DC. Was ga1-ext
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: pcmk_cpg_membership:      
Left[1.0] crmd.1 
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now offline
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: peer_update_callback:     
Client ga1-ext/peer now has status [offline] (DC=<null>)
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: pcmk_cpg_membership:      
Member[1.0] crmd.2 
Sep 22 00:45:50 [11432] ga2-ext        cib:     info: write_cib_contents:       
Archived previous version as /var/lib/heartbeat/crm/cib-48.raw
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: pcmk_cpg_membership:      
Joined[2.0] crmd.1 
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: pcmk_cpg_membership:      
Member[2.0] crmd.1 
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: crm_update_peer_proc:     
pcmk_cpg_membership: Node ga1-ext[1] - corosync-cpg is now online
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: peer_update_callback:     
Client ga1-ext/peer now has status [online] (DC=<null>)
Sep 22 00:45:50 [11437] ga2-ext       crmd:     info: pcmk_cpg_membership:      
Member[2.1] crmd.2 
Sep 22 00:45:50 [11437] ga2-ext       crmd:  warning: do_log:   FSA: Input 
I_JOIN_OFFER from route_message() received in state S_ELECTION
Sep 22 00:45:50 [11432] ga2-ext        cib:     info: write_cib_contents:       
Wrote version 0.143.0 of the CIB to disk (digest: 
1d6de77608199fb31f552e22d5d0f708)
Sep 22 00:45:50 [11432] ga2-ext        cib:     info: retrieveCib:      Reading 
cluster configuration from: /var/lib/heartbeat/crm/cib.EqQlyt (digest: 
/var/lib/heartbeat/crm/cib.kNjefQ)
Sep 22 00:45:58 corosync [TOTEM ] A processor failed, forming new configuration.
Sep 22 00:46:00 corosync [CMAN  ] quorum lost, blocking activity
Sep 22 00:46:00 corosync [QUORUM] This node is within the non-primary component 
and will NOT provide any services.
Sep 22 00:46:00 corosync [QUORUM] Members[1]: 2
Sep 22 00:46:00 corosync [TOTEM ] A processor joined or left the membership and 
a new membership was formed.
Sep 22 00:46:00 corosync [CPG   ] chosen downlist: sender r(0) ip(10.12.23.2) ; 
members(old:2 left:1)
Sep 22 00:46:00 corosync [MAIN  ] Completed service synchronization, ready to 
provide service.

/var/log/cluster/fenced.log on primary

Sep 22 00:45:48 fenced cluster is down, exiting
Sep 22 00:45:48 fenced daemon cpg_dispatch error 2
Sep 22 00:45:48 fenced cpg_dispatch error 2

/var/log/cluster/dlm_controld.log on primary
Sep 22 00:45:48 dlm_controld cluster is down, exiting
Sep 22 00:45:48 dlm_controld daemon cpg_dispatch error 2


On Sun, 22 Sep 2013 08:21:18 +0000, Alessandro Bono wrote:

> Found logs in corosync(!?) log directory
> 
> these are for primary node ga1-ext
> 
> Sep 22 00:45:29 corosync [TOTEM ] A processor failed, forming new 
> configuration.
> Sep 22 00:45:31 corosync [CMAN  ] quorum lost, blocking activity
> Sep 22 00:45:31 corosync [QUORUM] This node is within the non-primary 
> component and will NOT provide any services.
> Sep 22 00:45:31 corosync [QUORUM] Members[1]: 1
> Sep 22 00:45:31 corosync [TOTEM ] A processor joined or left the membership 
> and a new membership was formed.
> Sep 22 00:45:31 corosync [CPG   ] chosen downlist: sender r(0) ip(10.12.23.1) 
> ; members(old:2 left:1)
> Sep 22 00:45:31 corosync [MAIN  ] Completed service synchronization, ready to 
> provide service.
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: pcmk_cpg_membership:     
>   Left[4.0] cib.2 
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: pcmk_cpg_membership:     
>   Member[4.0] cib.1 
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: pcmk_cpg_membership:     
>   Left[4.0] crmd.2 
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: peer_update_callback:    
>   Client ga2-ext/peer now has status [offline] (DC=true)
> Sep 22 00:45:31 [4423] ga1-ext       crmd:  warning: match_down_event:  No 
> match for shutdown action on ga2-ext
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: peer_update_callback:    
>   Stonith/shutdown of ga2-ext not matched
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: crm_update_peer_join:    
>   peer_update_callback: Node ga2-ext[2] - join-2 phase 4 -> 0
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: abort_transition_graph:  
>   peer_update_callback:214 - Triggered transition abort (complete=1) : Node 
> failure
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: pcmk_cpg_membership:     
>   Member[4.0] crmd.1 
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: cman_event_callback:     
>   Membership 900: quorum lost
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: crm_update_peer_state:   
>   cman_event_callback: Node ga2-ext[2] - state is now lost (was member)
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: peer_update_callback:    
>   ga2-ext is now lost (was member)
> Sep 22 00:45:31 [4423] ga1-ext       crmd:  warning: match_down_event:  No 
> match for shutdown action on ga2-ext
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: peer_update_callback:    
>   Stonith/shutdown of ga2-ext not matched
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: abort_transition_graph:  
>   peer_update_callback:214 - Triggered transition abort (complete=1) : Node 
> failure
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: do_state_transition:     
>   State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_dc_join_offer_one:    
>   An unknown node joined - (re-)offer to any unconfirmed nodes
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: join_make_offer:   
> Making join offers based on membership 900
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: join_make_offer:   
> Skipping ga1-ext: already known 4
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: abort_transition_graph:  
>   do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_state_transition:     
>   State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: crmd_join_phase_log:     
>   join-2: ga2-ext=none
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: crmd_join_phase_log:     
>   join-2: ga1-ext=confirmed
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_state_transition:     
>   State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED 
> cause=C_FSA_INTERNAL origin=check_join_state ]
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: abort_transition_graph:  
>   do_te_invoke:151 - Triggered transition abort (complete=1) : Peer Cancelled
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/703, version=0.143.10)
> Sep 22 00:45:31 [4419] ga1-ext stonith-ng:     info: pcmk_cpg_membership:     
>   Left[3.0] stonith-ng.2 
> Sep 22 00:45:31 [4419] ga1-ext stonith-ng:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now offline
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/704, version=0.143.10)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section cib: OK (rc=0, 
> origin=local/crmd/705, version=0.143.11)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section nodes: OK (rc=0, 
> origin=local/crmd/706, version=0.143.11)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/707, version=0.143.12)
> Sep 22 00:45:31 [4421] ga1-ext      attrd:   notice: attrd_local_callback:    
>   Sending full refresh (origin=crmd)
> Sep 22 00:45:31 [4421] ga1-ext      attrd:   notice: attrd_trigger_update:    
>   Sending flush op to all hosts for: probe_complete (true)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section nodes: OK (rc=0, 
> origin=local/crmd/708, version=0.143.12)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/709, version=0.143.13)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section cib: OK (rc=0, 
> origin=local/crmd/710, version=0.143.13)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_query operation for section 'all': OK (rc=0, 
> origin=local/crmd/711, version=0.143.13)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:   notice: unpack_config:     On 
> loss of CCM Quorum: Ignore
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: determine_online_status: 
>   Node ga1-ext is online
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource dovecot active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:   notice: unpack_rsc_op:     
> Operation monitor found resource drbd0:0 active in master mode on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource ClusterIP active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource mail active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource mysql active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource drbdlinks active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: unpack_rsc_op:     
> Operation monitor found resource SharedFS active on ga1-ext
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: clone_print:        
> Master/Slave Set: ms_drbd0 [drbd0]
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: short_print:            
> Masters: [ ga1-ext ]
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: short_print:            
> Stopped: [ ga2-ext ]
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: group_print:        
> Resource Group: service_group
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> SharedFS   (ocf::heartbeat:Filesystem):    Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> drbdlinks  (ocf::tummy:drbdlinks): Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> ClusterIP  (ocf::heartbeat:IPaddr):        Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> mail       (ocf::heartbeat:MailTo):        Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> mysql      (lsb:mysqld):   Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_print:           
> dovecot    (lsb:dovecot):  Started ga1-ext 
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: native_color:      
> Resource drbd0:1 cannot run anywhere
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: master_color:      
> Promoting drbd0:0 (Master ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: master_color:      
> ms_drbd0: Promoted 1 instances of a possible 1 to master
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   drbd0:0 (Master ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   drbd0:1 (Stopped)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   SharedFS        (Started ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   drbdlinks       (Started ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   ClusterIP       (Started ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   mail    (Started ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   mysql   (Started ga1-ext)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:     info: LogActions:        Leave 
>   dovecot (Started ga1-ext)
> Sep 22 00:45:31 [4419] ga1-ext stonith-ng:     info: pcmk_cpg_membership:     
>   Member[3.0] stonith-ng.1 
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_query operation for section 
> //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='probe_complete']:
>  OK (rc=0, origin=local/attrd/51, version=0.143.13)
> Sep 22 00:45:31 [4421] ga1-ext      attrd:   notice: attrd_trigger_update:    
>   Sending flush op to all hosts for: master-drbd0 (10000)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/attrd/52, version=0.143.13)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_query operation for section 
> //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='master-drbd0']:
>  OK (rc=0, origin=local/attrd/53, version=0.143.13)
> Sep 22 00:45:31 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/attrd/54, version=0.143.13)
> Sep 22 00:45:31 [4422] ga1-ext    pengine:   notice: process_pe_message:      
>   Calculated Transition 621: /var/lib/pacemaker/pengine/pe-input-288.bz2
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_state_transition:     
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_te_invoke:      
> Processing graph 621 (ref=pe_calc-dc-1379803531-659) derived from 
> /var/lib/pacemaker/pengine/pe-input-288.bz2
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: run_graph:         
> Transition 621 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
> Source=/var/lib/pacemaker/pengine/pe-input-288.bz2): Complete
> Sep 22 00:45:31 [4423] ga1-ext       crmd:     info: do_log:    FSA: Input 
> I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
> Sep 22 00:45:31 [4423] ga1-ext       crmd:   notice: do_state_transition:     
>   State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
> cause=C_FSA_INTERNAL origin=notify_crmd ]
> Sep 22 00:45:45 corosync [TOTEM ] A processor joined or left the membership 
> and a new membership was formed.
> Sep 22 00:45:45 corosync [CMAN  ] quorum regained, resuming activity
> Sep 22 00:45:45 corosync [QUORUM] This node is within the primary component 
> and will provide service.
> Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
> Sep 22 00:45:45 corosync [QUORUM] Members[2]: 1 2
> Sep 22 00:45:45 [4423] ga1-ext       crmd:   notice: cman_event_callback:     
>   Membership 904: quorum acquired
> Sep 22 00:45:45 [4423] ga1-ext       crmd:   notice: crm_update_peer_state:   
>   cman_event_callback: Node ga2-ext[2] - state is now member (was lost)
> Sep 22 00:45:45 [4423] ga1-ext       crmd:     info: peer_update_callback:    
>   ga2-ext is now member (was lost)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/712, version=0.143.14)
> Sep 22 00:45:45 [4423] ga1-ext       crmd:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (1 remaining, last=37): Try again (6)
> Sep 22 00:45:45 [4423] ga1-ext       crmd:     info: cman_event_callback:     
>   Membership 904: quorum retained
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (1 remaining, last=64): Try again (6)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section cib: OK (rc=0, 
> origin=local/crmd/713, version=0.143.15)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section nodes: OK (rc=0, 
> origin=local/crmd/714, version=0.143.15)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/715, version=0.143.16)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section nodes: OK (rc=0, 
> origin=local/crmd/716, version=0.143.16)
> Sep 22 00:45:45 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/717, version=0.143.16)
> Sep 22 00:45:45 [4423] ga1-ext       crmd:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (2 remaining, last=37): Try again (6)
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (3 remaining, last=64): Try again (6)
> Sep 22 00:45:46 corosync [CPG   ] chosen downlist: sender r(0) ip(10.12.23.1) 
> ; members(old:1 left:0)
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (2 remaining, last=37): Try again (6)
> Sep 22 00:45:46 [4419] ga1-ext stonith-ng:     info: pcmk_cpg_membership:     
>   Joined[4.0] stonith-ng.2 
> Sep 22 00:45:46 [4419] ga1-ext stonith-ng:     info: pcmk_cpg_membership:     
>   Member[4.0] stonith-ng.1 
> Sep 22 00:45:46 [4419] ga1-ext stonith-ng:     info: pcmk_cpg_membership:     
>   Member[4.1] stonith-ng.2 
> Sep 22 00:45:46 [4419] ga1-ext stonith-ng:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: pcmk_cpg_membership:     
>   Joined[5.0] crmd.2 
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: pcmk_cpg_membership:     
>   Member[5.0] crmd.1 
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: pcmk_cpg_membership:     
>   Member[5.1] crmd.2 
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: peer_update_callback:    
>   Client ga2-ext/peer now has status [online] (DC=true)
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: pcmk_cpg_membership:     
>   Joined[5.0] cib.2 
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: pcmk_cpg_membership:     
>   Member[5.0] cib.1 
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: pcmk_cpg_membership:     
>   Member[5.1] cib.2 
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: crm_update_peer_proc:    
>   pcmk_cpg_membership: Node ga2-ext[2] - corosync-cpg is now online
> Sep 22 00:45:46 [4423] ga1-ext       crmd:   notice: do_state_transition:     
>   State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN 
> cause=C_FSA_INTERNAL origin=peer_update_callback ]
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: do_dc_join_offer_one:    
>   An unknown node joined - (re-)offer to any unconfirmed nodes
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: join_make_offer:   
> Making join offers based on membership 904
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: join_make_offer:   
> join-2: Sending offer to ga2-ext
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: crm_update_peer_join:    
>   join_make_offer: Node ga2-ext[2] - join-2 phase 0 -> 1
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: join_make_offer:   
> Skipping ga1-ext: already known 4
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: abort_transition_graph:  
>   do_te_invoke:158 - Triggered transition abort (complete=1) : Peer Halt
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_modify operation for section status: OK (rc=0, 
> origin=local/crmd/718, version=0.143.17)
> Sep 22 00:45:46 [4419] ga1-ext stonith-ng:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (1 remaining, last=5): Try again (6)
> Sep 22 00:45:46 corosync [MAIN  ] Completed service synchronization, ready to 
> provide service.
> Sep 22 00:45:46 [4412] ga1-ext pacemakerd:     info: crm_cs_flush:      Sent 
> 0 CPG messages  (1 remaining, last=9): Try again (6)
> Sep 22 00:45:46 [4418] ga1-ext        cib:     info: crm_cs_flush:      Sent 
> 4 CPG messages  (0 remaining, last=68): OK (1)
> Sep 22 00:45:46 [4423] ga1-ext       crmd:     info: crm_cs_flush:      Sent 
> 3 CPG messages  (0 remaining, last=40): OK (1)
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: cib_process_request:     
>   Completed cib_sync_one operation for section 'all': OK (rc=0, 
> origin=ga2-ext/ga2-ext/(null), version=0.143.17)
> Sep 22 00:45:48 [4412] ga1-ext pacemakerd:    error: pcmk_cpg_dispatch:       
>   Connection to the CPG API failed: Library error (2)
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:    error: pcmk_cpg_dispatch:       
>   Connection to the CPG API failed: Library error (2)
> Sep 22 00:45:48 [4421] ga1-ext      attrd:    error: pcmk_cpg_dispatch:       
>   Connection to the CPG API failed: Library error (2)
> Sep 22 00:45:48 [4418] ga1-ext        cib:    error: pcmk_cpg_dispatch:       
>   Connection to the CPG API failed: Library error (2)
> Sep 22 00:45:48 [4418] ga1-ext        cib:    error: cib_cs_destroy:    
> Corosync connection lost!  Exiting.
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: terminate_cib:     
> cib_cs_destroy: Exiting fast...
> Sep 22 00:45:48 [4423] ga1-ext       crmd:    error: pcmk_cpg_dispatch:       
>   Connection to the CPG API failed: Library error (2)
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: qb_ipcs_us_withdraw:     
>   withdrawing server sockets
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: crm_client_destroy:      
>   Destroying 0 events
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: crm_client_destroy:      
>   Destroying 0 events
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: qb_ipcs_us_withdraw:     
>   withdrawing server sockets
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: crm_client_destroy:      
>   Destroying 0 events
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: qb_ipcs_us_withdraw:     
>   withdrawing server sockets
> Sep 22 00:45:48 [4423] ga1-ext       crmd:    error: crmd_cs_destroy:   
> connection terminated
> Sep 22 00:45:48 [4412] ga1-ext pacemakerd:    error: mcp_cpg_destroy:   
> Connection destroyed
> Sep 22 00:45:48 [4418] ga1-ext        cib:     info: crm_xml_cleanup:   
> Cleaning up memory from libxml2
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:    error: stonith_peer_cs_destroy: 
>   Corosync connection terminated
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: stonith_shutdown:  
> Terminating with  1 clients
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: cib_connection_destroy:  
>   Connection to the CIB closed.
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: crm_client_destroy:      
>   Destroying 0 events
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: qb_ipcs_us_withdraw:     
>   withdrawing server sockets
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: main:      Done
> Sep 22 00:45:48 [4419] ga1-ext stonith-ng:     info: crm_xml_cleanup:   
> Cleaning up memory from libxml2
> Sep 22 00:45:48 [4412] ga1-ext pacemakerd:     info: crm_xml_cleanup:   
> Cleaning up memory from libxml2
> Sep 22 00:45:48 [4421] ga1-ext      attrd:     crit: attrd_cs_destroy:  Lost 
> connection to Corosync service!
> Sep 22 00:45:48 [4421] ga1-ext      attrd:   notice: main:      Exiting...
> Sep 22 00:45:48 [4421] ga1-ext      attrd:   notice: main:      Disconnecting 
> client 0x1987990, pid=4423...
> Sep 22 00:45:48 [4421] ga1-ext      attrd:    error: 
> attrd_cib_connection_destroy:      Connection to the CIB terminated...
> Sep 22 00:45:48 [4423] ga1-ext       crmd:     info: qb_ipcs_us_withdraw:     
>   withdrawing server sockets
> Sep 22 00:45:48 [4423] ga1-ext       crmd:     info: 
> tengine_stonith_connection_destroy:        Fencing daemon disconnected
> Sep 22 00:45:48 [4423] ga1-ext       crmd:   notice: crmd_exit:         
> Forcing immediate exit: Link has been severed (67)
> Sep 22 00:45:48 [4423] ga1-ext       crmd:     info: crm_xml_cleanup:   
> Cleaning up memory from libxml2
> Sep 22 00:45:48 [4420] ga1-ext       lrmd:     info: cancel_recurring_action: 
>   Cancelling operation ClusterIP_monitor_30000
> Sep 22 00:45:48 [4420] ga1-ext       lrmd:  warning: qb_ipcs_event_sendv:     
>   new_event_notification (4420-4423-6): Bad file descriptor (9)
> Sep 22 00:45:48 [4420] ga1-ext       lrmd:  warning: send_client_notify:      
>   Notification of client crmd/84c7e6b7-398c-40da-bec9-48b5e36dce2b failed
> Sep 22 00:45:48 [4420] ga1-ext       lrmd:     info: crm_client_destroy:      
>   Destroying 1 events
> Sep 22 00:45:48 [4422] ga1-ext    pengine:     info: crm_client_destroy:      
>   Destroying 0 events
> 
> 
> no logs on ga2-ext, seems strange
> no corosync configuration on cluster nodes
> 
> [root@ga2-ext ~]# find /etc/corosync/
> /etc/corosync/
> /etc/corosync/uidgid.d
> /etc/corosync/amf.conf.example
> /etc/corosync/corosync.conf.old
> /etc/corosync/corosync.conf.example
> /etc/corosync/corosync.conf.example.udpu
> /etc/corosync/service.d
> 
> [root@ga1-ext ~]# find /etc/corosync/
> /etc/corosync/
> /etc/corosync/corosync.conf.example
> /etc/corosync/service.d
> /etc/corosync/corosync.conf.example.udpu
> /etc/corosync/uidgid.d
> /etc/corosync/corosync.conf.old
> /etc/corosync/amf.conf.example
> 
> same packages on both nodes 
> 
> corosync-1.4.1-15.el6_4.1.x86_64
> corosynclib-1.4.1-15.el6_4.1.x86_64
> drbd-bash-completion-8.3.15-1.el6.x86_64
> drbdlinks-1.23-1.el6.noarch
> drbd-pacemaker-8.3.15-1.el6.x86_64
> drbd-udev-8.3.15-1.el6.x86_64
> drbd-utils-8.3.15-1.el6.x86_64
> pacemaker-1.1.10-1.el6.x86_64
> pacemaker-cli-1.1.10-1.el6.x86_64
> pacemaker-cluster-libs-1.1.10-1.el6.x86_64
> pacemaker-debuginfo-1.1.10-1.el6.x86_64
> pacemaker-libs-1.1.10-1.el6.x86_64
> 
> 
> On Sun, 22 Sep 2013 07:14:27 +0000, Alessandro Bono wrote:
> 
>> Hi
>> 
>> I have a problem with a cluster where pacemaker dies without logs or 
>> something
>> Problem started when I switched to centos 6.4 and converted cluster from 
>> corosync to cman
>> this happen typically when system is under high load 
>> tonight I received notification of drbd split brian and found on primary 
>> machine only these programs running
>> 
>>  4420 ?        Ss     1:29 /usr/libexec/pacemaker/lrmd
>>  4422 ?        Ss     0:42 /usr/libexec/pacemaker/pengine
>> 
>> on secondary machine pacemaker is ok
>> on logs only drbd disconnect and split brain notification
>> I tried pacemaker 1.1.8 from centos and 1.1.9 and 1.1.10 from clusterlabs 
>> with same result
>> 
>> howto debug this problem? 
>> /etc/sysconfig/pacemaker has lots configuration but not sure which one to use
>> 
>> 
>> pacemaker configuration is:
>> 
>> node ga1-ext \
>>         attributes standby="off"
>> node ga2-ext \
>>         attributes standby="off"
>> primitive ClusterIP ocf:heartbeat:IPaddr \
>>         params ip="10.12.23.3" cidr_netmask="24" \
>>         op monitor interval="30s"
>> primitive SharedFS ocf:heartbeat:Filesystem \
>>         params device="/dev/drbd/by-res/r0" directory="/shared" 
>> fstype="ext4" options="noatime,nobarrier"
>> primitive dovecot lsb:dovecot
>> primitive drbd0 ocf:linbit:drbd \
>>         params drbd_resource="r0" \
>>         op monitor interval="15s"
>> primitive drbdlinks ocf:tummy:drbdlinks
>> primitive mail ocf:heartbeat:MailTo \
>>         params email="r...@company.com" subject="ga-ext cluster - "
>> primitive mysql lsb:mysqld
>> group service_group SharedFS drbdlinks ClusterIP mail mysql dovecot \
>>         meta target-role="Started"
>> ms ms_drbd0 drbd0 \
>>         meta master-max="1" master-node-max="1" clone-max="2" 
>> clone-node-max="1" notify="true"
>> colocation service_on_drbd inf: service_group ms_drbd0:Master
>> order service_after_drbd inf: ms_drbd0:promote service_group:start
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.1.10-1.el6-368c726" \
>>         cluster-infrastructure="cman" \
>>         expected-quorum-votes="2" \
>>         stonith-enabled="false" \
>>         no-quorum-policy="ignore" \
>>         last-lrm-refresh="1379831462" \
>>         maintenance-mode="false"
>> rsc_defaults $id="rsc-options" \
>>         resource-stickiness="100"
>> 
>> 
>> cman configuration
>> 
>> cat /etc/cluster/cluster.conf 
>> 
>> <cluster config_version="6" name="ga-ext_cluster">
>>   <logging debug="off"/>
>>   <clusternodes>
>>     <clusternode name="ga1-ext" nodeid="1">
>>       <fence>
>>         <method name="pcmk-redirect">
>>           <device name="pcmk" port="ga1-ext"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>     <clusternode name="ga2-ext" nodeid="2">
>>       <fence>
>>         <method name="pcmk-redirect">
>>           <device name="pcmk" port="ga2-ext"/>
>>         </method>
>>       </fence>
>>     </clusternode>
>>   </clusternodes>
>>   <fencedevices>
>>     <fencedevice agent="fence_pcmk" name="pcmk"/>
>>   </fencedevices>
>> </cluster>
>> 
>> tell me you need other information
>> 
>> thank you
>> 
>> 
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> 
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
> 
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org



_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

Reply via email to