I was doing a NIC firmware upgrade and forgot to stop the cluster on the node
I was working on. Something strange happened: both nodes were fenced
at the same time.
I'm using SBD as the STONITH device, with the following parameters:
watchdog timeout = 10 ; msgwait = 20 ; stonith-timeout = 40 (Pacemaker)
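For context, the timeouts were set up roughly like this (the device path below is a placeholder, not the real disk):

```shell
# Initialize the SBD device header with the timeouts above
# (-1 = watchdog timeout in seconds, -4 = msgwait in seconds;
#  /dev/disk/by-id/my-sbd-disk is a placeholder path).
sbd -d /dev/disk/by-id/my-sbd-disk -1 10 -4 20 create

# Verify the timeouts actually written to the device header.
sbd -d /dev/disk/by-id/my-sbd-disk dump

# Pacemaker side: stonith-timeout should exceed msgwait so the
# fencing operation is given time to complete (40 > 20 here).
crm configure property stonith-timeout=40s
```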
May 31 14:41:48 node01 cluster-dlm: stop_kernel: clvmd stop_kernel cg 2
May 31 14:41:48 node01 corosync[76539]: [CPG ] chosen downlist: sender r(0)
ip(191.255.5.201) ; members(old:2 left:1)
May 31 14:41:48 node01 cluster-dlm: do_sysfs: write "0" to
"/sys/kernel/dlm/clvmd/control"
May 31 14:41:48 node01 crmd: [76549]: info: do_state_transition: State
transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL
origin=check_dead_member ]
May 31 14:41:48 node01 crmd: [76549]: info: update_dc: Unset DC node02
May 31 14:41:48 node01 corosync[76539]: [MAIN ] Completed service
synchronization, ready to provide service.
May 31 14:41:48 node01 crmd: [76549]: info: do_state_transition: State
transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
May 31 14:41:48 node01 crmd: [76549]: info: do_te_control: Registering TE UUID:
3f4ffc02-37c8-471d-bb82-43b23b6c96c4
May 31 14:41:48 node01 crmd: [76549]: info: set_graph_functions: Setting custom
graph functions
May 31 14:41:48 node01 crmd: [76549]: info: unpack_graph: Unpacked transition
-1: 0 actions in 0 synapses
May 31 14:41:48 node01 crmd: [76549]: info: do_dc_takeover: Taking over DC
status for this partition
May 31 14:41:48 node01 cib: [76545]: info: cib_process_readwrite: We are now in
R/W mode
May 31 14:41:48 node01 cluster-dlm: fence_node_time: Node 1241907135/node02 has
not been shot yet
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation
complete: op cib_master for section 'all' (origin=local/crmd/179,
version=0.1600.32): ok (rc=0)
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation
complete: op cib_modify for section cib (origin=local/crmd/180,
version=0.1600.33): ok (rc=0)
May 31 14:41:48 node01 cib: [76545]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/182,
version=0.1600.34): ok (rc=0)
May 31 14:41:48 node01 crmd: [76549]: info: join_make_offer: Making join offers
based on membership 1356
May 31 14:41:48 node01 crmd: [76549]: info: do_dc_join_offer_all: join-1:
Waiting on 1 outstanding join acks
May 31 14:41:48 node01 crmd: [76549]: info: ais_dispatch_message: Membership
1356: quorum still lost
May 31 14:41:48 node02 kernel: [905880.644815] qlcnic 0000:08:00.1: phy port: 1
switch_mode: 0,
May 31 14:41:48 node02 kernel: [905880.644818] max_tx_q: 1 max_rx_q: 16
min_tx_bw: 0x0,
May 31 14:41:48 node02 kernel: [905880.644820] max_tx_bw: 0x64
max_mtu:0x2580, capabilities: 0xdeea0fae
May 31 14:41:48 node02 crmd: [16192]: info: crmd_ais_dispatch: Setting expected
votes to 2
May 31 14:41:48 node02 sbd: [36423]: WARN: CIB: We do NOT have quorum!
May 31 14:41:48 node02 sbd: [36420]: WARN: Pacemaker health check: UNHEALTHY
May 31 14:41:48 node02 crmd: [16192]: WARN: match_down_event: No match for
shutdown action on node01
May 31 14:41:48 node02 crmd: [16192]: info: te_update_diff: Stonith/shutdown of
node01 not matched
May 31 14:41:48 node02 crmd: [16192]: info: abort_transition_graph:
te_update_diff:234 - Triggered transition abort (complete=1, tag=node_state,
id=s02srv002ch, magic=NA, cib=0.1600.33) : Node failure
May 31 14:41:48 node02 crmd: [16192]: info: do_state_transition: State
transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL
origin=abort_transition_graph ]
May 31 14:41:48 node02 crmd: [16192]: info: do_state_transition: All 1 cluster
nodes are eligible to run resources.
May 31 14:41:48 node02 crmd: [16192]: info: do_pe_invoke: Query 1676:
Requesting the current CIB: S_POLICY_ENGINE
May 31 14:41:48 node02 cib: [16188]: info: cib_process_request: Operation
complete: op cib_modify for section crm_config (origin=local/crmd/1675,
version=0.1600.35): ok (rc=0)
May 31 14:41:48 node02 cluster-dlm: fence_node_time: Node 1225129919/node01 has
not been shot yet
May 31 14:41:48 node02 cluster-dlm: check_fencing_done: clvmd check_fencing
1225129919 wait add 1400654144 fail 1401540108 last 0
May 31 14:41:48 node02 kernel: [905880.676719] qlcnic 0000:08:00.1: Supports FW
dump capability
May 31 14:41:48 node02 kernel: [905880.676728] qlcnic 0000:08:00.1: firmware
v4.14.26
May 31 14:41:48 node02 crmd: [16192]: info: do_pe_invoke_callback: Invoking the
PE: query=1676, ref=pe_calc-dc-1401540108-4630, seq=1356, quorate=0
May 31 14:41:48 node02 pengine: [16191]: notice: unpack_config: On loss of CCM
Quorum: Ignore
May 31 14:41:48 node02 pengine: [16191]: WARN: pe_fence_node: Node node01 will
be fenced because it is un-expectedly down
May 31 14:41:48 node02 pengine: [16191]: WARN: determine_online_status: Node
s02srv002ch is unclean
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Action
dlm:1_stop_0 on node01 is unrunnable (offline)
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Marking node
node01 unclean
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Action
clvm:1_stop_0 on node01 is unrunnable (offline)
May 31 14:41:48 node02 pengine: [16191]: WARN: custom_action: Marking node
node01 unclean
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org