Re: [ClusterLabs] both nodes OFFLINE

Ken Gaillot Mon, 22 May 2017 15:16:41 -0700

On 05/13/2017 01:36 AM, 石井 俊直 wrote:
> Hi.
> 
> We occasionally have a problem in our two-node cluster on CentOS 7. Call the
> nodes node-2 and node-3. When the problem occurs, node-3 reports both nodes
> as OFFLINE, while node-2 reports only node-3 as OFFLINE.
> 
> When that happens, the following log message is written repeatedly on node-2,
> and the log file (/var/log/cluster/corosync.log) grows to hundreds of
> megabytes in a short time. The log messages on node-3 are different.
> 
> The erroneous state is temporarily resolved if the OS on node-2 is restarted.
> Restarting the OS on node-3, on the other hand, leaves the cluster in the
> same state.
> 
> I searched the mailing list archive and found a post (Mon Oct 1 01:27:39
> CEST 2012) about the "Discarding update with feature set" problem. According
> to that post, our problem may be solved by removing
> /var/lib/pacemaker/crm/cib.* on node-2.
> 
> What I want to know is whether it is safe to remove those files on just one
> of the nodes. If there is another way to solve the problem, I'd like to hear
> it.
> 
> Thanks.
> 
> —— from corosync.log ———————————————————————————————— 
> cib:    error: cib_perform_op:        Discarding update with feature set '3.0.11' greater than our own '3.0.10'


This implies that the pacemaker versions are different on the two nodes.
Usually, when the pacemaker version changes, the feature set version
also changes, which means that it introduces new features that won't
work with older pacemaker versions.
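
As a rough illustration of the comparison behind that log message, here is a
minimal Python sketch. It is not Pacemaker's actual code: the function names
parse_feature_set and would_discard are made up for this example, and it
assumes a feature set is a dotted string of integers that is compared
component by component.

```python
def parse_feature_set(text):
    """Turn a dotted feature set string such as '3.0.11' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def would_discard(own, incoming):
    """Return True if an update's feature set is newer than the local one.

    Hypothetical helper mirroring the wording of the log message
    "Discarding update with feature set X greater than our own Y".
    """
    return parse_feature_set(incoming) > parse_feature_set(own)


# Values taken from the log message in the original post.
print(would_discard(own="3.0.10", incoming="3.0.11"))  # True
# Comparing the raw strings would be wrong: "3.0.11" sorts before "3.0.9" as text.
print("3.0.11" > "3.0.9", would_discard(own="3.0.9", incoming="3.0.11"))  # False True
```

The numeric comparison matters because 3.0.10 and 3.0.11 differ only in the
last component, which plain string ordering does not handle reliably.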

Running a cluster with mixed pacemaker versions in such a case is
allowed, but only during a rolling upgrade. Once an older node leaves
the cluster for any reason, it will not be allowed to rejoin until it is
upgraded.
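
To see which node would be locked out under that rule, one can compare each
node's feature set against the newest one in the cluster. The sketch below is
only a model of the rule as described here, not Pacemaker's join logic. The
function nodes_needing_upgrade and its input dictionary are hypothetical;
node-2's value comes from the log, and node-3's is an assumption based on the
'3.0.11' in the error message.

```python
def nodes_needing_upgrade(feature_sets):
    """Return the nodes whose feature set is older than the newest in the cluster.

    feature_sets maps a node name to a dotted version string, e.g. "3.0.10".
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    newest = max(as_tuple(v) for v in feature_sets.values())
    return sorted(name for name, v in feature_sets.items() if as_tuple(v) < newest)


# node-2's value appears in the log ("Set DC to node-2 (3.0.10)");
# node-3's value is assumed from the '3.0.11' in the error message.
cluster = {"node-2": "3.0.10", "node-3": "3.0.11"}
print(nodes_needing_upgrade(cluster))  # ['node-2']
```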

Removing the cib files won't help, since node-2 apparently does not
support the feature set (3.0.11) used by node-3's pacemaker version.

If that's not the situation you are in, please give more details, as
this should not be possible otherwise.

> cib:    error: cib_process_request:   Completed cib_replace operation for section 'all': Protocol not supported (rc=-93, origin=node-3/crmd/12708, version=0.83.30)
> crmd:   error: finalize_sync_callback:        Sync from node-3 failed: Protocol not supported
> crmd:    info: register_fsa_error_adv:        Resetting the current action list
> crmd: warning: do_log:        Input I_ELECTION_DC received in state S_FINALIZE_JOIN from finalize_sync_callback
> crmd:    info: do_state_transition:   State transition S_FINALIZE_JOIN -> S_INTEGRATION | input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback
> crmd:    info: crm_update_peer_join:  initialize_join: Node node-2[1] - join-6329 phase 2 -> 0
> crmd:    info: crm_update_peer_join:  initialize_join: Node node-3[2] - join-6329 phase 2 -> 0
> crmd:    info: update_dc:     Unset DC. Was node-2
> crmd:    info: join_make_offer:       join-6329: Sending offer to node-2
> crmd:    info: crm_update_peer_join:  join_make_offer: Node node-2[1] - join-6329 phase 0 -> 1
> crmd:    info: join_make_offer:       join-6329: Sending offer to node-3
> crmd:    info: crm_update_peer_join:  join_make_offer: Node node-3[2] - join-6329 phase 0 -> 1
> crmd:    info: do_dc_join_offer_all:  join-6329: Waiting on 2 outstanding join acks
> crmd:    info: update_dc:     Set DC to node-2 (3.0.10)
> crmd:    info: crm_update_peer_join:  do_dc_join_filter_offer: Node node-2[1] - join-6329 phase 1 -> 2
> crmd:    info: crm_update_peer_join:  do_dc_join_filter_offer: Node node-3[2] - join-6329 phase 1 -> 2
> crmd:    info: do_state_transition:   State transition S_INTEGRATION -> S_FINALIZE_JOIN | input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state
> crmd:    info: crmd_join_phase_log:   join-6329: node-2=integrated
> crmd:    info: crmd_join_phase_log:   join-6329: node-3=integrated
> crmd:  notice: do_dc_join_finalize:   Syncing the Cluster Information Base from node-3 to rest of cluster | join-6329
> crmd:  notice: do_dc_join_finalize:   Requested version   <generation_tuple crm_feature_set="3.0.11" validate-with="pacemaker-2.5" epoch="84" num_updates="1" admin_epoch="0" cib-last-written="Thu May 11 08:05:45 2017" update-origin="node-2" update-client="crm_resource" update-user="root" have-quorum="1"/>
> cib:     info: cib_process_request:   Forwarding cib_sync operation for section 'all' to node-3 (origin=local/crmd/12710)
> cib:     info: cib_process_replace:   Digest matched on replace from node-3: 85a19c7927c54ccb15794f2720e07ce1
> cib:     info: cib_process_replace:   Replaced 0.83.30 with 0.84.1 from node-3
> cib:     info: __xml_diff_object:     Moved node_state@crmd (3 -> 2)
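
With a log that grows to hundreds of megabytes, it may help to pull out just
the feature set mismatches instead of reading the whole file. The following is
a small Python sketch under stated assumptions: the regular expression is
based only on the single "Discarding update" line quoted in this thread, each
message is assumed to sit on one line in the actual file, and
summarize_discards is a hypothetical helper, not a Pacemaker tool.

```python
import re
from collections import Counter

# Pattern based on the single "Discarding update" line quoted in this thread.
DISCARD_RE = re.compile(
    r"Discarding update with feature set '([\d.]+)' greater than our own '([\d.]+)'"
)


def summarize_discards(lines):
    """Count (incoming, own) feature set pairs seen in 'Discarding update' lines."""
    counts = Counter()
    for line in lines:
        match = DISCARD_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


sample = [
    "cib:    error: cib_perform_op:        Discarding update with feature set "
    "'3.0.11' greater than our own '3.0.10'",
    "crmd:    info: update_dc:     Set DC to node-2 (3.0.10)",
]
print(summarize_discards(sample))  # Counter({('3.0.11', '3.0.10'): 1})
```

For a real log, an open file object can be passed in place of the sample list
(for example, the result of open("/var/log/cluster/corosync.log")), so the
file is read line by line rather than loaded into memory at once.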

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
