Hi, in a production environment with 2 nodes (nodeA, nodeB) we had a hardware failure, so we restarted nodeB. After nodeB came back up we restarted corosync/pacemaker on it, but for 2 days now the corosync/pacemaker stack has been looping.
crm_mon on nodeA:

Stack: openais
Current DC: nodeA - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
17 Resources configured.
============
Online: [ nodeA ]
OFFLINE: [ nodeB ]

crm_mon on nodeB:

Stack: openais
Current DC: NONE
2 Nodes configured, 2 expected votes
17 Resources configured.
============
OFFLINE: [ nodeA nodeB ]

The loop on nodeB reports:

crmd: [7149]: debug: do_election_count_vote: Election 3 (owner: nodeA) lost: vote from nodeA (Age)

Investigating further, I found this message on nodeA:

cib: [28755]: ERROR: send_ais_message: Not connected to AIS

This message is now repeated for every operation.

Is it a corosync problem or a cib/pacemaker one? Any suggestions on what happened? And why did starting a cluster node crash the DC stuff? :(

Bye
Marco
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org