On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote:

> On Tue, 8 Apr 2014 10:49:16 +1000
> Andrew Beekhof <and...@beekhof.net> wrote:
> 
>> 
>> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote:
>> 
>>> Hi,
>>> in a production environment with 2 nodes ( nodeA , nodeB ) we had a
>>> hardware failure so we restarted nodeB.
>>> After the restarted nodeB came up we restarted corosync/pacemaker on
>>> it but for 2 days now the corosync/pacemaker stuff has been looping.
>>> 
>>> crm_mon NodeA:
>>> 
>>> Stack: openais
>>> Current DC: nodeA - partition with quorum
>>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
>>> 2 Nodes configured, 2 expected votes
>>> 17 Resources configured.
>>> ============
>>> 
>>> Online: [ nodeA ]
>>> OFFLINE: [ nodeB ]
>>> 
>>> 
>>> crm_mon NodeB:
>>> 
>>> Stack: openais
>>> Current DC: NONE
>>> 2 Nodes configured, 2 expected votes
>>> 17 Resources configured.
>>> ============
>>> 
>>> OFFLINE: [ nodeA nodeB ]
>>> 
>>> This loop on nodeB reports:
>>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner:
>>> nodeA) lost: vote from nodeA (Age)
>>> 
>>> So investigating around I found this message on nodeA:
>>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
>>> 
>>> Now this message is repeated for every operation.
>>> Is it a corosync problem or a cib/pacemaker one?
>>> Any suggestion on what has happened?
>> 
>> For some reason the cib can't connect to corosync anymore.
>> No software got upgraded recently?
>> 
>> Are there any logs from corosync?
>> Which distro is this?
>> 
>>> And why did the start of a cluster node crash the DC stuff? :(
>>> 
>>> 
>>> Bye Marco
>>> 
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>> 
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>> 
> 
> Hi,
> the distro is openSUSE 11.1 and there are no updates, also because the
> distro is out of maintenance.

A good reason to be using SLES (or RHEL/CentOS).

> We are planning an upgrade but the interesting thing is to figure out
> the reasons for the problem.
> The log is attached, thanks for the support.

There's nothing obvious in the logs.  Just that as far as pacemaker could tell, 
corosync suddenly went away.
Was the corosync process still running?
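
If it helps, here is a rough way to check that on a Linux box. It is only a sketch: it assumes a /proc filesystem and that the daemon's argv[0] ends in "corosync", and the helper name find_processes is made up for illustration (it is not something pacemaker or corosync ships).

```python
import os


def find_processes(name, proc_root="/proc"):
    """Return sorted PIDs whose argv[0] basename equals `name`.

    Hypothetical helper: scans a Linux-style /proc tree and compares the
    first NUL-separated field of each <pid>/cmdline file against `name`.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories describe processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                data = fh.read()
        except OSError:
            continue  # process exited meanwhile, or is not readable
        argv0 = data.split(b"\0")[0].decode("utf-8", "replace")
        if argv0 and os.path.basename(argv0) == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    pids = find_processes("corosync")
    if pids:
        print("corosync is running, PIDs:", pids)
    else:
        print("no corosync process found")
```

An empty result on nodeA while the cib keeps logging "Not connected to AIS" would suggest corosync itself died. A non-empty one would point at a lost connection instead.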
