Hi, here are the results of the corosync status check. Can't find a problem there:
pilotpound:

[root@pilotpound ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 425699520
RING ID 0
        id      = 192.168.95.25
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.20.245
        status  = ring 1 active with no faults

[root@pilotpound ~]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
runtime.totem.pg.mrp.srp.members.425699520.join_count=1
runtime.totem.pg.mrp.srp.members.425699520.status=joined
runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
runtime.totem.pg.mrp.srp.members.442476736.join_count=1
runtime.totem.pg.mrp.srp.members.442476736.status=joined

powerpound:

[root@powerpound ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 442476736
RING ID 0
        id      = 192.168.95.26
        status  = ring 0 active with no faults
RING ID 1
        id      = 192.168.20.246
        status  = ring 1 active with no faults

[root@powerpound ~]# corosync-objctl | grep member
runtime.totem.pg.mrp.srp.members.442476736.ip=r(0) ip(192.168.95.26) r(1) ip(192.168.20.246)
runtime.totem.pg.mrp.srp.members.442476736.join_count=1
runtime.totem.pg.mrp.srp.members.442476736.status=joined
runtime.totem.pg.mrp.srp.members.425699520.ip=r(0) ip(192.168.95.25) r(1) ip(192.168.20.245)
runtime.totem.pg.mrp.srp.members.425699520.join_count=5
runtime.totem.pg.mrp.srp.members.425699520.status=joined

So I think I've got to swallow the bitter pill and restart the whole cluster. I will report the result.
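As a side note, the membership output above can be sanity-checked mechanically rather than by eye. This is only a sketch: the `check_members` helper name is made up here, and the awk filter assumes the `runtime.totem.pg.mrp.srp.members.*.status` key layout shown above.

```shell
# check_members: reads `corosync-objctl` output on stdin and prints any
# cluster member whose status key is not "joined".
check_members() {
  awk -F'=' '/members\.[0-9]+\.status/ && $2 != "joined" {
    print "member not joined: " $1
  }'
}

# On a live node: corosync-objctl | check_members
# No output means every member is in the "joined" state.
```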
Kind regards

fatcharly

-------- Original Message --------
> Date: Fri, 20 Jul 2012 12:21:47 -0400 (EDT)
> From: Jake Smith <jsm...@argotec.com>
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
>
> ----- Original Message -----
> > From: fatcha...@gmx.de
> > To: "Jake Smith" <jsm...@argotec.com>, "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > Sent: Friday, July 20, 2012 11:50:52 AM
> > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> >
> > Hi Jake,
> >
> > I erased the files as mentioned and started the services. This is
> > what I get on pilotpound after crm_mon:
> >
> > ============
> > Last updated: Fri Jul 20 17:45:58 2012
> > Last change:
> > Current DC: NONE
> > 0 Nodes configured, unknown expected votes
> > 0 Resources configured.
> > ============
> >
> > Looks like the system didn't join the cluster.
> >
> > Any suggestions are welcome
>
> Oh, maybe worth checking corosync membership to see what it says now:
> http://www.hastexo.com/resources/hints-and-kinks/checking-corosync-cluster-membership
>
> > Kind regards
> >
> > fatcharly
> >
> > -------- Original Message --------
> > > Date: Fri, 20 Jul 2012 10:49:15 -0400 (EDT)
> > > From: Jake Smith <jsm...@argotec.com>
> > > To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> > > Subject: Re: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> > >
> > > ----- Original Message -----
> > > > From: fatcha...@gmx.de
> > > > To: pacemaker@oss.clusterlabs.org
> > > > Sent: Friday, July 20, 2012 6:08:45 AM
> > > > Subject: [Pacemaker] problem with pacemaker/corosync on CentOS 6.3
> > > >
> > > > Hi,
> > > >
> > > > I'm using a pacemaker+corosync bundle to run a pound-based
> > > > loadbalancer. After an update to CentOS 6.3 there is some mismatch
> > > > of the node status.
> > > > Via crm_mon, on one node everything looks fine while on the
> > > > other node everything is offline. Everything was fine on
> > > > CentOS 6.2.
> > > >
> > > > Node powerpound:
> > > >
> > > > ============
> > > > Last updated: Fri Jul 20 12:04:29 2012
> > > > Last change: Thu Jul 19 17:58:31 2012 via crm_attribute on pilotpound
> > > > Stack: openais
> > > > Current DC: powerpound - partition with quorum
> > > > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
> > > > 2 Nodes configured, 2 expected votes
> > > > 7 Resources configured.
> > > > ============
> > > >
> > > > Online: [ powerpound pilotpound ]
> > > >
> > > > HA_IP_1 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > HA_IP_2 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > HA_IP_3 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > HA_IP_4 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > HA_IP_5 (ocf::heartbeat:IPaddr2): Started powerpound
> > > > Clone Set: pingclone [ping-gateway]
> > > >     Started: [ pilotpound powerpound ]
> > > >
> > > > Node pilotpound:
> > > >
> > > > ============
> > > > Last updated: Fri Jul 20 12:04:32 2012
> > > > Last change: Thu Jul 19 17:58:17 2012 via crm_attribute on pilotpound
> > > > Stack: openais
> > > > Current DC: NONE
> > > > 2 Nodes configured, 2 expected votes
> > > > 7 Resources configured.
> > > > ============
> > > >
> > > > OFFLINE: [ powerpound pilotpound ]
> > > >
> > > > From /var/log/messages on pilotpound:
> > > >
> > > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> > > > Discarding cib_apply_diff message (35909) from powerpound: not in
> > > > our membership
> > > > Jul 20 12:06:12 pilotpound cib[24755]: warning: cib_peer_callback:
> > > > Discarding cib_apply_diff message (35910) from powerpound: not in
> > > > our membership
> > > >
> > > > How could this happen, and what can I do to solve this problem?
> > >
> > > Pretty sure it had nothing to do with the upgrade - I had this the
> > > other day on Ubuntu 12.04 after a reboot of both nodes. I believe a
> > > couple of experts called it a "transient" bug. See:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=820821
> > > https://bugzilla.redhat.com/show_bug.cgi?id=5040
> > >
> > > > Any suggestions are welcome
> > >
> > > I fixed it by stopping/killing pacemaker/corosync on the offending
> > > node (pilotpound), then cleared these files out on the same node:
> > >
> > > rm /var/lib/heartbeat/crm/cib*
> > > rm /var/lib/pengine/*
> > >
> > > Then restarted corosync/pacemaker and the node rejoined fine.
> > >
> > > HTH
> > >
> > > Jake
> > >
> > > _______________________________________________
> > > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> > >
> > > Project Home: http://www.clusterlabs.org
> > > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > > Bugs: http://bugs.clusterlabs.org
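For anyone finding this thread later, Jake's recovery steps condensed into one sequence. This is a sketch assuming CentOS 6 SysV init scripts and the pacemaker 1.1 default state paths quoted above; the `run` wrapper only echoes each step so the sequence can be reviewed (or dry-run) first - drop the `echo` to actually execute it.

```shell
# Recovery sketch for the node that lost membership (here: pilotpound).
# run() only echoes the commands; remove the echo to execute them for real.
run() { echo "+ $*"; }

run service pacemaker stop
run service corosync stop

# Clear the stale CIB and policy-engine state so the node rejoins cleanly:
run rm /var/lib/heartbeat/crm/cib*
run rm /var/lib/pengine/*

run service corosync start
run service pacemaker start
```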