On Wed, May 16, 2012 at 1:53 PM, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Larry Brigman" <larry.brig...@gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> Sent: Monday, May 14, 2012 4:59:55 PM
>> Subject: Re: [Pacemaker] Removed nodes showing back in status
>>
>> On Mon, May 14, 2012 at 2:13 PM, David Vossel <dvos...@redhat.com> wrote:
>> > ----- Original Message -----
>> >> From: "Larry Brigman" <larry.brig...@gmail.com>
>> >> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> >> Sent: Monday, May 14, 2012 1:30:22 PM
>> >> Subject: Re: [Pacemaker] Removed nodes showing back in status
>> >>
>> >> On Mon, May 14, 2012 at 9:54 AM, Larry Brigman <larry.brig...@gmail.com> wrote:
>> >> > I have a 5-node cluster (but it could be any number of nodes, 3 or larger).
>> >> > I am testing some scripts for node removal.
>> >> > I remove a node from the cluster and everything looks correct from a crm status standpoint.
>> >> > When I remove a second node, the first node that was removed shows back up in crm status as offline. I'm following the guidelines provided in the Pacemaker Explained docs:
>> >> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html
>> >> >
>> >> > I believe this is a bug but want to put it out to the list to be sure.
>> >> >
>> >> > Versions:
>> >> > RHEL 5.7 x86_64
>> >> > corosync-1.4.2
>> >> > openais-1.1.3
>> >> > pacemaker-1.1.5
>> >> >
>> >> > Status after the first node was removed:
>> >> > [root@portland-3 ~]# crm status
>> >> > ============
>> >> > Last updated: Mon May 14 08:42:04 2012
>> >> > Stack: openais
>> >> > Current DC: portland-1 - partition with quorum
>> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> >> > 4 Nodes configured, 4 expected votes
>> >> > 0 Resources configured.
>> >> > ============
>> >> >
>> >> > Online: [ portland-1 portland-2 portland-3 portland-4 ]
>> >> >
>> >> > Status after the second node was removed:
>> >> > [root@portland-3 ~]# crm status
>> >> > ============
>> >> > Last updated: Mon May 14 08:42:45 2012
>> >> > Stack: openais
>> >> > Current DC: portland-1 - partition with quorum
>> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> >> > 4 Nodes configured, 3 expected votes
>> >> > 0 Resources configured.
>> >> > ============
>> >> >
>> >> > Online: [ portland-1 portland-3 portland-4 ]
>> >> > OFFLINE: [ portland-5 ]
>> >> >
>> >> > Both nodes were removed from the cluster from node 1.
>> >>
>> >> When I added a node back into the cluster, the second node that was removed now shows as offline.
>> >
>> > The only time I've seen this sort of behavior is when I don't completely shut down corosync and pacemaker on the node I'm removing before I delete its configuration from the CIB. Are you sure corosync and pacemaker are gone before you delete the node from the cluster config?
>>
>> Well, I run "service pacemaker stop" and "service corosync stop" prior to doing the remove. Since I am doing it all in a script, it's possible that there is a race condition that I have just exposed, or that the services are not fully down when the service script exits.
>
> Yep. If you are waiting for the service scripts to return, I would expect it to be safe to remove the nodes at that point.
>
>> BTW, I'm running pacemaker as its own process instead of as a child of corosync (if that makes a difference).
>
> This shouldn't matter.
>
> An hb_report of this will help us distinguish whether this is a bug or not.

Bug opened with the hb_report and crm reports:
https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2648
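For what it's worth, one way to rule out the suspected race (the init script returning before the daemons have actually exited) is to poll for the processes after the stop calls and only delete the node once they are gone. A rough sketch — the node name "portland-5" is just an example, the process names may differ on your setup, and the `service` calls are guarded here only so the sketch runs outside a cluster node:

```shell
#!/bin/sh
# Sketch: stop the cluster stack on the node being removed, then wait for
# the daemons to really exit before deleting the node from the CIB.

wait_gone() {
    # Block until no process with the given exact name is running,
    # giving up after roughly 30 seconds.
    i=0
    while pgrep -x "$1" >/dev/null 2>&1; do
        i=$((i + 1))
        [ "$i" -ge 30 ] && return 1
        sleep 1
    done
    return 0
}

# On a real node you would call these unconditionally.
if command -v service >/dev/null 2>&1; then
    service pacemaker stop
    service corosync stop
fi

wait_gone pacemakerd || echo "pacemakerd still running" >&2
wait_gone corosync   || echo "corosync still running" >&2

# Only after both have exited, from a surviving cluster node:
#   crm node delete portland-5
```

If the init scripts really do wait for the daemons, the two `wait_gone` calls return immediately and cost nothing; if not, this closes the window in which the node's state could be re-recorded in the CIB.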
>
> -- Vossel
>
>> [root@portland-3 ~]# cat /etc/corosync/service.d/pcmk
>> service {
>>     # Load the Pacemaker Cluster Resource Manager
>>     ver: 1
>>     name: pacemaker
>>     # use_mgmtd: yes
>>     # use_logd: yes
>> }
>>
>> It looks like, from corosync's side, a removed node and a down node have the same object state.
>> 4.0.0.2 is removed; 4.0.0.5 is stopped.
>>
>> [root@portland-3 ~]# corosync-objctl -a | grep member
>> runtime.totem.pg.mrp.srp.members.16777220.ip=r(0) ip(4.0.0.1)
>> runtime.totem.pg.mrp.srp.members.16777220.join_count=1
>> runtime.totem.pg.mrp.srp.members.16777220.status=joined
>> runtime.totem.pg.mrp.srp.members.50331652.ip=r(0) ip(4.0.0.3)
>> runtime.totem.pg.mrp.srp.members.50331652.join_count=1
>> runtime.totem.pg.mrp.srp.members.50331652.status=joined
>> runtime.totem.pg.mrp.srp.members.67108868.ip=r(0) ip(4.0.0.4)
>> runtime.totem.pg.mrp.srp.members.67108868.join_count=3
>> runtime.totem.pg.mrp.srp.members.67108868.status=joined
>> runtime.totem.pg.mrp.srp.members.83886084.ip=r(0) ip(4.0.0.5)
>> runtime.totem.pg.mrp.srp.members.83886084.join_count=4
>> runtime.totem.pg.mrp.srp.members.83886084.status=joined
>> runtime.totem.pg.mrp.srp.members.33554436.ip=r(0) ip(4.0.0.2)
>> runtime.totem.pg.mrp.srp.members.33554436.join_count=1
>> runtime.totem.pg.mrp.srp.members.33554436.status=left
>>
>> > -- Vossel
>> >
>> >> [root@portland-3 ~]# crm status
>> >> ============
>> >> Last updated: Mon May 14 11:27:55 2012
>> >> Stack: openais
>> >> Current DC: portland-1 - partition with quorum
>> >> Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> >> 5 Nodes configured, 4 expected votes
>> >> 0 Resources configured.
>> >> ============
>> >>
>> >> Online: [ portland-1 portland-3 portland-4 portland-5 ]
>> >> OFFLINE: [ portland-2 ]
>> >>
>> >> _______________________________________________
>> >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >>
>> >> Project Home: http://www.clusterlabs.org
>> >> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> >> Bugs: http://bugs.clusterlabs.org
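As a footnote to the corosync-objctl dump quoted earlier in the thread: the members' `status` field does differ for a node that has left the ring (`status=left` for removed 4.0.0.2), so it can be picked out mechanically. A small sketch using a trimmed sample of that output — on a live node you would pipe `corosync-objctl -a` into the same awk instead of the here-doc:

```shell
# Print members whose totem status is anything other than "joined".
awk -F'=' '/srp\.members\.[0-9]+\.status/ && $2 != "joined"' <<'EOF'
runtime.totem.pg.mrp.srp.members.16777220.status=joined
runtime.totem.pg.mrp.srp.members.83886084.status=joined
runtime.totem.pg.mrp.srp.members.33554436.status=left
EOF
# prints: runtime.totem.pg.mrp.srp.members.33554436.status=left
```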