On Fri, May 25, 2012 at 3:40 PM, David Vossel <dvos...@redhat.com> wrote:
> ----- Original Message -----
>> From: "Larry Brigman" <larry.brig...@gmail.com>
>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> Sent: Friday, May 25, 2012 5:27:21 PM
>> Subject: Re: [Pacemaker] Removed nodes showing back in status
>>
>> On Fri, May 25, 2012 at 9:59 AM, Larry Brigman
>> <larry.brig...@gmail.com> wrote:
>> > On Wed, May 16, 2012 at 1:53 PM, David Vossel <dvos...@redhat.com>
>> > wrote:
>> >> ----- Original Message -----
>> >>> From: "Larry Brigman" <larry.brig...@gmail.com>
>> >>> To: "The Pacemaker cluster resource manager"
>> >>> <pacemaker@oss.clusterlabs.org>
>> >>> Sent: Monday, May 14, 2012 4:59:55 PM
>> >>> Subject: Re: [Pacemaker] Removed nodes showing back in status
>> >>>
>> >>> On Mon, May 14, 2012 at 2:13 PM, David Vossel
>> >>> <dvos...@redhat.com> wrote:
>> >>> > ----- Original Message -----
>> >>> >> From: "Larry Brigman" <larry.brig...@gmail.com>
>> >>> >> To: "The Pacemaker cluster resource manager"
>> >>> >> <pacemaker@oss.clusterlabs.org>
>> >>> >> Sent: Monday, May 14, 2012 1:30:22 PM
>> >>> >> Subject: Re: [Pacemaker] Removed nodes showing back in status
>> >>> >>
>> >>> >> On Mon, May 14, 2012 at 9:54 AM, Larry Brigman
>> >>> >> <larry.brig...@gmail.com> wrote:
>> >>> >> > I have a 5-node cluster (but it could be any number of nodes,
>> >>> >> > 3 or larger).
>> >>> >> > I am testing some scripts for node removal.
>> >>> >> > I remove a node from the cluster and everything looks correct
>> >>> >> > from a crm status standpoint.
>> >>> >> > When I remove a second node, the first node that was removed
>> >>> >> > now shows back in crm status as offline. I'm following the
>> >>> >> > guidelines provided in the Pacemaker Explained docs:
>> >>> >> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html
>> >>> >> >
>> >>> >> > I believe this is a bug but want to put it out to the list
>> >>> >> > to be sure.
>> >>> >> > Versions:
>> >>> >> > RHEL 5.7 x86_64
>> >>> >> > corosync-1.4.2
>> >>> >> > openais-1.1.3
>> >>> >> > pacemaker-1.1.5
>> >>> >> >
>> >>> >> > Status after the first node was removed:
>> >>> >> > [root@portland-3 ~]# crm status
>> >>> >> > ============
>> >>> >> > Last updated: Mon May 14 08:42:04 2012
>> >>> >> > Stack: openais
>> >>> >> > Current DC: portland-1 - partition with quorum
>> >>> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> >>> >> > 4 Nodes configured, 4 expected votes
>> >>> >> > 0 Resources configured.
>> >>> >> > ============
>> >>> >> >
>> >>> >> > Online: [ portland-1 portland-2 portland-3 portland-4 ]
>> >>> >> >
>> >>> >> > Status after the second node was removed:
>> >>> >> > [root@portland-3 ~]# crm status
>> >>> >> > ============
>> >>> >> > Last updated: Mon May 14 08:42:45 2012
>> >>> >> > Stack: openais
>> >>> >> > Current DC: portland-1 - partition with quorum
>> >>> >> > Version: 1.1.5-1.3.sme-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
>> >>> >> > 4 Nodes configured, 3 expected votes
>> >>> >> > 0 Resources configured.
>> >>> >> > ============
>> >>> >> >
>> >>> >> > Online: [ portland-1 portland-3 portland-4 ]
>> >>> >> > OFFLINE: [ portland-5 ]
>> >>> >> >
>> >>> >> > Both nodes were removed from the cluster from node 1.
>> >>> >>
>> >>> >> When I added a node back into the cluster, the second node
>> >>> >> that was removed now shows as offline.
>> >>> >
>> >>> > The only time I've seen this sort of behavior is when I don't
>> >>> > completely shut down corosync and pacemaker on the node I'm
>> >>> > removing before I delete its configuration from the cib. Are you
>> >>> > sure corosync and pacemaker are gone before you delete the node
>> >>> > from the cluster config?
>> >>>
>> >>> Well, I run service pacemaker stop and service corosync stop prior
>> >>> to doing the remove. Since I am doing it all in a script, it's
>> >>> possible that there is a race condition that I have just exposed,
>> >>> or that the services are not fully down when the service script
>> >>> exits.
>> >>
>> >> Yep, if you are waiting for the service scripts to return, I would
>> >> expect it to be safe to remove the nodes at that point.
>> >>
>> >>> BTW, I'm running pacemaker as its own process instead of as a
>> >>> child of corosync (if that makes a difference).
>> >>>
>> >>
>> >> This shouldn't matter.
>> >>
>> >> An hb_report of this will help us determine whether this is a bug
>> >> or not.
>> > Bug opened with the hb_report and crm reports:
>> > https://developerbugs.linuxfoundation.org/show_bug.cgi?id=2648
>> >
>>
>> I just tried something that seems to point to things still being
>> around somewhere in the cib. I stopped and restarted pacemaker, and
>> this causes both removed nodes to show back up in pacemaker as
>> offline. It looks like the Clusters from Scratch documentation for
>> removing a node doesn't work correctly.
>
> Interesting, thanks for generating the logs. I'll look through them when I
> get a chance.
>
>> BTW, which is the best place to file bugs? Clusterlabs or
>> Linuxfoundation?
>
> We are tracking pacemaker issues here: http://bugs.clusterlabs.org/. Please
> re-locate the issue.
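For reference, the removal sequence under discussion boils down to roughly
the following. This is only a sketch: the node name portland-5 is taken from
the status output above, and the cibadmin option spellings follow the older
1.1.x-era documentation linked earlier, so they may differ slightly on a
given release.

  # On the node being removed (portland-5): stop the cluster stack and
  # wait for both init scripts to return before touching the CIB.
  service pacemaker stop
  service corosync stop

  # On a surviving node (e.g. portland-1): delete the node's entries
  # from both the configuration and the status sections of the CIB.
  cibadmin --delete -o nodes  -X '<node uname="portland-5"/>'
  cibadmin --delete -o status -X '<node_state uname="portland-5"/>'

  # Or, with the crm shell, a single command is meant to cover the
  # same ground:
  crm node delete portland-5

If the node_state entry in the status section survives (or is written back by
a stack that was still running on the removed node), the node keeps showing
up in crm status as OFFLINE, which would be consistent with the output above.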
Done: http://bugs.clusterlabs.org/show_bug.cgi?id=5068
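A quick way to check from a surviving node whether the deleted nodes are
really gone or are merely lingering in the CIB (and will therefore keep
reappearing as OFFLINE) is to query the nodes and status sections. Again,
only a sketch, using the same era-specific cibadmin options as above:

  # Any leftover <node> or <node_state> entry for a removed node will
  # bring it back into crm status as OFFLINE.
  cibadmin -Q -o nodes  | grep uname
  cibadmin -Q -o status | grep node_state

  # Membership as corosync/pacemaker currently sees it:
  crm_node -p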