Yes, it avoids the crashes. Thanks! But I am still seeing spurious VM migrations/shutdowns when I stop/start a VM with a remote pacemaker (similar to my last update, only with no core dumped while fencing; indeed, no fencing happens at all, even though I've now verified that fence_node works again).

On Wed, Jul 10, 2013 at 2:12 PM, David Vossel <dvos...@redhat.com> wrote:

> ----- Original Message -----
> > From: "Lindsay Todd" <rltodd....@gmail.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > Sent: Wednesday, July 10, 2013 12:11:00 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> >
> > Hmm, I'll still submit the bug report, but it seems like crmd is
> > dumping core while attempting to fence a node. If I use fence_node to
> > fence a real cluster node, that also causes crmd to dump core. But
> > apart from that, I don't really see why pacemaker is trying to fence
> > anything.
>
> This should solve the crashes you are seeing.
>
> https://github.com/ClusterLabs/pacemaker/commit/97dd3b05db867c4674fa4780802bba54c63bd06d
>
> -- Vossel
>
> > On Wed, Jul 10, 2013 at 12:42 PM, Lindsay Todd < rltodd....@gmail.com > wrote:
> >
> > Thanks! But there is still a problem.
> >
> > I am now working from the master branch and building RPMs (well, I
> > have to also rebuild from the srpm to change the build number, since
> > the RPMs built directly are always 1.1.10-1). The patch is in the git
> > log, and indeed things are better ... But I still see the spurious VMs
> > shutting down. What is much improved is that they do get restarted,
> > and basically I end up in the state I want to be. Can almost live with
> > this, and I was going to start changing my cluster config to be
> > asymmetric when I noticed the in the midst of the spurious
> > transitions, crmd is dumping core.
> >
> > So I'll append another crm_report to bug 5164, as well as a gdb
> > traceback.
> >
> > On Fri, Jul 5, 2013 at 5:06 PM, David Vossel < dvos...@redhat.com > wrote:
> >
> > ----- Original Message -----
> > > From: "David Vossel" < dvos...@redhat.com >
> > > To: "The Pacemaker cluster resource manager" < pacemaker@oss.clusterlabs.org >
> > > Sent: Wednesday, July 3, 2013 4:20:37 PM
> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > >
> > > ----- Original Message -----
> > > > From: "Lindsay Todd" < rltodd....@gmail.com >
> > > > To: "The Pacemaker cluster resource manager" < pacemaker@oss.clusterlabs.org >
> > > > Sent: Wednesday, July 3, 2013 2:12:05 PM
> > > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > > >
> > > > Well, I'm not getting failures right now simply with attributes,
> > > > but I can induce a failure by stopping the vm-db02 (it puts db02
> > > > into an unclean state, and attempts to migrate the unrelated
> > > > vm-compute-test). I've collected the commands from my latest
> > > > interactions, a crm_report, and a gdb traceback from the core file
> > > > that crmd dumped, into bug 5164.
> > >
> > > Thanks, hopefully I can start investigating this Friday
> > >
> > > -- Vossel
> >
> > Yeah, this is a bad one. Adding the node attributes using
> > crm_attribute for the remote-node did some unexpected things to the
> > crmd component. Somehow the remote-node was getting entered into the
> > cluster node cache... which made it look like we had both a
> > cluster-node and remote-node named the same thing... not good.
> >
> > I think I got that part worked out. Try this patch.
> >
> > https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32
> >
> > Rather than trying to patch RCs, it might be worth trying out the
> > master branch on github (which already has this patch). If you aren't
> > already, use rpms to make your life easier. Running 'make rpm' in the
> > source directory will generate them for you.
> >
> > There was another bug fixed recently in pacemaker_remote involving the
> > directory created for resource agents to store their temporary data
> > (stuff like pid files). I believe the fix was not introduced until
> > 1.1.10rc6.
> >
> > -- Vossel
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org