On Wed, Dec 2, 2009 at 3:22 PM, Frank DiMeo <frank.di...@bigbandnet.com> wrote:
> Ask and ye shall receive. :)
>
> I'm enclosing my openais init script, which I'm running on my two node
> cluster made up of identical Ubuntu (9.04) machines called ubuntu_2 and
> ubuntu_1.

If the node takes more than 30s to shut down, then it kills openais.
In that case, it's no surprise that the lrmd and pengine are still
around - because the cluster didn't have time to shut down cleanly.

> Running pacemaker 1.0.6 from the tip as of a month ago or so.
>
> I'm also enclosing two sets of files which may help you see what's happening.
>
> The "working" set:
>
> 4rsc_worlds_coloc_ordered.xml - this is my initial configuration file. When
> I use this to initialize my cluster, the 4 resources all start up in order, on
> the right node, and move together when I put nodes in and out of standby.
>
> goodconfig_debug.txt - the log file (from ubuntu_1) showing what happens when
> the resources are running on node "ubuntu_2" and I put that node into
> standby. All resources are moved to "ubuntu_1". If I stop openais,
> everything shuts down quickly and cleanly, and no processes (like lrmd,
> pengine, etc) are left running.
>
> The "not working" set:

Can you attach /var/lib/pengine/pe-input-12434.bz2 from ubuntu_1 please?

>
> 4rsc_worlds_coloc_ordered_alt1.xml - this is identical to the xml file in the
> working set, except I use the compact syntax for ordering.
>
> badconfig_debug.txt - the log file (from ubuntu_1) showing what happens when
> the resources are running on node "ubuntu_2" and I put that node into
> standby. The pe wants to move them to ubuntu_1, but the pe only seems to
> generate "pseudo actions" and never really moves anything. The resources
> continue to run on node ubuntu_2 even when the node is in standby! Further,
> if I try to shut down openais on ubuntu_2 at this point (using the
> /etc/init.d/openais script enclosed), after a long time, corosync stops, but
> lrmd and pengine keep running, and become children of the init process.
> Again, the resources keep running even at this point, which is because they
> are never commanded to stop.
>
> I can send you my RAs and the resources themselves (which are just bash
> scripts) if you'd like.
>
> I'll apply the patch you pointed to and let you know what happens.
>
> Thanks very much,
> -Frank
>
>
>> -----Original Message-----
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Wednesday, December 02, 2009 6:00 AM
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] bug in ordering syntax?
>>
>> On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo
>> <frank.di...@bigbandnet.com> wrote:
>> > I'm experimenting with startup sequence and co-location control, and think I
>> > may have stumbled across a bug.
>> >
>> > I have two xml files that I use in my testing as my initial configuration of
>> > a two node cluster. I start each node with no configuration, and then use
>> > cibadmin to "source in" the xml file. Each file defines two resources as
>> > well as a startup order and collocation definition. The only difference
>> > between the two files is the syntax I use to specify the startup order.
>> >
>> > When I use the syntax:
>> >
>> > <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
>> >
>> > Everything works fine. I can put either of the two nodes into standby while
>> > resources are running there, and the resources move to the other node as
>> > expected.
>> >
>> > However, when I use the syntax:
>> >
>> > <rsc_order id="order-1">
>>
>> You're missing a score. Without one it defaults to 0 (which means
>> optional).
>> However, IIRC, the 1.0.6 schema won't allow you to set a score there
>> so you'll need to apply the following patch:
>> http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
>>
>> >   <resource_set id="order-1-set-1" sequential="true">
>> >     <resource_ref id="world1" />
>> >     <resource_ref id="world2" />
>> >   </resource_set>
>> > </rsc_order>
>> >
>> > Several bad things happen.
>> > First, the resources don't move off the node
>> > that is put into standby, even though the alternate node is running and able
>> > to run the resources.
>>
>> Did you remove the other ordering constraint first?
>>
>> > Second, attempting to shut down openais on the node
>> > running the resources after attempting a forced move (by putting the node
>> > into standby) leaves both the lrmd and pengine processes running (but as
>> > children of process 1 (init)), and the resources continue to run on that
>> > node even after openais is stopped.
>>
>> I suspect you've a faulty init script there. See other email.
>>
>> > I turned debug on in crmd and in the logs and recorded what happens when I
>> > force standby, and I notice that using the first syntax causes
>> > te_rsc_command to be executed to send a shut down message to the node where
>> > the resources are running (which seems to work), while using the second
>> > syntax causes te_pseudo_action to be called in approximately the same place
>> > in the log, but no shutdown of resources happens (I can't really tell what
>> > this is supposed to be doing).
>>
>> Neither can I - you didn't attach the logs :-)
>>
>> _______________________________________________
>> Pacemaker mailing list
>> Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
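
For reference, a sketch of what the resource-set form of the constraint
might look like with an explicit score, as suggested in the reply above.
This assumes the patch mentioned in the thread has been applied (the reply
says the stock 1.0.6 schema rejects a score here), and the INFINITY value
is simply copied from the working first/then form; neither has been
verified against a running cluster.

```xml
<!-- Sketch only: assumes a schema that accepts score on a set-based rsc_order -->
<rsc_order id="order-1" score="INFINITY">
  <resource_set id="order-1-set-1" sequential="true">
    <resource_ref id="world1" />
    <resource_ref id="world2" />
  </resource_set>
</rsc_order>
```

Without the score attribute the constraint would default to 0, i.e. optional
ordering, which matches the behaviour described in the original report.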