Re: [Pacemaker] bug in ordering syntax?
Could you attach /var/lib/pengine/pe-input-12434.bz2 instead, please? That's the one I'd been debugging.

On Thu, Dec 3, 2009 at 2:33 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
> Here are the files after bunzip2'ing. 12663 is for the 2 resource case, 12644 is the 4 resource case.
>
> Thanks,
> -Frank
>
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, December 03, 2009 3:45 AM
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
>
>> On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>>> I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.
>>
>> Yes, this looks to be the problem. But I need the bz2 file to make any progress.

___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Re: [Pacemaker] bug in ordering syntax?
Sorry, but I don't have that one anymore.

-Frank

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, December 08, 2009 3:42 AM
To: Frank DiMeo
Cc: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] bug in ordering syntax?

> Could you attach /var/lib/pengine/pe-input-12434.bz2 instead, please? That's the one I'd been debugging.
>
> On Thu, Dec 3, 2009 at 2:33 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>> Here are the files after bunzip2'ing. 12663 is for the 2 resource case, 12644 is the 4 resource case.
>>
>> Thanks,
>> -Frank
>>
>> -Original Message-
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Thursday, December 03, 2009 3:45 AM
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] bug in ordering syntax?
>>
>>> On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>>>> I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.
>>>
>>> Yes, this looks to be the problem. But I need the bz2 file to make any progress.
Re: [Pacemaker] bug in ordering syntax?
On Wed, Dec 2, 2009 at 3:22 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
> Ask and ye shall receive. :)
>
> I'm enclosing my openais init script, which I'm running on my two node cluster made up of identical Ubuntu (9.04) machines called ubuntu_2 and ubuntu_1.

If the node takes more than 30s to shut down, then it kills openais. In that case, it's no surprise that the lrmd and pengine are still around, because the cluster didn't have time to shut down cleanly.

> Running Pacemaker 1.0.6 from the tip as of a month ago or so.
>
> I'm also enclosing two sets of files which may help you see what's happening.
>
> The working set:
>
> 4rsc_worlds_coloc_ordered.xml - this is my initial configuration file. When I use this to initialize my cluster, the 4 resources all start up in order, on the right node, and move together when I put nodes in and out of standby.
>
> goodconfig_debug.txt - the log file (from ubuntu_1) showing what happens when the resources are running on node ubuntu_2 and I put that node into standby. All resources are moved to ubuntu_1. If I stop openais, everything shuts down quickly and cleanly, and no processes (like lrmd, pengine, etc.) are left running.
>
> The not working set:

Can you attach /var/lib/pengine/pe-input-12434.bz2 from ubuntu_1 please?

> 4rsc_worlds_coloc_ordered_alt1.xml - this is identical to the xml file in the working set, except I use the compact syntax for ordering.
>
> badconfig_debug.txt - the log file (from ubuntu_1) showing what happens when the resources are running on node ubuntu_2 and I put that node into standby. The pe wants to move them to ubuntu_1, but the pe only seems to generate pseudo actions and never really moves anything. The resources continue to run on node ubuntu_2 even when the node is in standby!
>
> Further, if I try to shut down openais on ubuntu_2 at this point (using the /etc/init.d/openais script enclosed), after a long time, corosync stops, but lrmd and pengine keep running, and become children of the init process. Again, the resources keep running even at this point, because they are never commanded to stop.
>
> I can send you my RAs and the resources themselves (which are just bash scripts) if you'd like.
>
> I'll apply the patch you pointed to and let you know what happens.
>
> Thanks very much,
> -Frank
>
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, December 02, 2009 6:00 AM
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
>
>> On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>>> I'm experimenting with startup sequence and co-location control, and think I may have stumbled across a bug. I have two xml files that I use in my testing as my initial configuration of a two node cluster. I start each node with no configuration, and then use cibadmin to "source in" the xml file. Each file defines two resources as well as a startup order and collocation definition. The only difference between the two files is the syntax I use to specify the startup order. When I use the syntax:
>>>
>>> <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
>>>
>>> Everything works fine. I can put either of the two nodes into standby while resources are running there, and the resources move to the other node as expected. However, when I use the syntax:
>>>
>>> <rsc_order id="order-1">
>>
>> You're missing a score. Without one it defaults to 0 (which means optional). However, IIRC, the 1.0.6 schema won't allow you to set a score there, so you'll need to apply the following patch: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
>>
>>> <resource_set id="order-1-set-1" sequential="true">
>>> <resource_ref id="world1" />
>>> <resource_ref id="world2" />
>>> </resource_set>
>>> </rsc_order>
>>>
>>> Several bad things happen. First, the resources don't move off the node that is put into standby, even though the alternate node is running and able to run the resources.
>>
>> Did you remove the other ordering constraint first?
>>
>>> Second, attempting to shut down openais on the node running the resources after attempting a forced move (by putting the node into standby) leaves both the lrmd and pengine processes running (but as children of process 1 (init)), and the resources continue to run on that node even after openais is stopped.
>>
>> I suspect you've a faulty init script there. See other email.
>>
>>> I turned debug on in crmd and in the logs and recorded what happens when I force standby, and I notice that using the first syntax causes te_rsc_command to be executed to send a shut down message to the node where the resources are running (which seems to work), while using the second syntax causes te_pseudo_action to be called in approximately the same place in the log, but no shutdown of resources happens (I can't really
Re: [Pacemaker] bug in ordering syntax?
Here are the files after bunzip2'ing. 12663 is for the 2 resource case, 12644 is the 4 resource case.

Thanks,
-Frank

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Thursday, December 03, 2009 3:45 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] bug in ordering syntax?

> On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>> I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.
>
> Yes, this looks to be the problem. But I need the bz2 file to make any progress.

Attachments: pe-input-12663, pe-input-12644
Re: [Pacemaker] bug in ordering syntax?
I made a change to unpack_order_set that seems to fix the problem. I'm not sure my logic is 100% correct, but I thought I'd pass it along anyway.

-Frank

-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Thursday, December 03, 2009 3:45 AM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] bug in ordering syntax?

> On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>> I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.
>
> Yes, this looks to be the problem. But I need the bz2 file to make any progress.

Attachment: constraints.c
Re: [Pacemaker] bug in ordering syntax?
After looking at some output of ptest, I'm really unsure of my fix ;) I think I'm hunting in the right area though.

-Frank

-Original Message-
From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
Sent: Thursday, December 03, 2009 5:06 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] bug in ordering syntax?

> I made a change to unpack_order_set that seems to fix the problem. I'm not sure my logic is 100% correct, but I thought I'd pass it along anyway.
>
> -Frank
>
> -Original Message-
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Thursday, December 03, 2009 3:45 AM
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
>
>> On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>>> I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.
>>
>> Yes, this looks to be the problem. But I need the bz2 file to make any progress.
Re: [Pacemaker] bug in ordering syntax?
On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
> I’m experimenting with startup sequence and co-location control, and think I may have stumbled across a bug. I have two xml files that I use in my testing as my initial configuration of a two node cluster. I start each node with no configuration, and then use cibadmin to “source in” the xml file. Each file defines two resources as well as a startup order and collocation definition. The only difference between the two files is the syntax I use to specify the startup order. When I use the syntax:
>
> <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
>
> Everything works fine. I can put either of the two nodes into standby while resources are running there, and the resources move to the other node as expected. However, when I use the syntax:
>
> <rsc_order id="order-1">

You're missing a score. Without one it defaults to 0 (which means optional). However, IIRC, the 1.0.6 schema won't allow you to set a score there, so you'll need to apply the following patch: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c

> <resource_set id="order-1-set-1" sequential="true">
> <resource_ref id="world1" />
> <resource_ref id="world2" />
> </resource_set>
> </rsc_order>
>
> Several bad things happen. First, the resources don’t move off the node that is put into standby, even though the alternate node is running and able to run the resources.

Did you remove the other ordering constraint first?

> Second, attempting to shut down openais on the node running the resources after attempting a forced move (by putting the node into standby) leaves both the lrmd and pengine processes running (but as children of process 1 (init)), and the resources continue to run on that node even after openais is stopped.

I suspect you've a faulty init script there. See other email.

> I turned debug on in crmd and in the logs and recorded what happens when I force standby, and I notice that using the first syntax causes te_rsc_command to be executed to send a shut down message to the node where the resources are running (which seems to work), while using the second syntax causes te_pseudo_action to be called in approximately the same place in the log, but no shutdown of resources happens (I can’t really tell what this is supposed to be doing).

Neither can I - you didn't attach the logs :-)
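For reference, here is a sketch of what the set-based constraint might look like with the score the reply asks for. Putting score on the rsc_order element is inferred from where the "You're missing a score" comment sits in the quoted XML, and it assumes a schema that accepts the attribute there (per the reply, stock 1.0.6 does not without the linked patch). The enclosing constraints element is added only to show where the fragment lives in the CIB.

```xml
<!-- Sketch only: set-based ordering with an explicit mandatory score.
     Attribute placement is an assumption drawn from the reply above and
     needs a schema that allows score on rsc_order with a resource_set. -->
<constraints>
  <rsc_order id="order-1" score="INFINITY">
    <resource_set id="order-1-set-1" sequential="true">
      <resource_ref id="world1"/>
      <resource_ref id="world2"/>
    </resource_set>
  </rsc_order>
</constraints>
```

Note that other messages in this thread report the stop-ordering loop while running the compact syntax with a score of INFINITY, so this is the configuration under test, not a confirmed fix.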
Re: [Pacemaker] bug in ordering syntax?
I turned up the logging level in the pengine during processing of the rsc_order section. This shows the loop being formed between world2 and world1 resources, but only for stopping, not for starting.

-Frank

-Original Message-
From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
Sent: Wednesday, December 02, 2009 2:59 PM
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] bug in ordering syntax?

> Here's a two resource version of the same issue. It's easy to see the loop here.
>
> -Frank
>
> -Original Message-
> From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
> Sent: Wednesday, December 02, 2009 2:13 PM
> To: pacemaker@oss.clusterlabs.org
> Subject: Re: [Pacemaker] bug in ordering syntax?
>
>> Here's the output of ptest for the pe-input-***.bz2 file that's created when I put ubuntu_2 into standby and the cluster tries to move my 4 resources from ubuntu_2 to ubuntu_1 (while running the compact ordering syntax with a score of INFINITY). I've converted it to a .png for your viewing pleasure.
>>
>> -Frank
>>
>> -Original Message-
>> From: Andrew Beekhof [mailto:and...@beekhof.net]
>> Sent: Wednesday, December 02, 2009 6:00 AM
>> To: pacemaker@oss.clusterlabs.org
>> Subject: Re: [Pacemaker] bug in ordering syntax?
>>
>>> On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
>>>> I'm experimenting with startup sequence and co-location control, and think I may have stumbled across a bug. I have two xml files that I use in my testing as my initial configuration of a two node cluster. I start each node with no configuration, and then use cibadmin to "source in" the xml file. Each file defines two resources as well as a startup order and collocation definition. The only difference between the two files is the syntax I use to specify the startup order. When I use the syntax:
>>>>
>>>> <rsc_order id="order-1" first="world1" then="world2" score="INFINITY" />
>>>>
>>>> Everything works fine. I can put either of the two nodes into standby while resources are running there, and the resources move to the other node as expected. However, when I use the syntax:
>>>>
>>>> <rsc_order id="order-1">
>>>
>>> You're missing a score. Without one it defaults to 0 (which means optional). However, IIRC, the 1.0.6 schema won't allow you to set a score there, so you'll need to apply the following patch: http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c
>>>
>>>> <resource_set id="order-1-set-1" sequential="true">
>>>> <resource_ref id="world1" />
>>>> <resource_ref id="world2" />
>>>> </resource_set>
>>>> </rsc_order>
>>>>
>>>> Several bad things happen. First, the resources don't move off the node that is put into standby, even though the alternate node is running and able to run the resources.
>>>
>>> Did you remove the other ordering constraint first?
>>>
>>>> Second, attempting to shut down openais on the node running the resources after attempting a forced move (by putting the node into standby) leaves both the lrmd and pengine processes running (but as children of process 1 (init)), and the resources continue to run on that node even after openais is stopped.
>>>
>>> I suspect you've a faulty init script there. See other email.
>>>
>>>> I turned debug on in crmd and in the logs and recorded what happens when I force standby, and I notice that using the first syntax causes te_rsc_command to be executed to send a shut down message to the node where the resources are running (which seems to work), while using the second syntax causes te_pseudo_action to be called in approximately the same place in the log, but no shutdown of resources happens (I can't really tell what this is supposed to be doing).
>>>
>>> Neither can I - you didn't attach the logs :-)

Attachment: pengine_debug.log