Re: [Pacemaker] bug in ordering syntax?

2009-12-08 Thread Andrew Beekhof
Could you attach /var/lib/pengine/pe-input-12434.bz2 instead please?
That's the one I'd been debugging.

On Thu, Dec 3, 2009 at 2:33 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
 Here are the files after bunzip2'ing.  12663 is for the 2 resource case, 12644
 is the 4 resource case.

 Thanks,
 -Frank

 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Thursday, December 03, 2009 3:45 AM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?

 On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo
 frank.di...@bigbandnet.com wrote:
  I turned up the logging level in the pengine during processing of the
 rsc_order section.  This shows the loop being formed between world2 and
 world1 resources, but only for stopping, not for starting.

 Yes, this looks to be the problem.
 But I need the bz2 file to make any progress.

 ___
 Pacemaker mailing list
 Pacemaker@oss.clusterlabs.org
 http://oss.clusterlabs.org/mailman/listinfo/pacemaker


___
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


Re: [Pacemaker] bug in ordering syntax?

2009-12-08 Thread Frank DiMeo
Sorry, but I don't have that one anymore.

-Frank

 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Tuesday, December 08, 2009 3:42 AM
 To: Frank DiMeo
 Cc: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?
 
 Could you attach /var/lib/pengine/pe-input-12434.bz2 instead please?
 That's the one I'd been debugging.
 
 On Thu, Dec 3, 2009 at 2:33 PM, Frank DiMeo
 frank.di...@bigbandnet.com wrote:
  Here are the files after bunzip2'ing.  12663 is for the 2 resource case,
 12644 is the 4 resource case.
 
  Thanks,
  -Frank
 
  -Original Message-
  From: Andrew Beekhof [mailto:and...@beekhof.net]
  Sent: Thursday, December 03, 2009 3:45 AM
  To: pacemaker@oss.clusterlabs.org
  Subject: Re: [Pacemaker] bug in ordering syntax?
 
  On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo
  frank.di...@bigbandnet.com wrote:
   I turned up the logging level in the pengine during processing of
 the
  rsc_order section.  This shows the loop being formed between world2
 and
  world1 resources, but only for stopping, not for starting.
 
  Yes, this looks to be the problem.
  But I need the bz2 file to make any progress.
 


Re: [Pacemaker] bug in ordering syntax?

2009-12-03 Thread Andrew Beekhof
On Wed, Dec 2, 2009 at 3:22 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
 Ask and ye shall receive. :)

 I'm enclosing my openais init script, which I'm running on my two node 
 cluster made up of identical Ubuntu (9.04) machines called ubuntu_2 and 
 ubuntu_1.

If the node takes more than 30s to shut down, then it kills openais.
In that case, it's no surprise that the lrmd and pengine are still
around - because the cluster didn't have time to shut down cleanly.

 Running pacemaker 1.0.6 from the tip as of a month ago or so.

 I'm also enclosing two sets of files which may help you see what's happening.

 The working set:

 4rsc_worlds_coloc_ordered.xml - this is my initial configuration file.  When 
 I use this to initialize my cluster, the 4 resources all start up in order, on 
 the right node, and move together when I put nodes in and out of standby.

 goodconfig_debug.txt - the log file (from ubuntu_1) showing what happens when 
 the resources are running on node ubuntu_2 and I put that node into 
 standby.  All resources are moved to ubuntu_1.  If I stop openais, 
 everything shuts down quickly and cleanly, and no processes (like lrmd, 
 pengine, etc) are left running.

 The not working set:

Can you attach /var/lib/pengine/pe-input-12434.bz2 from ubuntu_1 please?


 4rsc_worlds_coloc_ordered_alt1.xml - this is identical to the xml file in the 
 working set, except I use the compact syntax for ordering.

 badconfig_debug.txt - the log file (from ubuntu_1) showing what happens when 
 the resources are running on node ubuntu_2 and I put that node into 
 standby.  The pe wants to move them to ubuntu_1, but the pe only seems to 
 generate pseudo actions and never really moves anything.  The resources 
 continue to run on node ubuntu_2 even when the node is in standby!  Further, 
 if I try to shut down openais on ubuntu_2 at this point (using the 
 /etc/init.d/openais script enclosed), after a long time, corosync stops, but 
 lrmd and pengine keep running, and become children of the init process.  
 Again, the resources keep running even at this point, which is because they 
 are never commanded to stop.

 I can send you my RA's and the resources themselves (which are just bash 
 scripts) if you'd like.

 I'll apply the patch you pointed to and let you know what happens.

 Thanks very much,
 -Frank


 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Wednesday, December 02, 2009 6:00 AM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?

 On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo
 frank.di...@bigbandnet.com wrote:
  I'm experimenting with startup sequence and co-location control, and
 think I
  may have stumbled across a bug.
 
 
 
  I have two xml files that I use in my testing as my initial
 configuration of
  a two node cluster.  I start each node with no configuration, and
 then use
  cibadmin to source in the xml file.  Each file defines two
 resources as
  well as a startup order and collocation definition.  The only
 difference
  between the two files is the syntax I use to specify the startup
 order.
 
 
 
  When I use the syntax:
 
 
 
  <rsc_order id="order-1" first="world1" then="world2" score="INFINITY"/>
 
 
 
  Everything works fine.  I can put either of the two nodes into
 standby while
  resources are running there, and the resources move to the other node
 as
  expected.
 
 
 
  However, when I use the syntax:
 
 
 
  <rsc_order id="order-1">

 You're missing a score.  Without one it defaults to 0 (which means
 optional).
 However, IIRC, the 1.0.6 schema won't allow you to set a score there
 so you'll need to apply the following patch:
    http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c

 
    <resource_set id="order-1-set-1" sequential="true">
      <resource_ref id="world1"/>
      <resource_ref id="world2"/>
    </resource_set>
  </rsc_order>
 
 
 
 
 
  Several bad things happen.  First, the resources don't move off the
 node
  that is put into standby, even though the alternate node is running
 and able
  to run the resources.

 Did you remove the other ordering constraint first?

  Second, attempting to shut down openais on the node
  running the resources after attempting a forced move (by putting the
  node into standby) leaves both the lrmd and pengine processes running
  (but as children of process 1, init), and the resources continue to
  run on that node even after openais is stopped.

 I suspect you've a faulty init script there.  See other email.

  I turned debug on in crmd and in the logs and recorded what happens
 when I
  force standby, and I notice that using the first syntax causes
  te_rsc_command to be executed to send a shut down message to the node
 where
  the resources are running (which seems to work), while using the
 second
  syntax causes te_pseudo_action to be called in approximately the same
 place
  in the log, but no shutdown of resources happens (I can't really

Re: [Pacemaker] bug in ordering syntax?

2009-12-03 Thread Frank DiMeo
Here are the files after bunzip2'ing.  12663 is for the 2 resource case, 12644 is 
the 4 resource case.

Thanks,
-Frank

 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Thursday, December 03, 2009 3:45 AM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?
 
 On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo
 frank.di...@bigbandnet.com wrote:
  I turned up the logging level in the pengine during processing of the
 rsc_order section.  This shows the loop being formed between world2 and
 world1 resources, but only for stopping, not for starting.
 
 Yes, this looks to be the problem.
 But I need the bz2 file to make any progress.
 


pe-input-12663
Description: pe-input-12663


pe-input-12644
Description: pe-input-12644


Re: [Pacemaker] bug in ordering syntax?

2009-12-03 Thread Frank DiMeo
I made a change to unpack_order_set that seems to fix the problem.  I'm not 
sure my logic is 100% correct, but I thought I'd pass it along anyway.

-Frank

 -Original Message-
 From: Andrew Beekhof [mailto:and...@beekhof.net]
 Sent: Thursday, December 03, 2009 3:45 AM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?
 
 On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo
 frank.di...@bigbandnet.com wrote:
  I turned up the logging level in the pengine during processing of the
 rsc_order section.  This shows the loop being formed between world2 and
 world1 resources, but only for stopping, not for starting.
 
 Yes, this looks to be the problem.
 But I need the bz2 file to make any progress.
 


constraints.c
Description: constraints.c


Re: [Pacemaker] bug in ordering syntax?

2009-12-03 Thread Frank DiMeo
After looking at some output of ptest, I'm really unsure of my fix ;)  I 
think I'm hunting in the right area though.

-Frank

 -Original Message-
 From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
 Sent: Thursday, December 03, 2009 5:06 PM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?
 
 I made a change to unpack_order_set that seems to fix the problem.  I'm
 not sure my logic is 100% correct, but I thought I'd pass it along
 anyway.
 
 -Frank
 
  -Original Message-
  From: Andrew Beekhof [mailto:and...@beekhof.net]
  Sent: Thursday, December 03, 2009 3:45 AM
  To: pacemaker@oss.clusterlabs.org
  Subject: Re: [Pacemaker] bug in ordering syntax?
 
  On Wed, Dec 2, 2009 at 11:26 PM, Frank DiMeo
  frank.di...@bigbandnet.com wrote:
   I turned up the logging level in the pengine during processing of
   the
  rsc_order section.  This shows the loop being formed between world2
  and
  world1 resources, but only for stopping, not for starting.
 
  Yes, this looks to be the problem.
  But I need the bz2 file to make any progress.
 


Re: [Pacemaker] bug in ordering syntax?

2009-12-02 Thread Andrew Beekhof
On Mon, Nov 30, 2009 at 9:19 PM, Frank DiMeo frank.di...@bigbandnet.com wrote:
 I’m experimenting with startup sequence and co-location control, and think I
 may have stumbled across a bug.



 I have two xml files that I use in my testing as my initial configuration of
 a two node cluster.  I start each node with no configuration, and then use
 cibadmin to “source in” the xml file.  Each file defines two resources as
 well as a startup order and collocation definition.  The only difference
 between the two files is the syntax I use to specify the startup order.



 When I use the syntax:



 <rsc_order id="order-1" first="world1" then="world2" score="INFINITY"/>



 Everything works fine.  I can put either of the two nodes into standby while
 resources are running there, and the resources move to the other node as
 expected.



 However, when I use the syntax:



 <rsc_order id="order-1">

You're missing a score.  Without one it defaults to 0 (which means optional).
However, IIRC, the 1.0.6 schema won't allow you to set a score there
so you'll need to apply the following patch:
   http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/c8585629629c


   <resource_set id="order-1-set-1" sequential="true">
     <resource_ref id="world1"/>
     <resource_ref id="world2"/>
   </resource_set>
 </rsc_order>





 Several bad things happen.  First, the resources don’t move off the node
 that is put into standby, even though the alternate node is running and able
 to run the resources.

Did you remove the other ordering constraint first?

 Second, attempting to shut down openais on the node
 running the resources after attempting a forced move (by putting the node
 into standby) leaves both the lrmd and pengine processes running (but as
 children of process 1, init), and the resources continue to run on that
 node even after openais is stopped.

I suspect you've a faulty init script there.  See other email.

 I turned debug on in crmd and in the logs and recorded what happens when I
 force standby, and I notice that using the first syntax causes
 te_rsc_command to be executed to send a shut down message to the node where
 the resources are running (which seems to work), while using the second
 syntax causes te_pseudo_action to be called in approximately the same place
 in the log, but no shutdown of resources happens (I can’t really tell what
 this is supposed to be doing).

Neither can I - you didn't attach the logs :-)
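
To make the score remark above concrete, here is a minimal sketch of the set-based ordering constraint with a score added. It assumes the schema patch linked above has been applied (the stock 1.0.6 schema reportedly rejects a score on this element) and it assumes the usual quoted-attribute XML form, since the archive has stripped the markup from the quoted snippets. The ids are the ones used in the report; nothing here has been tested against a live cluster.

```xml
<!-- Sketch only: set-based ordering with an explicit score.
     Assumes a schema that accepts score on rsc_order when a
     resource_set is used (see the patch referenced above). -->
<rsc_order id="order-1" score="INFINITY">
  <resource_set id="order-1-set-1" sequential="true">
    <resource_ref id="world1"/>
    <resource_ref id="world2"/>
  </resource_set>
</rsc_order>
```

If the score is left out, the constraint falls back to the default of 0 described above, i.e. the ordering is optional. When switching between the two syntaxes, only one constraint with the id order-1 should be present in the configuration at a time.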



Re: [Pacemaker] bug in ordering syntax?

2009-12-02 Thread Frank DiMeo
I turned up the logging level in the pengine during processing of the rsc_order 
section.  This shows the loop being formed between world2 and world1 resources, 
but only for stopping, not for starting.

-Frank

 -Original Message-
 From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
 Sent: Wednesday, December 02, 2009 2:59 PM
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] bug in ordering syntax?
 
 Here's a two resource version of the same issue. It's easy to see the
 loop here.
 
 -Frank
 
  -Original Message-
  From: Frank DiMeo [mailto:frank.di...@bigbandnet.com]
  Sent: Wednesday, December 02, 2009 2:13 PM
  To: pacemaker@oss.clusterlabs.org
  Subject: Re: [Pacemaker] bug in ordering syntax?
 
  Here's the output of ptest for the pe-input-***.bz2 file that's
  created when I put ubuntu_2 into standby and the cluster tries to
 move
  my 4 resources from ubuntu_2 to ubuntu_1 (while running the compact
  ordering syntax with a score of INFINITY).
 
  I've converted it to a .png for your viewing pleasure.
 
  -Frank
 


pengine_debug.log
Description: pengine_debug.log