Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Andreas Mock
> -Original Message- > From: "Joe Armstrong" > Sent: 14.05.09 18:01:11 > To: "'pacema...@clusterlabs.org'" > Subject: Re: [Pacemaker] STONITH Deathmatch Explained > I agree - a nice read. You might want to also add a possibility to avoid the > situation. Don't allow heartbe

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Ivars Strazdiņš
Thank you Andrew and Eliot, the cluster went silent for now. Kind regards, Ivars Andrew Beekhof wrote: 2009/5/14 Ivars Strazdiņš : Hi there, could anyone enlighten me why in a two node cluster one (and only one) node is spitting these messages (below) regularly every 15 minutes? It's t

Re: [Pacemaker] Drbd disk don't run

2009-05-14 Thread Rafael Emerick
Hi, Dejan There are not two sets of meta-attributes. I removed ms-drbd11 and added it again, and the error is the same: Error performing operation: Required data for this CIB API call not found Thanks, On Thu, May 14, 2009 at 3:43 PM, Dejan Muhamedagic wrote: > Hi, > > On Thu, May 14, 2009 at 03:18:1

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi guys, sooo I've got valgrind grinding:) I had some trouble getting the latest stuff working, so I used heartbeat-2.99.2 with Dejan's (fixed) patch and --enable-valgrind --with-valgrind-log="--log-file=/tmp/crm-%p.valgrind" and recompiled pacemaker-1.0.3 (without openais as Andrew suggested).

Re: [Pacemaker] Drbd disk don't run

2009-05-14 Thread Dejan Muhamedagic
Hi, On Thu, May 14, 2009 at 03:18:15PM -0300, Rafael Emerick wrote: > Hi, > > I'm trying to make a cluster with xen-ha using drbd and ocfs2... > > I want crm to manage all resources (xen machines, drbd disks and ocfs2 > filesystem). > > First, I created a clone lsb resource to init drbd wi

[Pacemaker] Drbd disk don't run

2009-05-14 Thread Rafael Emerick
Hi, I'm trying to make a cluster with xen-ha using drbd and ocfs2... I want crm to manage all resources (xen machines, drbd disks and ocfs2 filesystem). First, I created a clone lsb resource to init drbd with gui interface. Now, I'm following this manual http://clusterlabs.org/wiki/DRBD_How

Re: [Pacemaker] New patch for System Health feature

2009-05-14 Thread Mark Hamzy
and...@beekhof.net wrote on 05/13/2009 21:08:36 PM: > This is missing the modification to char2score that i mentioned (which > would also simplify calculate_system_health()). > ... > Oh, and initialize_health_value() should probably just set a something > in data_set (which would be passed to cha

Re: [Pacemaker] xml passes crm_verify but fails cibadmin --replace

2009-05-14 Thread Joe Armstrong
Answering my own question here... I had forgotten to add the tag. Still odd that crm_verify passed though... Joe -Original Message- From: Joe Armstrong Sent: Thursday, May 14, 2009 7:35 AM To: pacema...@clusterlabs.org Subject: [Pacemaker] xml passes crm_verify but fails cibadmin --r

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Andrew Beekhof
2009/5/14 Ivars Strazdiņš : > Hi there, > could anyone enlighten me why in a two node cluster one (and only one) node > is spitting these messages (below) regularly every 15 minutes? It's to facilitate time-based rules and the expiration of resource failures. You can disable it with: crm_attribu

Re: [Pacemaker] Can't get all my resources online at once ?

2009-05-14 Thread Karl Katzke
Joe: The document you're looking for is "Configuration 1.0 explained" at this link: http://www.clusterlabs.org/wiki/Documentation --- Karl Katzke Systems Analyst II TAMU - DRGS >>> Joe Armstrong 5/14/2009 10:34 AM >>> - Original Message - > From: Dejan Muhamedagic > To: pacem

Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Joe Armstrong
>On Thu, May 14, 2009 at 06:32:00PM +1000, Tim Serong wrote: >> Greetings, >> >> I've written up a brief document entitled "STONITH Deathmatch Explained >> (and Some Hints for Resource Agent Authors and Systems Engineers)": >> >> http://ourobengr.com/ha >> >> It's a description of causes of ST

Re: [Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Eliot Gable
Most likely because you have a cluster-recheck-interval="15m" specified. Eliot Gable Senior Engineer 1228 Euclid Ave, Suite 390 Cleveland, OH 44115 Direct: 216-373-4808 Fax: 216-373-4657 ega...@broadvox.net CONFIDENTIAL COMMUNICATION. This e-mail and any files transmitted with it are confide

Re: [Pacemaker] Can't get all my resources online at once ?

2009-05-14 Thread Joe Armstrong
- Original Message - > From: Dejan Muhamedagic > To: pacema...@clusterlabs.org , Joe Armstrong > > Sent: Thursday, May 14, 2009 2:03:31 AM GMT-0800 America;Los_Angeles > Subject: Re: [Pacemaker] Can't get all my resources online at once ? > > Remove the *-action attributes from the ord

[Pacemaker] PEngine Recheck Timer message every 15 minutes - why?

2009-05-14 Thread Ivars Strazdiņš
Hi there, could anyone enlighten me why in a two node cluster one (and only one) node is spitting these messages (below) regularly every 15 minutes? I did erase all configuration with 'cibadmin -E' then recreated manually - yet they are still coming. Is it something to do with default cluster se

[Pacemaker] xml passes crm_verify but fails cibadmin --replace

2009-05-14 Thread Joe Armstrong
Any ideas on how I can get more information on the reason --replace fails (running it with --verbose already). Thanks. Joe ... The base xml file came from "cibadmin --query > new.xml" then I manually added the resources section [r...@vm2 ~]# cat new.xml

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Andrew Beekhof
On Thu, May 14, 2009 at 3:58 PM, Nikola Ciprich wrote: > Hi, > Dejan, thanks a lot, I compiled Your version, but crmd with shipped pacemaker > keeps segfaulting > with it, and unable to rebuild pacemaker with this heartbeat to get the > -debug package. > compilation fails with: > > plugin.c: In

Re: [Pacemaker] cib still leaks in pacemaker-1.0.3

2009-05-14 Thread Nikola Ciprich
Hi, Dejan, thanks a lot, I compiled Your version, but crmd with shipped pacemaker keeps segfaulting with it, and unable to rebuild pacemaker with this heartbeat to get the -debug package. compilation fails with: plugin.c: In function 'check_message_sanity': plugin.c:1190: warning: format '%d' ex

Re: [Pacemaker] Newbie question

2009-05-14 Thread Mark Schenk
Thanks for the pointer Andrew. It turns out I needed to add these statements: location mysql1fencing fencing 200: mysql1 location mysql2fencing fencing 200: mysql2 So as to allow the fencing to actually run on these nodes (I am running with cluster-option symmetric-cluster="false", but I forgo

Re: [Pacemaker] Newbie question

2009-05-14 Thread Dominik Klein
Sorry, I misunderstood your question. When you said "pull the plug" I thought of the network connection and that is what pingd could help you with. If you pull the power plug, you should probably look into what beekhof told you. Sorry again, Dominik Dominik Klein wrote: > Hi Mark > > The keyword

Re: [Pacemaker] Newbie question

2009-05-14 Thread Dominik Klein
Hi Mark The keyword you're looking for is "pingd". This example should get you going: http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node Regards Dominik Mark Schenk wrote: > Hello All, > > I'm new to pacemaker so please forgive m

Re: [Pacemaker] Newbie question

2009-05-14 Thread Andrew Beekhof
Try: crm_verify -V It should give you a warning about stonith. On Thu, May 14, 2009 at 2:36 PM, Mark Schenk wrote: > Hello All, > >   I'm new to pacemaker so please forgive me if this is in a faq somewhere, I > haven't been able to find it! > I am trying to set up a failover config for mysql

[Pacemaker] xml passes crm_verify but fails cibadmin --replace

2009-05-14 Thread Joe Armstrong
Any ideas on how I can get more information on the reason --replace fails (running it with --verbose already). Thanks. Joe [r...@vm2 ~]# cat new.xml

[Pacemaker] Newbie - problem creating rsc_order constraint

2009-05-14 Thread Joe Armstrong
I am getting the following error from crm_verify: new.xml:24: element rsc_order: Relax-NG validity error : Expecting an element resource_set, got nothing new.xml:24: element rsc_order: Relax-NG validity error : Element rsc_order failed to validate content new.xml:24: element rsc_order: Relax-NG

[Pacemaker] Newbie question

2009-05-14 Thread Mark Schenk
Hello All, I'm new to pacemaker so please forgive me if this is in a faq somewhere, I haven't been able to find it! I am trying to set up a failover config for mysql using the following setup: primitive mysqlvip ocf:heartbeat:IPaddr params ip="xx.xx.xx.xx" primitive mysqlfs ocf:heartbeat:File

Re: [Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Dejan Muhamedagic
Hi, On Thu, May 14, 2009 at 06:32:00PM +1000, Tim Serong wrote: > Greetings, > > I've written up a brief document entitled "STONITH Deathmatch Explained > (and Some Hints for Resource Agent Authors and Systems Engineers)": > > http://ourobengr.com/ha > > It's a description of causes of STONIT

Re: [Pacemaker] Can't get all my resources online at once ?

2009-05-14 Thread Dejan Muhamedagic
Hi, On Wed, May 13, 2009 at 04:02:30PM -0700, Joe Armstrong wrote: > Hi, > > I have a really simple config - 1 IP and an httpd but I can't > get them both online on the same node. > I have had the httpd try to start on the node without the IP, > I've had the IP get configured correctly but no hi

[Pacemaker] STONITH Deathmatch Explained

2009-05-14 Thread Tim Serong
Greetings, I've written up a brief document entitled "STONITH Deathmatch Explained (and Some Hints for Resource Agent Authors and Systems Engineers)": http://ourobengr.com/ha It's a description of causes of STONITH deathmatch in Heartbeat/Pacemaker HA clusters, where two nodes continually shoo