Re: [Pacemaker] Problem in Stonith configuration

2011-10-17 Thread neha chatrath
Hello Andreas,

Thanks for the reply.

So can you please suggest which Stonith plugin I should use for the
production release of my software? I have the following system requirements:
1. If a node in the cluster fails, it should be rebooted and its resources
should be restarted on it.
2. If the physical link between the nodes in the cluster fails, the affected
node should be isolated (kind of a power down) and the resources should
continue to run on the other nodes.

I have different types of resources, e.g. primitive, master-slave and clone,
running on my system.
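
For a production setup, a power-based fencing device is the usual choice.
As a rough sketch only (assuming the nodes have IPMI management interfaces;
the address and credentials below are made up for illustration):

primitive fence_mcg1 stonith:external/ipmi \
    params hostname="mcg1" ipaddr="192.168.1.10" userid="admin" \
        passwd="secret" interface="lan" \
    op monitor interval="60s"
location fence_mcg1_loc fence_mcg1 -inf: mcg1

The location constraint keeps the device off the node it is meant to fence.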

Thanks and regards
Neha Chatrath


Date: Mon, 17 Oct 2011 15:08:16 +0200
From: Andreas Kurz 
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem in Stonith configuration
Message-ID: <4e9c28c0.8070...@hastexo.com>
Content-Type: text/plain; charset="iso-8859-1"

Hello,

On 10/17/2011 12:34 PM, neha chatrath wrote:
> Hello,
> I am configuring a 2 node cluster with following configuration:
>
> *[root@MCG1 init.d]# crm configure show
>
> node $id="16738ea4-adae-483f-9d79-
b0ecce8050f4" mcg2 \
> attributes standby="off"
>
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> attributes standby="off"
>
> primitive ClusterIP ocf:heartbeat:IPaddr \
> params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
>
> op monitor interval="40s" timeout="20s" \
> meta target-role="Started"
>
> primitive app1_fencing stonith:suicide \
> op monitor interval="90" \
> meta target-role="Started"
>
> primitive myapp1 ocf:heartbeat:Redundancy \
> op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
>
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> primitive myapp3 ocf:mcg:red_app3 \
> op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
>
> ms ms_myapp1 myapp1 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp2 myapp2 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> ms ms_myapp3 myapp3 \
> meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1"
> notify="true"
>
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
>
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
>
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
>
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
>
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
>
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
>
> property $id="cib-bootstrap-options" \
> dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="true" \
> no-quorum-policy="ignore"
>
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100" \
> migration-threshold="3"
> *
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none
> of the resources (myapp1, myapp2, etc.) gets started, even on this node.
> Following is the output of the "*crm_mon -f*" command:
>
> *Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
> quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> 
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

The cluster is waiting for a successful fencing event before starting
all resources ... that is the only way to be sure the second node runs no
resources.

Since you are using the suicide plugin, this will never happen while
Heartbeat is not started on that node. If this is only a _test_ setup, go
with the ssh or even the null stonith plugin ... but never use them on
production systems!
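
For completeness, a test-only fencing primitive with the ssh plugin might
look like this (a sketch, assuming passwordless root ssh between the nodes;
again, never for production):

primitive test_fencing stonith:external/ssh \
    params hostlist="mcg1 mcg2" \
    op monitor interval="60s"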

Regards,
Andreas



[Pacemaker] pacemaker compatibility

2011-10-17 Thread manish . gupta
Hi,

   I am using corosync 1.2.1 and want to upgrade corosync from 1.2 to
1.4.2.

   Can you please let me know which versions of cluster-glue and pacemaker
are compatible with corosync 1.4.2?

   Currently, with corosync 1.4.2, I am using pacemaker 1.0.10 and
cluster-glue 1.0.3, and I am getting the error:

   service failed to load pacemaker ...
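
In case it helps: the stanza that loads the Pacemaker plugin normally lives
in corosync.conf (or a file under /etc/corosync/service.d/). A sketch only -
note that ver: 1 requires starting pacemakerd separately and only exists
from Pacemaker 1.1 onwards, so a 1.0.x installation would use ver: 0:

service {
    name: pacemaker
    ver: 0
}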


Regards
Manish


___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] [Problem] The attrd does not sometimes stop.

2011-10-17 Thread renayama19661014
Hi,

We sometimes fail to stop attrd.

Step1. Start a cluster on 2 nodes.
Step2. Stop the first node (/etc/init.d/heartbeat stop).
Step3. Stop the second node after a little time has passed
(/etc/init.d/heartbeat stop).

attrd catches the TERM signal but does not stop.
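
A quick way to confirm the leftover process after Step3 (a sketch, not part
of the original report):

# on the node that was stopped last
pgrep -fl attrd && echo "attrd is still running after heartbeat stop"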

(snip)
Oct  5 02:37:38 hpdb0201 crmd: [12238]: info: do_exit: [crmd] stopped (0)
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_ipc_message: IPC Channel to
12238 is not connected
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: send_via_callback_channel:
Delivery of reply to client 12238/0dbc9e28-d90d-4335-b9c4-9dd3fcb38163 failed
Oct  5 02:37:38 hpdb0201 cib: [12234]: WARN: do_local_notify: A-Sync reply to
crmd failed: reply failed
Oct  5 02:37:38 hpdb0201 heartbeat: [12223]: info: killing
/usr/lib64/heartbeat/attrd process group 12237 with signal 15
Oct  5 02:47:03 hpdb0201 cib: [12234]: info: cib_stats: Processed 97 operations
(4123.00us average, 0% utilization) in the last 10min
Oct  5 07:15:25 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
channel took 1010 ms (> 100 ms)
Oct  5 07:15:26 hpdb0201 ccm: [12233]: WARN: G_CH_check_int: working on IPC
channel took 1010 ms (> 100 ms)
Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:15:37 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431583547 should have started at 431583444
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd27dd0)
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431584254 should have started at 431584151
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:15:44 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431584254 should have started at 431584151
Oct  5 07:16:59 hpdb0201 heartbeat: [12223]: WARN: G_CH_check_int: working on
write child took 1010 ms (> 100 ms)
Oct  5 07:17:14 hpdb0201 stonithd: [12236]: WARN: G_CH_check_int: working on
Heartbeat API channel took 1010 ms (> 100 ms)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for send local status was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd27dd0)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431607988 should have started at 431607885
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: WARN: Gmain_timeout_dispatch:
Dispatch function for check for signals was delayed 1030 ms (> 1010 ms) before
being called (GSource: 0xd28010)
Oct  5 07:19:41 hpdb0201 heartbeat: [12223]: info: Gmain_timeout_dispatch:
started at 431607988 should have started at 431607885
(snip)

We have tried to reproduce the phenomenon, but it rarely reappears.

The same phenomenon was reported in the following email thread.
However, the discussion of the problem ended partway through without a
resolution.

 * http://www.gossamer-threads.com/lists/linuxha/pacemaker/62147

The phenomenon occurred with the following combination:
 * pacemaker-1.0.11
 * resource-agents-3.9.2
 * cluster-glue-1.0.7
 * heartbeat-3.0.5

I have registered this issue in Bugzilla:
 * http://bugs.clusterlabs.org/show_bug.cgi?id=5004

Best Regards,
Hideo Yamauchi.

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] 1) attrd, crmd, cib, stonithd going to 100% CPU after standby 2) monitoring bug 3) meta failure-timeout issue

2011-10-17 Thread Proskurin Kirill

Hello Beekhof.

First of all - I don't want to waste your time, but this problem is really
important for me, I can't solve it by myself, and it looks like a bug or
something. I think I failed to describe the problem clearly, so I will try
again and summarize the whole previous conversation.

I have a situation where Pacemaker thinks a resource is running, but it is
not. The agent, run from the console, says it is not running.

I have no fencing, and this resource fails to stop within the timeout.
You said that this is the reason for the situation. But I made an
experiment and found that if Pacemaker can't stop a resource, it marks it
"unmanaged".

My resource was not marked "unmanaged" - it just says it is running, and I
have no indication of a problem.

We have already fixed these non-stoppable scripts, but I want to be sure
that I will not run into this problem any more.

Below are some quotes from the previous conversation, if needed.

On 12.10.2011 6:11, Andrew Beekhof wrote:

On 10/03/2011 05:32 AM, Andrew Beekhof wrote:


corosync-1.4.1
pacemaker-1.1.5
pacemaker runs with "ver: 1"



2)
This one is scary.
I twice ran into a situation where pacemaker thinks a resource is started,
but it is not.

RA is misbehaving.  Pacemaker will only consider a resource running if
the RA tells us it is (running or in a failed state).


But as you can see below, the agent returns "7".


It's still broken. Not one stop action succeeds.

Sep 30 13:58:41 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 4082) timed out (try 1).  Killing with
signal SIGTERM (15).
Sep 30 14:09:34 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 21859) timed out (try 1).  Killing
with signal SIGTERM (15).
Sep 30 20:04:17 mysender34.mail.ru lrmd: [26299]: WARN:
tranprocessor:stop process (PID 24576) timed out (try 1).  Killing
with signal SIGTERM (15).

/That/ is why pacemaker thinks it's still running.


I made an experiment.

I created a script that does not die on SIGTERM:

#!/usr/bin/perl
# ignore SIGTERM so the process can never be stopped gracefully
$SIG{TERM} = "IGNORE";
sleep 1 while 1;    # busy-wait forever, one second at a time

And ran it under pacemaker.
I ran 3 tests:
1) primitive test-kill-15.pl ocf:mail.ru:generic \
op monitor interval="20" timeout="5" on-fail="restart" \
params binfile="/tmp/test-kill-15.pl" external_pidfile="1"

2) Same, but with on-fail=block

3) Same, but with meatware stonith.

Each time I do:
crm resource stop test-kill-15.pl

And in cases 1 and 2 I got "unmanaged" on this resource.


Because you've not configured any fencing devices.
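
For reference, a minimal fencing device for such a test could look like
this (a sketch with made-up node names; the meatware plugin simply waits
for an operator to confirm that the node is down):

primitive st_meat stonith:meatware \
    params hostlist="node1 node2" \
    op monitor interval="60s"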



--
Best regards,
Proskurin Kirill

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [Linux-cluster] new to pacemaker and heartbeat on debian...getting error..

2011-10-17 Thread Andreas Kurz
On 10/17/2011 04:02 AM, Joey L wrote:
> Hi - New to heartbeat and pacemaker on debian.
> Followed a tutorial online at:
> http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
> 
> 
> and now getting this error -
> 
> 
> root@deb2:/home/mjh# sudo crm_mon --one-shot
> 
> Last updated: Sun Oct 16 21:56:43 2011
> Stack: openais
> Current DC: deb1 - partition with quorum
> Version: 1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b
> 2 Nodes configured, 2 expected votes
> 1 Resources configured.
> 
> 
> Online: [ deb1 deb2 ]
> 
> 
> Failed actions:
> failover-ip_start_0 (node=deb2, call=35, rc=1, status=complete):
> unknown error
> failover-ip_start_0 (node=deb1, call=35, rc=1, status=complete):
> unknown error
> root@deb2:/home/mjh#
> 

Please provide your config ... best is the output of "cibadmin -Q".
Reading the logs should also give you valuable hints.

One shot in the dark: there is no interface up with an IP in the same
subnet as your configured failover-ip, and you did not explicitly
define an interface.
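
A sketch of an explicit definition (the address and nic are placeholders,
and the IPaddr2 agent is assumed):

primitive failover-ip ocf:heartbeat:IPaddr2 \
    params ip="192.168.0.100" cidr_netmask="24" nic="eth0" \
    op monitor interval="30s"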

And there is a dedicated Pacemaker mailing list; I have set it on cc for
this thread.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Tried googling - nothing.
> I stopped avahi-daemon because I was getting a strange error:
> I was getting a naming conflict error; I do not think I am getting it any more.
> 
> Any thoughts on this?
> Is there any way I can turn on logs for heartbeat or pacemaker?
> 
> thanks
> mjh
> 
> --
> Linux-cluster mailing list
> linux-clus...@redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster






___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] Problem in Stonith configuration

2011-10-17 Thread Andreas Kurz
Hello,

On 10/17/2011 12:34 PM, neha chatrath wrote:
> Hello,
> I am configuring a 2 node cluster with following configuration:
> 
> *[root@MCG1 init.d]# crm configure show
> 
> node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
> attributes standby="off"
> 
> node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
> attributes standby="off"
> 
> primitive ClusterIP ocf:heartbeat:IPaddr \
> params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \
> 
> op monitor interval="40s" timeout="20s" \
> meta target-role="Started"
> 
> primitive app1_fencing stonith:suicide \
> op monitor interval="90" \
> meta target-role="Started"
> 
> primitive myapp1 ocf:heartbeat:Redundancy \
> op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
> op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"
> 
> primitive myapp2 ocf:mcg:Redundancy_myapp2 \
> op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> 
> primitive myapp3 ocf:mcg:red_app3 \
> op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
> op monitor interval="40" role="Slave" timeout="40" on-fail="restart"
> 
> ms ms_myapp1 myapp1 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
> 
> ms ms_myapp2 myapp2 \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
> notify="true"
> 
> ms ms_myapp3 myapp3 \
> meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1"
> notify="true"
> 
> colocation myapp1_col inf: ClusterIP ms_myapp1:Master
> 
> colocation myapp2_col inf: ClusterIP ms_myapp2:Master
> 
> colocation myapp3_col inf: ClusterIP ms_myapp3:Master
> 
> order myapp1_order inf: ms_myapp1:promote ClusterIP:start
> 
> order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
> 
> order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
> 
> property $id="cib-bootstrap-options" \
> dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
> cluster-infrastructure="Heartbeat" \
> stonith-enabled="true" \
> no-quorum-policy="ignore"
> 
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100" \
> migration-threshold="3"
> *
> I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none
> of the resources (myapp1, myapp2, etc.) gets started, even on this node.
> Following is the output of the "*crm_mon -f*" command:
> 
> *Last updated: Mon Oct 17 10:19:22 2011
> Stack: Heartbeat
> Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
> quorum
> Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
> 2 Nodes configured, unknown expected votes
> 5 Resources configured.
> 
> Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

The cluster is waiting for a successful fencing event before starting
all resources ... that is the only way to be sure the second node runs no
resources.

Since you are using the suicide plugin, this will never happen while
Heartbeat is not started on that node. If this is only a _test_ setup, go
with the ssh or even the null stonith plugin ... but never use them on
production systems!

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now




___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] ldap: Failed actions ldap_monitor_0

2011-10-17 Thread Andreas Kurz
Hello,

On 10/16/2011 02:41 AM, ge...@riseup.net wrote:
> Hello,
> 
>> Check the slapd script for LSB compliance ... there is also a brief
>> description in the Pacemaker docs:
> 
> Allright, thanks.
> 
>> Is there a specific reason why you are using a cluster filesystem, with
>> single primary DRBD setup, no fencing configured ... ?
> 
> Not really. I read some howtos and used this filesystem because people
> were using it. And I thought a cluster fs would be more "stable" for this
> kind of use, which maybe sounds a bit stupid (like the first reason...).
> Should I enable fencing, or is the filesystem a "no go" at all for this
> kind of use?

If you don't know why you need a cluster fs, don't use it ;-) Your setup
will work fine with a conventional fs. And enable fencing if there is any
kind of shared storage involved ... in fact, best practice is to always
enable fencing.
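
As an illustration only (device, mount point and fs type are placeholders
for a single-primary DRBD setup):

primitive fs_ldap ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/var/lib/ldap" fstype="ext4" \
    op monitor interval="20s" timeout="40s"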

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> Georg
> 
> 
> ___
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: 
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker





___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


[Pacemaker] Problem in Stonith configuration

2011-10-17 Thread neha chatrath
Hello,
I am configuring a 2 node cluster with following configuration:

*[root@MCG1 init.d]# crm configure show

node $id="16738ea4-adae-483f-9d79-b0ecce8050f4" mcg2 \
attributes standby="off"

node $id="3d507250-780f-414a-b674-8c8d84e345cd" mcg1 \
attributes standby="off"

primitive ClusterIP ocf:heartbeat:IPaddr \
params ip="192.168.1.204" cidr_netmask="255.255.255.0" nic="eth0:1" \

op monitor interval="40s" timeout="20s" \
meta target-role="Started"

primitive app1_fencing stonith:suicide \
op monitor interval="90" \
meta target-role="Started"

primitive myapp1 ocf:heartbeat:Redundancy \
op monitor interval="60s" role="Master" timeout="30s" on-fail="standby" \
op monitor interval="40s" role="Slave" timeout="40s" on-fail="restart"

primitive myapp2 ocf:mcg:Redundancy_myapp2 \
op monitor interval="60" role="Master" timeout="30" on-fail="standby" \
op monitor interval="40" role="Slave" timeout="40" on-fail="restart"

primitive myapp3 ocf:mcg:red_app3 \
op monitor interval="60" role="Master" timeout="30" on-fail="fence" \
op monitor interval="40" role="Slave" timeout="40" on-fail="restart"

ms ms_myapp1 myapp1 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true"

ms ms_myapp2 myapp2 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1"
notify="true"

ms ms_myapp3 myapp3 \
meta master-max="1" master-max-node="1" clone-max="2" clone-node-max="1"
notify="true"

colocation myapp1_col inf: ClusterIP ms_myapp1:Master

colocation myapp2_col inf: ClusterIP ms_myapp2:Master

colocation myapp3_col inf: ClusterIP ms_myapp3:Master

order myapp1_order inf: ms_myapp1:promote ClusterIP:start

order myapp2_order inf: ms_myapp2:promote ms_myapp1:start

order myapp3_order inf: ms_myapp3:promote ms_myapp2:start

property $id="cib-bootstrap-options" \
dc-version="1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="true" \
no-quorum-policy="ignore"

rsc_defaults $id="rsc-options" \
resource-stickiness="100" \
migration-threshold="3"
*
I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none
of the resources (myapp1, myapp2, etc.) gets started, even on this node.
Following is the output of the "*crm_mon -f*" command:

*Last updated: Mon Oct 17 10:19:22 2011
Stack: Heartbeat
Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
quorum
Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
2 Nodes configured, unknown expected votes
5 Resources configured.

Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
Online: [ mcg1 ]
app1_fencing (stonith:suicide): Started mcg1

Migration summary:
* Node mcg1:
*
When I set "stonith-enabled" to false, all my resources come up.
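
(For completeness, disabling it amounts to setting the cluster property,
e.g.:

crm configure property stonith-enabled=false

but with it disabled the cluster can no longer isolate a failed node.)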

Can somebody help me with STONITH configuration?

Cheers
Neha Chatrath
  KEEP SMILING
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] [PATCH] Building on Centos 5.7 fails

2011-10-17 Thread Trevor Hemsley

Andrew Beekhof wrote:

On Sat, Oct 15, 2011 at 1:46 AM, Trevor Hemsley  wrote:
  

Hi

I've just built pacemaker 1.1.6 on CentOS 5.7 with the aid of the attached
patches. Not sure if the methods I've used are correct, but it does now
build and produce a series of RPMs, which is more than it did before.

Files affected:
configure.ac - had to rename PKG_FEATURES -> PCMK_FEATURES to avoid an
autoconf error. Not sure what changed between 1.1.5 and 1.1.6, but this was
OK before.
doc/Makefile.am - added the -L option to a2x to avoid an error about being
unable to fetch the DTD file for crm.8. I am more surprised that it works
on other distros, since the xmllint that a2x calls specifies --nonet.
pacemaker.spec - diff to pull in the patch and to define other stuff that
doesn't exist on el5, like _initddir, and to override the docdir. Also,
./configure on el5 does not like --docdir=
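
Illustrative only (not the exact Makefile rule): a2x's -L flag skips the
xmllint validation step, so the invocation ends up looking roughly like

    a2x -L -f manpage crm.8.txt

which avoids the network fetch of the DTD.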



I've applied the rest, but the spec file part looks odd.  The original
doesn't seem to match what was in Git.
  
That was a diff against the Fedora 15 spec file. Maybe that's different 
to the one in git.


--
Trevor
Voiceflex
www.voiceflex.com



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker