[Pacemaker] Promote of one resource leads to start of another resource in heartbeat cluster

2012-03-19 Thread neha chatrath
Hello,
I have the following 2 node cluster configuration:

node $id=15f8a22d-9b1a-4ce3-bca2-05f654a9ed6a cps2 \
attributes standby=off
node $id=d3088454-5ff3-4bcd-b94c-5a2567e2759b cps1 \
attributes standby=off
primitive CPS ocf:heartbeat:jboss_cps \
params jboss_home=/home/cluster/cps/jboss-5.1.0.GA/
java_home=/usr/ run_opts=-c all -b 0.0.0.0 -g clusterCPS
-Djboss.service.binding.set=ports-01 -Djboss.messaging.ServerPeerID=01
statusurl=http://127.0.0.1:8180 shutdown_opts=-s 127.0.0.1:1199
pstring=clusterCPS \
op start interval=0 timeout=150 \
op stop interval=0 timeout=240 \
op monitor interval=30s timeout=40s
primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=192.168.114.150 cidr_netmask=32 nic=bond0:114:1 \
op monitor interval=40 timeout=20 \
meta target-role=Started
primitive EMS ocf:heartbeat:jboss \
params jboss_home=/home/cluster/cps/Jboss_EMS/jboss-5.1.0.GA
java_home=/usr/ run_opts=-c all -b 0.0.0.0 -g clusterEMS
pstring=clusterEMS \
op start interval=0 timeout=60 \
op stop interval=0 timeout=240 \
op monitor interval=30s timeout=40s
primitive LB ocf:ptt:lb_ptt \
op monitor interval=40
primitive NDB_MGMT ocf:ptt:NDB_MGM_RA \
op monitor interval=120 timeout=120
primitive NDB_VIP ocf:heartbeat:IPaddr2 \
params ip=192.168.117.150 cidr_netmask=255.255.255.255
nic=bond0.117:4 \
op monitor interval=30 timeout=25
primitive Rmgr ocf:ptt:RM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40
on-fail=restart \
op start interval=0 role=Master timeout=30 \
op start interval=0 role=Slave timeout=35
primitive mysql ocf:ptt:MYSQLD_RA \
op monitor interval=180 timeout=200 \
op start interval=0 timeout=40
primitive ndbd ocf:ptt:NDBD_RA \
op monitor interval=120 timeout=120
ms CPS_CLONE CPS \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 interleave=true notify=true
ms ms_Rmgr Rmgr \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 interleave=true notify=true target-role=Started
ms ms_mysqld mysql \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 interleave=true notify=true
clone EMS_CLONE EMS \
meta globally-unique=false clone-max=2 clone-node-max=1
clone LB_CLONE LB \
meta globally-unique=false clone-max=2 clone-node-max=1
target-role=Started
clone ndbdclone ndbd \
meta globally-unique=false clone-max=2 clone-node-max=1
colocation RM_with_ip inf: ms_Rmgr:Master ClusterIP
colocation ndb_vip-with-ndb_mgm inf: NDB_MGMT NDB_VIP
order RM-after-ip inf: ClusterIP ms_Rmgr
order cps-after-mysqld inf: ms_mysqld CPS_CLONE
order ip-after-mysqld inf: ms_mysqld ClusterIP
order lb-after-cps inf: CPS_CLONE LB_CLONE
order mysqld-after-ndbd inf: ndbdclone ms_mysqld
order ndb_mgm-after-ndb_vip inf: NDB_VIP NDB_MGMT
order ndbd-after-ndb_mgm inf: NDB_MGMT ndbdclone
property $id=cib-bootstrap-options \
dc-version=1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8 \
   cluster-infrastructure=Heartbeat \
no-quorum-policy=ignore \
stonith-enabled=false
rsc_defaults $id=rsc-options \
resource-stickiness=100 \
migration_threshold=3

When I bring down the active node in the cluster, the ms_mysqld resource on the
standby node is promoted, but another resource (ms_Rmgr) gets restarted.
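
The constraint usually examined first in reports like this (an assumption, not
a diagnosis confirmed in this thread) is the mandatory ordering between
ClusterIP and ms_Rmgr: with a mandatory (inf) order, a (re)start of the
left-hand resource forces the right-hand resource through a stop/start, so
starting ClusterIP on the surviving node would also restart ms_Rmgr. A minimal
sketch of possible adjustments, using the resource names from the configuration
above:

    order RM-after-ip 0: ClusterIP ms_Rmgr
    order cps-after-mysqld inf: ms_mysqld:promote CPS_CLONE:start

The first line makes the ordering advisory so it no longer forces a restart;
the second orders CPS_CLONE against the promote action rather than against a
plain start of ms_mysqld.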

Following are excerpts from the logs:

Mar 19 18:09:58 CPS2 lrmd: [27576]: info: operation monitor[13] on NDB_VIP
for client 27579: pid 29532 exited with return code 0
Mar 19 18:10:06 CPS2 heartbeat: [27565]: WARN: node cps1: is dead
Mar 19 18:10:06 CPS2 heartbeat: [27565]: info: Link cps1:bond0.115 dead.
Mar 19 18:10:06 CPS2 ccm: [27574]: debug: recv msg status from cps1,
status:dead
Mar 19 18:10:06 CPS2 ccm: [27574]: debug: status of node cps1: active -
dead
Mar 19 18:10:06 CPS2 ccm: [27574]: debug: recv msg CCM_TYPE_LEAVE from
cps1, status:[null ptr]
Mar 19 18:10:06 CPS2 ccm: [27574]: debug: quorum plugin: majority
Mar 19 18:10:06 CPS2 crmd: [27579]: notice: crmd_ha_status_callback: Status
update: Node cps1 now has status [dead] (DC=true)
Mar 19 18:10:06 CPS2 ccm: [27574]: debug: cluster:linux-ha, member_count=1,
member_quorum_votes=100
Mar 19 18:10:06 CPS2 crmd: [27579]: info: crm_update_peer_proc: cps1.ais is
now offline
...
..

Mar 19 18:10:07 CPS2 pengine: [27584]: notice: LogActions: Start
ClusterIP(cps2)
Mar 19 18:10:07 CPS2 pengine: [27584]: notice: LogActions: Leave   resource
NDB_VIP (Started cps2)
Mar 19 18:10:07 CPS2 pengine: [27584]: notice: LogActions: Leave   resource
NDB_MGMT(Started cps2)
Mar 19 18:10:07 CPS2 pengine: [27584]: notice: LogActions: Leave   resource
ndbd:0  (Stopped)
Mar 19 18:10:07 CPS2 pengine: [27584]: notice: LogActions: Leave   resource
ndbd:1  (Started cps2)
Mar 19 18:10:07 CPS2 pengine: [27584]: notice: 

Re: [Pacemaker] How to run heartbeat and pacemaker resources as a non-root user

2012-02-20 Thread neha chatrath
Hello,

Thanks for the reply.
I have been successfully using Heartbeat as a root user.
But I have a system requirement for which I need to run my different custom
applications (configured using crm) as a non-root user.
Can this be done?

Regards
Neha Chatrath

Date: Mon, 20 Feb 2012 22:05:30 +1100
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] How to run heartbeat and pacemaker resources
   as a non-root user
Message-ID:
   caedlwg2ok25f4jrg8y0kwsgc6n35_bzzdy6np+egk0tutjg...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Mon, Feb 20, 2012 at 2:39 PM, neha chatrath nehachatr...@gmail.com
wrote:
 Hello,

 I need to run heartbeat and pacemaker resources as non-root users.
 When I try to run heartbeat as a hacluster user,

That probably won't work.  We already try to drop as much privilege as
we can, but some processes need to be root or they can't do anything -
like add an IP address to a machine.

 it fails to run with the
 following error:

 Starting High-Availability services: chmod: changing permissions of
 `/var/run/heartbeat/rsctmp': Operation not permitted
 Done. touch: cannot touch `/var/lock/subsys/heartbeat': Permission denied

 I have tried changing ownership and permissions for the above directories
 and files but still the same result.

 Can somebody help me in this?

 Thanks and regards
 Neha Chatrath


On Mon, Feb 20, 2012 at 9:09 AM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,

 I need to run heartbeat and pacemaker resources as non-root users.
 When I try to run heartbeat as a hacluster user, it fails to run with
 the following error:

 Starting High-Availability services: chmod: changing permissions of
 `/var/run/heartbeat/rsctmp': Operation not permitted
 Done. touch: cannot touch `/var/lock/subsys/heartbeat': Permission denied

 I have tried changing ownership and permissions for the above directories
 and files but still the same result.

 Can somebody help me in this?

 Thanks and regards
 Neha Chatrath




[Pacemaker] How to run heartbeat and pacemaker resources as a non-root user

2012-02-19 Thread neha chatrath
Hello,

I need to run heartbeat and pacemaker resources as non-root users.
When I try to run heartbeat as a hacluster user, it fails to run with the
following error:

Starting High-Availability services: chmod: changing permissions of
`/var/run/heartbeat/rsctmp': Operation not permitted
Done. touch: cannot touch `/var/lock/subsys/heartbeat': Permission denied

I have tried changing ownership and permissions for the above directories
and files but still the same result.

Can somebody help me in this?

Thanks and regards
Neha Chatrath


[Pacemaker] Stopping heartbeat service on one node leads to restart of resources on other node in cluster

2012-02-07 Thread neha chatrath
 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for Tmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:0 with Rmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:1 with Rmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Tmgr:0 with pimd:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: find_compatible_child: Can't
pair pimd:1 with ms_Tmgr
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: No match
found for pimd:1 (0)
Feb 07 11:06:31 MCG1 pengine: [20534]: info: clone_rsc_order_lh: Inhibiting
pimd:1 from being active
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: native_assign_node: Could not
allocate a node for pimd:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
pimd:0 with Tmgr:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
pimd:1 with Tmgr:1
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Rmgr:0 with mysql:0
Feb 07 11:06:31 MCG1 pengine: [20534]: debug: clone_rsc_order_lh: Pairing
Rmgr:1 with mysql:1
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
Rmgr:0  (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
Rmgr:1  (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
Tmgr:0  (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
Tmgr:1  (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
pimd:0  (Master mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
pimd:1  (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
ClusterIP   (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
EMS:0   (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
EMS:1   (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
NDB_VIP (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
NDB_MGMT(Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Restart resource
mysql:0 (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
mysql:1 (mcg2)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Leave   resource
ndbd:0  (Started mcg1)
Feb 07 11:06:31 MCG1 pengine: [20534]: notice: LogActions: Stop    resource
ndbd:1  (mcg2)

Thanks in advance.

Regards
Neha Chatrath


[Pacemaker] Invoking crm node standby command on active node leads to stopping of resources on both Active and standby node

2012-01-10 Thread neha chatrath
Hello,
I am using a cluster with following configuration:

 [root@MCG1 neha]# crm configure show
node $id=0686a4d1-c9de-4334-8d33-1a9f6f0755dd ggns2mexsatsdp22
node $id=76246d46-f0e4-4ba8-9179-d60aa7c697c8 ggns2mexsatsdp23
node $id=9d59c9e6-24e0-4684-94ab-c07af7e7a2f0 mcg1 \
attributes standby=off
node $id=fb3f06f0-05bf-42ef-a312-c072f589918a mcg2 \
attributes standby=off
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
params ip=192.168.113.77 cidr_netmask=255.255.255.0
nic=eth0:1 \
op monitor interval=40 timeout=20
primitive RM ocf:mcg:RM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive Tmgr ocf:mcg:TM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive pimd ocf:mcg:PIMD_RA \
op monitor interval=60 role=Master timeout=30
on-fail=standby \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
ms ms_RM RM \
meta master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true target-role=Started
ms ms_Tmgr Tmgr \
meta master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true target-role=Started
ms ms_pimd pimd \
meta master-max=1 master-node-max=1 clone-max=2
clone-node-max=1 notify=true target-role=Started
colocation ip_with_RM inf: ClusterIP ms_RM:Master
colocation ip_with_Tmgr inf: ClusterIP ms_Tmgr:Master
colocation ip_with_pimd inf: ClusterIP ms_pimd:Master
order TM-after-RM inf: ms_RM:promote ms_Tmgr:start
order ip-after-pimd inf: ms_pimd:promote ClusterIP:start
order pimd-after-TM inf: ms_Tmgr:promote ms_pimd:start
property $id=cib-bootstrap-options \
dc-version=1.0.11-55a5f5be61c367cbd676c2f0ec4f1c62b38223d7 \
cluster-infrastructure=Heartbeat \
no-quorum-policy=ignore \
stonith-enabled=false
rsc_defaults $id=rsc-options \
resource-stickiness=100 \
migration-threshold=3

When I execute the crm node standby command on the Active node, it leads to
stopping of resources on both the Active and Standby nodes.
As per my understanding, this should lead to stopping of resources only on the
current Active node, and all the resources on the standby node should be
promoted.

Please comment.
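
One quick way to preview what the policy engine intends to do for this
transition (a sketch; the node name mcg1 is taken from the configuration above,
and ptest is the 1.0.x tool for replaying the live CIB):

    crm node standby mcg1    # put the active node into standby
    ptest -L -s -VVV         # show planned actions and allocation scores from the live CIB
    crm node online mcg1     # bring the node back afterwards

Comparing the planned actions against the expected promote-only transition can
show which constraint is forcing the extra stops.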

Thanks and regards
Neha


[Pacemaker] Stopping heartbeat on active node leads to restart of resources on standby node

2012-01-03 Thread neha chatrath
Hello,
I have a 2 node cluster with following configuration:
crm configure show
node $id=16738ea4-adae-483f-9d79-b0ecce8050f4 mcg2
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
params ip=192.168.113.67 cidr_netmask=255.255.255.0
nic=eth0:1 \
op monitor interval=40 timeout=20
primitive Rmgr ocf:mcg:RM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive Tmgr ocf:mcg:TM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive pimd ocf:mcg:PIMD_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
ms ms_Rmgr Rmgr \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true
ms ms_Tmgr Tmgr \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true
ms ms_pimd pimd \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true target-role=Stopped
colocation ip_with_Rmgr inf: ClusterIP ms_Rmgr:Master
colocation ip_with_Tmgr inf: ClusterIP ms_Tmgr:Master
colocation ip_with_pimd inf: ClusterIP ms_pimd:Master
order TM-after-RM inf: ms_Rmgr:promote ms_Tmgr:start
order ip-after-pimd inf: ms_pimd:promote ClusterIP:start
order pimd-after-TM inf: ms_Tmgr:promote ms_pimd:start
property $id=cib-bootstrap-options \
dc-version=1.0.11-9af47ddebcad19e35a61b2a20301dc038018e8e8 \
cluster-infrastructure=Heartbeat \
no-quorum-policy=ignore \
stonith-enabled=false
rsc_defaults $id=rsc-options \
migration_threshold=3 \
resource-stickiness=100
With both the Active and Standby nodes up and running, if I stop Heartbeat on
the Active node, all the resources on the Standby node first receive a stop and
then a start from Pacemaker.
As per the idea behind clustering, all the master/slave resources on the
Standby node should simply receive a promote.

Can somebody comment on this behavior?
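
A brief sketch of the kind of adjustment sometimes tried for this pattern (an
assumption, not a fix confirmed in this thread): making the promote-to-start
orderings advisory, so that a promotion on the surviving node does not force
its dependents through a stop/start:

    order TM-after-RM 0: ms_Rmgr:promote ms_Tmgr:start
    order pimd-after-TM 0: ms_Tmgr:promote ms_pimd:start
    order ip-after-pimd 0: ms_pimd:promote ClusterIP:start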

Thanks and regards
Neha Chatrath


[Pacemaker] How to serialize/control resource startup on Standby node

2011-12-28 Thread neha chatrath
Hello,

I have a cluster with 2 nodes and multiple Master/Slave resources.
The ordering of resources on the master node is achieved using the order option
of crm. When the standby node is started, the processes are started one after
another.
Following is the configuration info:
primitive ClusterIP ocf:mcg:MCG_VIPaddr_RA \
params ip=192.168.113.67 cidr_netmask=255.255.255.0
nic=eth0:1 \
op monitor interval=40 timeout=20
primitive Rmgr ocf:mcg:RM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive Tmgr ocf:mcg:TM_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
primitive pimd ocf:mcg:PIMD_RA \
op monitor interval=60 role=Master timeout=30
on-fail=restart \
op monitor interval=40 role=Slave timeout=40 on-fail=restart
ms ms_Rmgr Rmgr \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true
ms ms_Tmgr Tmgr \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true
ms ms_pimd pimd \
meta master-max=1 master-max-node=1 clone-max=2
clone-node-max=1 notify=true
colocation ip_with_Rmgr inf: ClusterIP ms_Rmgr:Master
colocation ip_with_Tmgr inf: ClusterIP ms_Tmgr:Master
colocation ip_with_pimd inf: ClusterIP ms_pimd:Master
order TM-after-RM inf: ms_Rmgr:promote ms_Tmgr:start
order ip-after-pimd inf: ms_pimd:promote ClusterIP:start
order pimd-after-TM inf: ms_Tmgr:promote ms_pimd:start
property $id=cib-bootstrap-options \
dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
cluster-infrastructure=Heartbeat \
no-quorum-policy=ignore \
stonith-enabled=false
rsc_defaults $id=rsc-options \
migration_threshold=3 \
resource-stickiness=100

I have a system requirement in which the start of one resource (e.g. pimd) is
dependent on the successful start of another resource (e.g. Tmgr).
Everything runs smoothly on the master node. This is due to the *ordering and
a few seconds of delay* until a resource is promoted to Master.
But on the standby node, since the resources are started one after another
without any delay, the standby node in the cluster behaves erratically.

Is there a way through which I can serialize/control resource startup on
the standby node?
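
A possible sketch (an assumption, not something suggested in this thread): the
existing constraints only order each promote against the next start, so nothing
serializes the initial start of the slave instances. Additional order
constraints on the start actions would chain them explicitly:

    order TM-start-after-RM inf: ms_Rmgr:start ms_Tmgr:start
    order pimd-start-after-TM inf: ms_Tmgr:start ms_pimd:start

With these in place, a slave on the standby node would not be started until the
resource it depends on has completed its own start.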

Thanks and regards
Neha Chatrath


Re: [Pacemaker] Regarding Stonith RAs

2011-11-30 Thread neha chatrath
Hello Andreas,

Pacemaker is not built with Heartbeat support on RHEL-6 and its derivatives.
How do I check this, and what steps do I need to take to resolve this issue?

Thanks and regards
Neha Chatrath

On Thu, Nov 24, 2011 at 5:38 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,

 I could get the list of Stonith RAs by installing the cman, clvm, ricci,
 pacemaker and rgmanager RPMs provided by the CentOS 6 distribution.
 But unfortunately, after installing these packages, not all of the processes
 related to Pacemaker come up on starting the Heartbeat daemon.
 When I start the Heartbeat daemon, only the following processes are started:



 [root@p init.d]# ps -eaf |grep heartbeat
 root  3522 1  0 17:26 ?00:00:00 heartbeat: master control
 process
 root  3525  3522  0 17:26 ?00:00:00 heartbeat: FIFO
 reader
 root  3526  3522  0 17:26 ?00:00:00 heartbeat: write: bcast
 eth1
 root  3527  3522  0 17:26 ?00:00:00 heartbeat: read: bcast
 eth1
 root  3538  3381  0 17:26 pts/300:00:00 grep heartbeat

 In the log messages, following error logs are observed:
 Nov 24 17:26:19 p heartbeat: [3522]: debug: Signing on API client 3539
 (ccm)
 Nov 24 17:26:19 p ccm: [3539]: info: Hostname: p
 Nov 24 17:26:19 p attrd: [3543]: info: Invoked: /usr/lib/heartbeat/attrd
 Nov 24 17:26:19 p stonith-ng: [3542]: info: Invoked:
 /usr/lib/heartbeat/stonithd
 Nov 24 17:26:19 p cib: [3540]: info: Invoked: /usr/lib/heartbeat/cib
 Nov 24 17:26:19 p lrmd: [3541]: ERROR: socket_wait_conn_new: trying to
 create in /var/run/heartbeat/lrm_cmd_sock bind:: No such file or directory
 Nov 24 17:26:19 p lrmd: [3541]: ERROR: main: can not create wait
 connection for command.
 Nov 24 17:26:19 p lrmd: [3541]: ERROR: Startup aborted (can't create comm
 channel).  Shutting down.
 Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed /usr/lib/heartbeat/lrmd
 -r process 3541 exited with return code 100.
 Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client /usr/lib/heartbeat/lrmd
 -r exited with return code 100.
 Nov 24 17:26:19 p attrd: [3543]: info: crm_log_init_worker: Changed active
 directory to /var/lib/heartbeat/cores/hacluster
 Nov 24 17:26:19 p attrd: [3543]: info: main: Starting up
 Nov 24 17:26:19 p stonith-ng: [3542]: info: crm_log_init_worker: Changed
 active directory to /var/lib/heartbeat/cores/root
 Nov 24 17:26:19 p cib: [3540]: info: crm_log_init_worker: Changed active
 directory to /var/lib/heartbeat/cores/hacluster
 Nov 24 17:26:19 p attrd: [3543]: CRIT: get_cluster_type: This installation
 of Pacemaker does not support the '(null)' cluster infrastructure.
 Terminating.
 Nov 24 17:26:19 p stonith-ng: [3542]: CRIT: get_cluster_type: This
 installation of Pacemaker does not support the '(null)' cluster
 infrastructure.  Terminating.
 Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed
 /usr/lib/heartbeat/attrd process 3543 exited with return code 100.
 Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client
 /usr/lib/heartbeat/attrd exited with return code 100.
 Nov 24 17:26:19 p heartbeat: [3522]: info: the send queue length from
 heartbeat to client ccm is set to 1024
 Nov 24 17:26:19 p heartbeat: [3522]: WARN: Managed
 /usr/lib/heartbeat/stonithd process 3542 exited with return code 100.
 Nov 24 17:26:19 p heartbeat: [3522]: ERROR: Client
 /usr/lib/heartbeat/stonithd exited with return code 100.
 Nov 24 17:26:19 p cib: [3540]: info: retrieveCib: Reading cluster
 configuration from: /var/lib/heartbeat/crm/cib.xml (digest:
 /var/lib/heartbeat/crm/cib.xml.sig)
 Nov 24 17:26:19 p cib: [3540]: debug: log_data_element: readCibXmlFile:
 [on-disk] cib epoch=0 num_updates=0 admin_epoch=0
 validate-with=pacemaker-1.2 cib-last-written=Mon Nov 21 11:09:22 2011 
 ...
 
 Nov 24 17:26:19 p crmd: [3544]: info: crmd_init: Starting crmd
 Nov 24 17:26:19 p crmd: [3544]: debug: s_crmd_fsa: Processing I_STARTUP: [
 state=S_STARTING cause=C_STARTUP origin=crmd_init ]
 Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace://
 A_LOG
 Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace://
 A_STARTUP
 Nov 24 17:26:19 p crmd: [3544]: debug: do_startup: Registering Signal
 Handlers
 Nov 24 17:26:19 p crmd: [3544]: debug: do_startup: Creating CIB and LRM
 objects
 Nov 24 17:26:19 p crmd: [3544]: debug: do_fsa_action: actions:trace://
 A_CIB_START
 Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
 Attempting to talk on: /var/run/crm/cib_rw
 Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
 Could not init comms on: /var/run/crm/cib_rw
 Nov 24 17:26:19 p crmd: [3544]: debug: cib_native_signon_raw: Connection
 to command channel failed
 Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
 Attempting to talk on: /var/run/crm/cib_callback
 Nov 24 17:26:19 p crmd: [3544]: debug: init_client_ipc_comms_nodispatch:
 Could not init comms on: /var

Re: [Pacemaker] Regarding Stonith RAs

2011-11-24 Thread neha chatrath
 cib: [3540]: debug: write_cib_contents: Writing CIB to
disk
...
...
..
Nov 24 17:27:47 p crmd: [3544]: WARN: do_cib_control: Couldn't complete CIB
registration 30 times... pause and retry
Nov 24 17:27:47 p crmd: [3544]: ERROR: do_cib_control: Could not complete
CIB registration  30 times... hard error
Nov 24 17:27:47 p crmd: [3544]: debug: s_crmd_fsa: Processing I_ERROR: [
state=S_STARTING cause=C_FSA_INTERNAL origin=do_cib_control ]
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_ERROR
Nov 24 17:27:47 p crmd: [3544]: ERROR: do_log: FSA: Input I_ERROR from
do_cib_control() received in state S_STARTING
Nov 24 17:27:47 p crmd: [3544]: info: do_state_transition: State transition
S_STARTING - S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
origin=do_cib_control ]
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_DC_TIMER_STOP
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_INTEGRATE_TIMER_STOP
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_FINALIZE_TIMER_STOP
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_RECOVER
Nov 24 17:27:47 p crmd: [3544]: ERROR: do_recover: Action A_RECOVER
(0100) not supported
Nov 24 17:27:47 p crmd: [3544]: debug: do_fsa_action: actions:trace://
A_HA_CONNECT
Nov 24 17:27:47 p crmd: [3544]: CRIT: get_cluster_type: This installation
of Pacemaker does not support the '(null)' cluster infrastructure.
Terminating.
Nov 24 17:27:47 p heartbeat: [3522]: WARN: Managed /usr/lib/heartbeat/crmd
process 3544 exited with return code 100.
Nov 24 17:27:47 p heartbeat: [3522]: ERROR: Client /usr/lib/heartbeat/crmd
exited with return code 100.

It seems to be reading configuration info from the /var/run/heartbeat
directory, but the info is actually present in /usr/var/run/heartbeat.

Can somebody suggest how I should correct that path?
The PATH environment variable has the following value:
[root@p init.d]# echo $PATH
/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
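
A hedged guess (not confirmed anywhere in this thread): PATH is unlikely to be
the culprit, because lrmd and crmd look for their sockets under the
localstatedir compiled into them at build time. If Heartbeat was built with
prefix /usr (so its state lives under /usr/var) while the Pacemaker daemons
expect /var, the options are roughly:

    # rebuild with matching directories (illustrative flags only)
    ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc

    # or, as a stopgap, point the expected location at the existing one
    mkdir -p /var/run
    ln -s /usr/var/run/heartbeat /var/run/heartbeat

The actual build-time settings on this system would need to be checked before
applying either.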

Thanks and regards
Neha Chatrath

Date: Fri, 18 Nov 2011 10:22:22 +1100
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] Regarding Stonith RAs
Message-ID:
   CAEDLWG2QO+-puhr2qOuvXSCRUcg2gXHE=i=1d3losfn_pcs...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Thu, Nov 17, 2011 at 1:28 AM, Dejan Muhamedagic deja...@fastmail.fm
wrote:
 Hi,

 On Wed, Nov 16, 2011 at 05:49:30PM +0530, neha chatrath wrote:
 [...]
 Nov 14 13:16:57 ggns2mexsatsdp17.hsc.com lrmd: [3976]: notice:
 on_msg_get_rsc_types: can not find this RA class stonith

 The PILS plugin handling stonith resources was not found.
 Strange, cannot recall seeing this before.

Could be a RHEL6 based distro.

 It should be in
 /usr/lib/heartbeat/plugins/RAExec/stonith.so (or /usr/lib64
 depending on your installation). Please check permissions and if
 this file is really a valid so object file. If everything's in
 order no idea what else could be the reason. You could strace
 lrmd on startup and see what happens between lines 1137 and 1158.

 Thanks,

 Dejan
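
Following up on the checks suggested above, a few shell commands can confirm
whether the lrmd stonith plugin is present and loadable, and strace can show
which path lrmd actually probes (the plugin path is the one from Dejan's note;
use /usr/lib64 on 64-bit installations):

    ls -l /usr/lib/heartbeat/plugins/RAExec/stonith.so
    file /usr/lib/heartbeat/plugins/RAExec/stonith.so   # should report an ELF shared object
    ldd /usr/lib/heartbeat/plugins/RAExec/stonith.so    # look for missing libraries
    strace -f -e trace=open,stat -o /tmp/lrmd.trace /usr/lib/heartbeat/lrmd -r
    grep -i stonith /tmp/lrmd.trace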


On Mon, Nov 14, 2011 at 2:05 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,
 I am facing issue in configuring a Stonith resource in my system of
 cluster with 2 nodes.
 Whenever I try to give the following command:
 crm configure primitive app_fence stonith::external/ipmi params hostname=
 ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
 passwd=pass@abc123 ,
 I get the following errors:

 ERROR: stonith:external/ipmi: could not parse meta-data:
 Traceback (most recent call last):
   File /usr/sbin/crm, line 41, in module
 crm.main.run()
   File /usr/lib/python2.6/site-packages/crm/main.py, line 249, in run
 if parse_line(levels,shlex.split(' '.join(args))):
   File /usr/lib/python2.6/site-packages/crm/main.py, line 145, in
 parse_line
 lvl.release()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 68, in
 release
 self.droplevel()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 87, in
 droplevel
 self.current_level.end_game(self._in_transit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1524, in end_game
 self.commit(commit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1425, in commit
 self._verify(mkset_obj(xml,changed),mkset_obj(xml))
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1324, in _verify
 rc2 = set_obj_semantic.semantic_check(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 280, in
 semantic_check
 rc = self.__check_unique_clash(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 260, in
 __check_unique_clash
 process_primitive(node, clash_dict)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 245, in
 process_primitive

Re: [Pacemaker] Regarding Stonith RAs

2011-11-16 Thread neha chatrath
Hello,

Looks like a broken installation. I guess that metadata for other
resource classes works fine. It could be some issue with
stonith-ng. Did you notice any messages from stonith-ng?

A: Yes, metadata for other resource classes like ocf/heartbeat and ocf/linbit
is working fine.
The problem is seen only with the stonith resource class.

No stonith-ng related errors are visible in the log file.

Following are some excerpts from log file:

Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
Implicit directive: apiauth stonithd uid=root
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=root, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: apiauth stonith-ng   uid=root
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=root, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: apiauth attrduid=hacluster
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=hacluster, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: apiauth crmd uid=hacluster
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=hacluster, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: apiauth pingduid=root
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=root, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn  hacluster /usr/lib/heartbeat/ccm
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive:  hacluster /usr/lib/heartbeat/ccm
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn  hacluster /usr/lib/heartbeat/cib
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive:  hacluster /usr/lib/heartbeat/cib
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn root /usr/lib/heartbeat/lrmd -r
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive: root /usr/lib/heartbeat/lrmd -r
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn root /usr/lib/heartbeat/stonithd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive: root /usr/lib/heartbeat/stonithd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn  hacluster /usr/lib/heartbeat/attrd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive:  hacluster /usr/lib/heartbeat/attrd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug: Implicit
directive: respawn  hacluster /usr/lib/heartbeat/crmd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: info: respawn
directive:  hacluster /usr/lib/heartbeat/crmd
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=hacluster, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=hacluster, gid=null
Nov 14 11:54:30 ggns2mexsatsdp17.hsc.com heartbeat: [3659]: debug:
uid=root, gid=null


...
v 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: info: Starting
child client /usr/lib/heartbeat/lrmd -r (0,0)
Nov 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: info: Starting
child client /usr/lib/heartbeat/stonithd (0,0)
Nov 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3976]: info: Starting
/usr/lib/heartbeat/lrmd -r as uid 0  gid 0 (pid 3976)
Nov 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3977]: info: Starting
/usr/lib/heartbeat/stonithd as uid 0  gid 0 (pid 3977)
Nov 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: info: Starting
child client /usr/lib/heartbeat/attrd (495,489)
Nov 14 11:54:45 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: info: Starting
child client /usr/lib/heartbeat/crmd (495,489)



Nov 14 11:54:46 ggns2mexsatsdp17.hsc.com stonithd: [3977]: debug:
apichan=0x9a36368
Nov 14 11:54:46 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: debug:
create_seq_snapshot_table:no missing packets found for node
ggns2mexsatsdp17.hsc.com
Nov 14 11:54:46 ggns2mexsatsdp17.hsc.com heartbeat: [3661]: debug: Signing
on API client 3975 (cib)
Nov 14 11:54:46 ggns2mexsatsdp17.hsc.com stonithd: [3977]: debug:
callback_chan=0x9a36210
Nov 14 11:54:46 ggns2mexsatsdp17.hsc.com stonithd: [3977]: notice:
/usr/lib/heartbeat/stonithd start up successfully

 

.
Nov 14 13:16:57 ggns2mexsatsdp17.hsc.com lrmd: [3976]: debug:
unregister_client: client lrmadmin [pid:10061] is unregistered
Nov 14 13:16:57 ggns2mexsatsdp17.hsc.com lrmd: [3976]: debug:
on_msg_register:client lrmadmin [10062] registered
Nov 14 13:16:57 ggns2mexsatsdp17.hsc.com lrmd: [3976]: notice:
on_msg_get_rsc_types: can not find this RA class stonith


Thanks and regards
Neha Chatrath

Re: [Pacemaker] Query regarding crm node standby/online command

2011-11-15 Thread neha chatrath
Hello,

Thanks for the reply. Let me rephrase my query regarding interface
monitoring.
I have, say, 3 IP interfaces: eth0, eth1, and eth2.
Heartbeat is running on eth0.
I can monitor my eth0 link using Heartbeat, but is there a possibility of
monitoring the eth1 and eth2 interfaces as well using the Heartbeat mechanism?
I need this to detect scenarios where eth0 is working fine (thus no break in
cluster communication via Heartbeat) but there is some issue with either eth1
or eth2, in which case I need to raise alarms, etc.

Thanks and regards
Neha Chatrath
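
One approach sometimes used for this kind of interface monitoring (not
discussed in this thread; a sketch only, with placeholder addresses and
resource names) is a cloned ping resource that pings gateways reachable through
the extra interfaces and records the result as a node attribute, which location
rules or external alarm scripts can then act on:

    primitive p_ping ocf:pacemaker:ping \
        params host_list="192.168.1.1 192.168.2.1" multiplier=1000 \
        op monitor interval=15s timeout=30s
    clone cl_ping p_ping meta globally-unique=false
    # example rule: keep a resource (placeholder name) off a node that lost connectivity
    location loc_on_connected SomeResource \
        rule -inf: not_defined pingd or pingd lte 0

On older installations only the pingd agent may be available; the parameters
and the resulting pingd node attribute are the same.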

Message: 4
Date: Tue, 08 Nov 2011 09:45:17 +0100
From: Florian Haas flor...@hastexo.com
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] Query regarding crm node standby/online
   command
Message-ID: 4eb8ec1d.8050...@hastexo.com
Content-Type: text/plain; charset=ISO-8859-1

On 2011-11-08 06:22, neha chatrath wrote:
 Hello,

 I am running Heartbeat and Pacemaker in a cluster with 2 nodes.
 I also have a client registered with Heartbeat daemon for any node/IF
 status changes.

Can you give more details as to the nature of that client?

 When I execute crm node standby command on one of the nodes, there is
 no node status change info reported to the client.
 Is this the expected behavior?

I would say yes, as putting a node in standby mode does not change its
status of being a fully-fledged member of the cluster. It still
participates in all cluster communications, it receives all
configuration changes and status updates. It's merely ineligible for
running any resources. So from the cluster communications layer point of
view (i.e. from Heartbeat's or Corosync's perspective) nothing changes.

 Also, one more query about Heartbeat daemon:
 In my system, I have multiple IP interfaces (each configured with a
 separate IP) with Heartbeat running on one of them.
 I have a requirement of monitoring of all these IP interfaces and
 perform necessary actions (like perform failover etc) in case of any
 interface failure.

Well there is no reason to do this externally. You set up fencing using
an out-of-band fencing method. When cluster communications break down,
you fence one node off the cluster, so resources fail over to the other.

As a word of caution, it seems like you're at least headed into the
direction of reinventing the wheel, and it also seems like you are
trying to implement functionality that's already present in the stack.
(This is just a hunch based on the limited information given, however.)
If that is the case, I would strongly suggest you take a look at
Clusters From Scratch and the Linux-HA User's Guide, and possibly also
Pacemaker: Configuration Explained, to better familiarize yourself with
the functionality of the stack.

Hope this helps.
Cheers,
Florian


On Tue, Nov 8, 2011 at 10:52 AM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,

 I am running Heartbeat and Pacemaker in a cluster with 2 nodes.
 I also have a client registered with Heartbeat daemon for any node/IF
 status changes.

 When I execute crm node standby command on one of the nodes, there is no
 node status change info reported to the client.
 Is this the expected behavior?

 Also, one more query about Heartbeat daemon:
 In my system, I have multiple IP interfaces (each configured with a
 separate IP) with Heartbeat running on one of them.
 I have a requirement of monitoring of all these IP interfaces and perform
 necessary actions (like perform failover etc) in case of any interface
 failure.
 I am able to monitor the interface on which Heartbeat is running but not
 the rest of them.
 Does Heartbeat allows monitoring of interfaces other than the interfaces
 on which Heartbeat is running?

 Thanks and regards
 Neha Chatrath





-- 
Cheers
Neha Chatrath
  KEEP SMILING


[Pacemaker] Regarding Stonith RAs

2011-11-14 Thread neha chatrath
Hello,
I am facing issue in configuring a Stonith resource in my system of cluster
with 2 nodes.
Whenever I try to give the following command:
crm configure primitive app_fence stonith::external/ipmi params hostname=
ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
passwd=pass@abc123 ,
I get the following errors:

ERROR: stonith:external/ipmi: could not parse meta-data:
Traceback (most recent call last):
  File /usr/sbin/crm, line 41, in module
crm.main.run()
  File /usr/lib/python2.6/site-packages/crm/main.py, line 249, in run
if parse_line(levels,shlex.split(' '.join(args))):
  File /usr/lib/python2.6/site-packages/crm/main.py, line 145, in
parse_line
lvl.release()
  File /usr/lib/python2.6/site-packages/crm/levels.py, line 68, in release
self.droplevel()
  File /usr/lib/python2.6/site-packages/crm/levels.py, line 87, in
droplevel
self.current_level.end_game(self._in_transit)
  File /usr/lib/python2.6/site-packages/crm/ui.py, line 1524, in end_game
self.commit(commit)
  File /usr/lib/python2.6/site-packages/crm/ui.py, line 1425, in commit
self._verify(mkset_obj(xml,changed),mkset_obj(xml))
  File /usr/lib/python2.6/site-packages/crm/ui.py, line 1324, in _verify
rc2 = set_obj_semantic.semantic_check(set_obj_all)
  File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 280, in
semantic_check
rc = self.__check_unique_clash(set_obj_all)
  File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 260, in
__check_unique_clash
process_primitive(node, clash_dict)
  File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 245, in
process_primitive
if ra_params[ name ].get(unique) == 1:
TypeError: 'NoneType' object is unsubscriptable

From /var/log/messages: following error is being reported from lrmd: notice:
on_msg_get_metadata: can not find the class stonith

It seems that it is not able to find any RAs related to Stonith.

Following is the output of some crm commands:
*crm(live)ra# classes*
heartbeat
lsb
ocf / heartbeat linbit mcg pacemaker
*stonith*

*crm(live)ra# list ocf heartbeat*
AoEtarget AudibleAlarm  CTDB
ClusterMonDelay Dummy
EvmsSCC   Evmsd Filesystem
ICP   IPaddrIPaddr2
IPsrcaddr IPv6addr  LVM
LinuxSCSI MailToManageRAID
ManageVE  Pure-FTPd Raid1
Route SAPDatabase   SAPInstance
SendArp   ServeRAID SphinxSearchDaemon
Squid Stateful  SysInfo
VIPArip   VirtualDomain WAS
WAS6  WinPopup  Xen
Xinetdanything  apache
conntrackddb2   drbd
eDir88exportfs  fio
iSCSILogicalUnit  iSCSITarget   ids
iscsi jboss ldirectord
mysql mysql-proxy   nfsserver
nginx oracleoralsnr
pgsql pingd portblock
postfix   proftpd   rsyncd
scsi2reservation  sfex  syslog-ng
tomcatvmware

*crm(live)ra# list stonith*

crm(live)ra#

All the stonith-related RAs are present in
/usr/lib/stonith/plugins/external.

Following is the output of ls command :
[root@ggns2mexsatsdp17 ~]# ls /usr/lib/stonith/plugins/external/

drac5  hetzner  ibmrsa ipmi kdumpcheck  nut
output   riloe  ssh  vmware  xen0-ha
dracmc-telnet  hmchttp  ibmrsa-telnet  ippower9258  libvirt ouput
rackpdu  sbdvcenter  xen0

Can somebody please help me with this?

Thanks and regards
Neha Chatrath


Re: [Pacemaker] Regarding Stonith RAs

2011-11-14 Thread neha chatrath
Hello,

I have tried with a single colon in the crm configure command, but there is no
change in the result. LRM and crm are showing the same errors.

Thanks and regards
Neha Chatrath

Date: Mon, 14 Nov 2011 09:49:52 +0100
From: Michael Schwartzkopff mi...@clusterbau.com
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] Regarding Stonith RAs
Message-ID: 20140949.53351.mi...@clusterbau.com
Content-Type: text/plain; charset=iso-8859-1

 Hello,
 I am facing issue in configuring a Stonith resource in my system of
cluster
 with 2 nodes.
 Whenever I try to give the following command:
 crm configure primitive app_fence stonith::external/ipmi params hostname=
 ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
 passwd=pass@abc123 ,

try

crm configure primitive app_fence stonith:external/ipmi (...)

please note the ONE colon. Providers are only known in the OCF RA class.
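
Putting that correction together with the parameters from the original command
(values exactly as posted earlier in this thread), the full invocation would
look like this sketch:

    crm configure primitive app_fence stonith:external/ipmi \
        params hostname=ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 \
        userid=root passwd=pass@abc123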


--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


On Mon, Nov 14, 2011 at 2:05 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,
 I am facing issue in configuring a Stonith resource in my system of
 cluster with 2 nodes.
 Whenever I try to give the following command:
 crm configure primitive app_fence stonith::external/ipmi params hostname=
 ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
 passwd=pass@abc123 ,
 I get the following errors:

 ERROR: stonith:external/ipmi: could not parse meta-data:
 Traceback (most recent call last):
   File /usr/sbin/crm, line 41, in module
 crm.main.run()
   File /usr/lib/python2.6/site-packages/crm/main.py, line 249, in run
 if parse_line(levels,shlex.split(' '.join(args))):
   File /usr/lib/python2.6/site-packages/crm/main.py, line 145, in
 parse_line
 lvl.release()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 68, in
 release
 self.droplevel()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 87, in
 droplevel
 self.current_level.end_game(self._in_transit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1524, in end_game
 self.commit(commit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1425, in commit
 self._verify(mkset_obj(xml,changed),mkset_obj(xml))
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1324, in _verify
 rc2 = set_obj_semantic.semantic_check(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 280, in
 semantic_check
 rc = self.__check_unique_clash(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 260, in
 __check_unique_clash
 process_primitive(node, clash_dict)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 245, in
 process_primitive
 if ra_params[ name ].get(unique) == 1:
 TypeError: 'NoneType' object is unsubscriptable
 
 From /var/log/messages: following error is being reported from lrmd: notice:
 on_msg_get_metadata: can not find the class stonith

 It seems the it is not able to find any RAs related to Stonith.

 Following is the output of some crm commands:
 *crm(live)ra# classes*
 heartbeat
 lsb
 ocf / heartbeat linbit mcg pacemaker
 *stonith*

 *crm(live)ra# list ocf heartbeat*
 AoEtarget AudibleAlarm  CTDB
 ClusterMonDelay Dummy
 EvmsSCC   Evmsd Filesystem
 ICP   IPaddrIPaddr2
 IPsrcaddr IPv6addr  LVM
 LinuxSCSI MailToManageRAID
 ManageVE  Pure-FTPd Raid1
 Route SAPDatabase   SAPInstance
 SendArp   ServeRAID SphinxSearchDaemon
 Squid Stateful  SysInfo
 VIPArip   VirtualDomain WAS
 WAS6  WinPopup  Xen
 Xinetdanything  apache
 conntrackddb2   drbd
 eDir88exportfs  fio
 iSCSILogicalUnit  iSCSITarget   ids
 iscsi jboss ldirectord
 mysql mysql-proxy   nfsserver
 nginx oracleoralsnr
 pgsql pingd portblock
 postfix   proftpd   rsyncd
 scsi2reservation  sfex  syslog-ng
 tomcatvmware

 *crm(live)ra# list stonith*

 crm(live)ra#

 All the sotnith related RAs are present in
 /usr/lib/stonith/plugins/external.

 Following is the output of ls command :
 [root@ggns2mexsatsdp17 ~]# ls /usr/lib/stonith/plugins/external/

 drac5  hetzner  ibmrsa ipmi kdumpcheck  nut
 output   riloe  ssh  vmware  xen0-ha
 dracmc-telnet  hmchttp  ibmrsa-telnet  ippower9258  libvirt ouput
 rackpdu  sbdvcenter  xen0

 Can somebody please help me with this?

 Thanks and regards
 Neha Chatrath

Re: [Pacemaker] Regarding Stonith RAs

2011-11-14 Thread neha chatrath
Hello Dejan,

I am using Cluster Glue version 1.0.7.
Also, this does not seem to be a problem with a specific Stonith agent like
IPMI; I think it is more of an issue with all the Stonith agents.
I have tried configuring another test Stonith agent, e.g. suicide, and I am
facing exactly the same issue.

Kindly please suggest.

Thanks and regards
Neha Chatrath

Date: Mon, 14 Nov 2011 15:41:43 +0100
From: Dejan Muhamedagic deja...@fastmail.fm
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] Regarding Stonith RAs
Message-ID: 2014144142.GA3735@squib
Content-Type: text/plain; charset=us-ascii

Hi,

On Mon, Nov 14, 2011 at 02:05:49PM +0530, neha chatrath wrote:
 Hello,
 I am facing issue in configuring a Stonith resource in my system of
cluster
 with 2 nodes.
 Whenever I try to give the following command:
 crm configure primitive app_fence stonith::external/ipmi params hostname=
 ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
 passwd=pass@abc123 ,
 I get the following errors:

 ERROR: stonith:external/ipmi: could not parse meta-data:

Which version of cluster-glue do you have installed? There is a
serious issue with external/ipmi in version 1.0.8, we'll make a
new release ASAP.

Thanks,

Dejan


On Mon, Nov 14, 2011 at 2:05 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,
 I am facing issue in configuring a Stonith resource in my system of
 cluster with 2 nodes.
 Whenever I try to give the following command:
 crm configure primitive app_fence stonith::external/ipmi params hostname=
 ggns2mexsatsdp17.hsc.com ipaddr=192.168.113.17 userid=root
 passwd=pass@abc123 ,
 I get the following errors:

 ERROR: stonith:external/ipmi: could not parse meta-data:
 Traceback (most recent call last):
   File /usr/sbin/crm, line 41, in module
 crm.main.run()
   File /usr/lib/python2.6/site-packages/crm/main.py, line 249, in run
 if parse_line(levels,shlex.split(' '.join(args))):
   File /usr/lib/python2.6/site-packages/crm/main.py, line 145, in
 parse_line
 lvl.release()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 68, in
 release
 self.droplevel()
   File /usr/lib/python2.6/site-packages/crm/levels.py, line 87, in
 droplevel
 self.current_level.end_game(self._in_transit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1524, in end_game
 self.commit(commit)
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1425, in commit
 self._verify(mkset_obj(xml,changed),mkset_obj(xml))
   File /usr/lib/python2.6/site-packages/crm/ui.py, line 1324, in _verify
 rc2 = set_obj_semantic.semantic_check(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 280, in
 semantic_check
 rc = self.__check_unique_clash(set_obj_all)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 260, in
 __check_unique_clash
 process_primitive(node, clash_dict)
   File /usr/lib/python2.6/site-packages/crm/cibconfig.py, line 245, in
 process_primitive
 if ra_params[ name ].get(unique) == 1:
 TypeError: 'NoneType' object is unsubscriptable
 
 From /var/log/messages: following error is being reported from lrmd: notice:
 on_msg_get_metadata: can not find the class stonith

 It seems the it is not able to find any RAs related to Stonith.

 Following is the output of some crm commands:
 *crm(live)ra# classes*
 heartbeat
 lsb
 ocf / heartbeat linbit mcg pacemaker
 *stonith*

 *crm(live)ra# list ocf heartbeat*
 AoEtarget AudibleAlarm  CTDB
 ClusterMonDelay Dummy
 EvmsSCC   Evmsd Filesystem
 ICP   IPaddrIPaddr2
 IPsrcaddr IPv6addr  LVM
 LinuxSCSI MailToManageRAID
 ManageVE  Pure-FTPd Raid1
 Route SAPDatabase   SAPInstance
 SendArp   ServeRAID SphinxSearchDaemon
 Squid Stateful  SysInfo
 VIPArip   VirtualDomain WAS
 WAS6  WinPopup  Xen
 Xinetdanything  apache
 conntrackddb2   drbd
 eDir88exportfs  fio
 iSCSILogicalUnit  iSCSITarget   ids
 iscsi jboss ldirectord
 mysql mysql-proxy   nfsserver
 nginx oracleoralsnr
 pgsql pingd portblock
 postfix   proftpd   rsyncd
 scsi2reservation  sfex  syslog-ng
 tomcatvmware

 *crm(live)ra# list stonith*

 crm(live)ra#

 All the sotnith related RAs are present in
 /usr/lib/stonith/plugins/external.

 Following is the output of ls command :
 [root@ggns2mexsatsdp17 ~]# ls /usr/lib/stonith/plugins/external/

 drac5  hetzner  ibmrsa ipmi kdumpcheck  nut
 output

[Pacemaker] Query regarding crm node standby/online command

2011-11-07 Thread neha chatrath
Hello,

I am running Heartbeat and Pacemaker in a cluster with 2 nodes.
I also have a client registered with Heartbeat daemon for any node/IF
status changes.

When I execute crm node standby command on one of the nodes, there is no
node status change info reported to the client.
Is this the expected behavior?

Also, one more query about Heartbeat daemon:
In my system, I have multiple IP interfaces (each configured with a
separate IP) with Heartbeat running on one of them.
I have a requirement of monitoring of all these IP interfaces and perform
necessary actions (like perform failover etc) in case of any interface
failure.
I am able to monitor the interface on which Heartbeat is running but not
the rest of them.
Does Heartbeat allows monitoring of interfaces other than the interfaces on
which Heartbeat is running?

Thanks and regards
Neha Chatrath


Re: [Pacemaker] Inter-cluster communication using Heartbeat and Pacemaker

2011-11-02 Thread neha chatrath
Hello Andreas,

There is a system requirement according to which:
1. There are 2 independent clusters: one for the data plane and one for the
control plane.
2. These clusters are connected to each other through IP/Ethernet
connectivity for transmission and reception of control plane signalling
only, i.e. user plane traffic does not go through the control plane cluster.
3. Nodes in the control plane cluster need to know the status of the nodes
in the data plane, to apply e.g. load balancing algorithms at their end.
Thus, inter-cluster communication is required.

Thanks and regards
Neha Chatrath

Date: Tue, 1 Nov 2011 21:27:51 +1100
From: Andrew Beekhof and...@beekhof.net
To: The Pacemaker cluster resource manager
   pacemaker@oss.clusterlabs.org

Subject: Re: [Pacemaker] Inter-cluster communication using Heartbeat
   and Pacemaker
Message-ID:
   CAEDLWG0ZAfUs_a=yje2pauixx5tita55xb+hu2ywngmkuev...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Oct 28, 2011 at 8:41 PM, neha chatrath nehachatr...@gmail.com
wrote:
 Hello,

 Is there a way to do inter-cluster communication using Heartbeat/Pacemaker
 framework?

Well by definition if the two nodes can talk to each other they're
part of the same cluster.
What are you trying to achieve?


 Thanks and regards
 Neha Chatrath



On Fri, Oct 28, 2011 at 3:11 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,

 Is there a way to do inter-cluster communication using Heartbeat/Pacemaker
 framework?

 Thanks and regards
 Neha Chatrath




[Pacemaker] Inter-cluster communication using Heartbeat and Pacemaker

2011-10-28 Thread neha chatrath
Hello,

Is there a way to do inter-cluster communication using Heartbeat/Pacemaker
framework?

Thanks and regards
Neha Chatrath


Re: [Pacemaker] Problem in Stonith configuration

2011-10-28 Thread neha chatrath
Hello,

1. How about using the integrated iLO device for fencing? I am using an HP
ProLiant DL360 G7 server which supports iLO3.
   - Can the RILOE Stonith plugin be used for this?

2. Can the meatware Stonith plugin be used for production software?

3. One more issue I am facing is that when I try the crm ra list stonith
command, there is no output, although the different RAs under the heartbeat
class are visible.
   - Also, the stonith class is visible in the output of the crm ra classes
command.
   - All the default Stonith RAs like meatware, suicide, ibmrsa, ipmi etc. are
present in the /usr/lib/stonith/plugins directory.
   - Due to this I am not able to configure Stonith in my system.

Thanks and regards
Neha Chatrath

On Tue, Oct 18, 2011 at 2:51 PM, neha chatrath nehachatr...@gmail.comwrote:

 Hello,

  1. If a resource fails, node should reboot (through fencing mechanism)
  and resources should re-start on the node.

 Why would you want that? This would increase the service downtime
 considerably. Why is a local restart not possible ... and even if there
 is a good reason for a reboot, why not starting the resource on the
 other node?
 -In our system, there are some primitive, clone resources along with 3
 different master-slave resources.
 -All the masters and slaves of these resources are co-located i.e. all the
 3 masters are co-located on a node and 3 slaves on the other node.
 -These 3 master-slave resources are tightly coupled. There is a
 requirement that failure of even one of these resources restarts all
 the resources in the group
 -All these resources can be shifted to the other node but subsequently
 these should also be restarted as a lot of data/control plane synching is
 being done between the two nodes.
 e.g. If one of the resources running on node1 as a Master fails, then all
 these 3 resources are shifted to the other node i.e. node2  (with
 corresponding slave resources being promoted as master). On  node1, these
 resources should get re-started as slaves.

 We understand that node restart will increase the downtime, but since we
 could not find much on the option of a group restart of master-slave
 resources, we are trying the node restart option.


 Thanks and regards
 Neha Chatrath

 -- Forwarded message --
 From: Andreas Kurz andr...@hastexo.com
 Date: Tue, Oct 18, 2011 at 1:55 PM
 Subject: Re: [Pacemaker] Problem in Stonith configuration
 To: pacemaker@oss.clusterlabs.org


 Hello,


 On 10/18/2011 09:00 AM, neha chatrath wrote:
  Hello,
 
  Minor updates in the first requirement.
  1. If a resource fails, node should reboot (through fencing mechanism)
  and resources should re-start on the node.

 Why would you want that? This would increase the service downtime
 considerably. Why is a local restart not possible ... and even if there
 is a good reason for a reboot, why not starting the resource on the
 other node?


  2. If the physical link between the nodes in a cluster fails then that
  node should be isolated (kind of a power down) and the resources should
  continue to run on the other nodes

 That is how stonith works, yes.

 crm ra list stonith ... gives you a list of all available stonith plugins.

 crm ra info stonith:<plugin> ... gives details for a specific plugin.

 Using external/ipmi is often a good choice because a lot of servers
 already have a BMC with IPMI on board, or they are shipped with a
 management card supporting IPMI.

 Regards,
 Andreas
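
For illustration only, a minimal external/ipmi setup for a two-node cluster might look like the sketch below; the node names mcg1/mcg2 come from the configuration quoted further down, while the BMC addresses and credentials are placeholders, and the parameter names should be verified with crm ra info stonith:external/ipmi for the installed plugin version:

primitive st_mcg1 stonith:external/ipmi \
params hostname=mcg1 ipaddr=192.168.1.211 userid=admin passwd=XXXX interface=lanplus \
op monitor interval=60s
primitive st_mcg2 stonith:external/ipmi \
params hostname=mcg2 ipaddr=192.168.1.212 userid=admin passwd=XXXX interface=lanplus \
op monitor interval=60s
location st_mcg1_not_on_mcg1 st_mcg1 -inf: mcg1
location st_mcg2_not_on_mcg2 st_mcg2 -inf: mcg2

The location constraints keep each stonith resource off the node it is meant to fence, so a node never has to rely on fencing itself.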


 On Tue, Oct 18, 2011 at 12:30 PM, neha chatrath nehachatr...@gmail.com wrote:

 Hello,

 Minor updates in the first requirement.
 1. If a resource fails, the node should reboot (through the fencing mechanism) and
 the resources should re-start on the node.

 2. If the physical link between the nodes in a cluster fails then that
 node should be isolated (kind of a power down) and the resources should
 continue to run on the other nodes

 Apologies for the inconvenience.


 Thanks and regards
 Neha Chatrath

 On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath nehachatr...@gmail.com wrote:

 Hello Andreas,

 Thanks for the reply.

 So can you please suggest which Stonith plugin I should use for the
 production release of my software? I have the following system requirements:
 1. If a node in the cluster fails, it should be rebooted and the resources
 should re-start on the node.
 2. If the physical link between the nodes in a cluster fails, then that
 node should be isolated (kind of a power down) and the resources should
 continue to run on the other nodes.

 I have different types of resources, e.g. primitive, master-slave and clone,
 running on my system.

 Thanks and regards
 Neha Chatrath


 Date: Mon, 17 Oct 2011 15:08:16 +0200
 From: Andreas Kurz andr...@hastexo.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] Problem in Stonith configuration
 Message-ID: 4e9c28c0.8070...@hastexo.com
 Content-Type: text/plain; charset=iso-8859-1

 Hello,


 On 10/17/2011 12:34 PM, neha chatrath wrote:
  Hello,
  I am configuring

Re: [Pacemaker] Problem in Stonith configuration

2011-10-18 Thread neha chatrath
Hello Andreas,

Thanks for the reply.

So can you please suggest which Stonith plugin I should use for the
production release of my software? I have the following system requirements:
1. If a node in the cluster fails, it should be rebooted and the resources
should re-start on the node.
2. If the physical link between the nodes in a cluster fails, then that node
should be isolated (kind of a power down) and the resources should continue
to run on the other nodes.

I have different types of resources, e.g. primitive, master-slave and clone,
running on my system.

Thanks and regards
Neha Chatrath
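
For requirement 1, one possible (and deliberately aggressive) sketch, not taken from the thread, is to let a failed monitor operation fence the node, so that a resource failure reboots the node and the resources come back once it rejoins. The primitive below reuses the myapp1 definition from the configuration quoted further down, with only the on-fail settings changed; it only has an effect once a working stonith device is configured and stonith-enabled=true:

primitive myapp1 ocf:heartbeat:Redundancy \
op monitor interval=60s role=Master timeout=30s on-fail=fence \
op monitor interval=40s role=Slave timeout=40s on-fail=fence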


Date: Mon, 17 Oct 2011 15:08:16 +0200
From: Andreas Kurz andr...@hastexo.com
To: pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] Problem in Stonith configuration
Message-ID: 4e9c28c0.8070...@hastexo.com
Content-Type: text/plain; charset=iso-8859-1

Hello,

On 10/17/2011 12:34 PM, neha chatrath wrote:
 Hello,
 I am configuring a 2 node cluster with following configuration:

 *[root@MCG1 init.d]# crm configure show

 node $id=16738ea4-adae-483f-9d79-
b0ecce8050f4 mcg2 \
 attributes standby=off

 node $id=3d507250-780f-414a-b674-8c8d84e345cd mcg1 \
 attributes standby=off

 primitive ClusterIP ocf:heartbeat:IPaddr \
 params ip=192.168.1.204 cidr_netmask=255.255.255.0 nic=eth0:1 \

 op monitor interval=40s timeout=20s \
 meta target-role=Started

 primitive app1_fencing stonith:suicide \
 op monitor interval=90 \
 meta target-role=Started

 primitive myapp1 ocf:heartbeat:Redundancy \
 op monitor interval=60s role=Master timeout=30s on-fail=standby \
 op monitor interval=40s role=Slave timeout=40s on-fail=restart

 primitive myapp2 ocf:mcg:Redundancy_myapp2 \
 op monitor interval=60 role=Master timeout=30 on-fail=standby \
 op monitor interval=40 role=Slave timeout=40 on-fail=restart

 primitive myapp3 ocf:mcg:red_app3 \
 op monitor interval=60 role=Master timeout=30 on-fail=fence \
 op monitor interval=40 role=Slave timeout=40 on-fail=restart

 ms ms_myapp1 myapp1 \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
 notify=true

 ms ms_myapp2 myapp2 \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
 notify=true

 ms ms_myapp3 myapp3 \
 meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
 notify=true

 colocation myapp1_col inf: ClusterIP ms_myapp1:Master

 colocation myapp2_col inf: ClusterIP ms_myapp2:Master

 colocation myapp3_col inf: ClusterIP ms_myapp3:Master

 order myapp1_order inf: ms_myapp1:promote ClusterIP:start

 order myapp2_order inf: ms_myapp2:promote ms_myapp1:start

 order myapp3_order inf: ms_myapp3:promote ms_myapp2:start

 property $id=cib-bootstrap-options \
 dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
 cluster-infrastructure=Heartbeat \
 stonith-enabled=true \
 no-quorum-policy=ignore

 rsc_defaults $id=rsc-options \
 resource-stickiness=100 \
 migration-threshold=3
 *
 I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none of the
 resources (myapp1, myapp2 etc.) gets started even on this node.
 Following is the output of the *crm_mon -f* command:

 *Last updated: Mon Oct 17 10:19:22 2011
 Stack: Heartbeat
 Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
 quorum
 Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
 2 Nodes configured, unknown expected votes
 5 Resources configured.
 
 Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

The cluster is waiting for a successful fencing event before starting
all resources ... the only way to be sure the second node runs no resources.

Since you are using the suicide plugin, this will never happen if Heartbeat
is not started on that node. If this is only a _test_setup_, go with the ssh
or even the null stonith plugin ... never use them on production systems!

Regards,
Andreas
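
As a rough sketch of such a test-only setup (never for production, as stated above; the hostlist is an assumption based on the node names in this thread):

primitive test_fencing stonith:external/ssh \
params hostlist="mcg1 mcg2" \
op monitor interval=60s
clone fencing_clone test_fencing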



Re: [Pacemaker] Problem in Stonith configuration

2011-10-18 Thread neha chatrath
Hello,

Minor updates in the first requirement.
1. If a resource fails, the node should reboot (through the fencing mechanism) and
the resources should re-start on the node.
2. If the physical link between the nodes in a cluster fails, then that node
should be isolated (kind of a power down) and the resources should continue
to run on the other nodes.

Apologies for the inconvenience.

Thanks and regards
Neha Chatrath

On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath nehachatr...@gmail.com wrote:

 Hello Andreas,

 Thanks for the reply.

 So can you please suggest which Stonith plugin I should use for the
 production release of my software? I have the following system requirements:
 1. If a node in the cluster fails, it should be rebooted and the resources should
 re-start on the node.
 2. If the physical link between the nodes in a cluster fails, then that node
 should be isolated (kind of a power down) and the resources should continue
 to run on the other nodes.

 I have different types of resources, e.g. primitive, master-slave and clone,
 running on my system.

 Thanks and regards
 Neha Chatrath


 Date: Mon, 17 Oct 2011 15:08:16 +0200
 From: Andreas Kurz andr...@hastexo.com
 To: pacemaker@oss.clusterlabs.org
 Subject: Re: [Pacemaker] Problem in Stonith configuration
 Message-ID: 4e9c28c0.8070...@hastexo.com
 Content-Type: text/plain; charset=iso-8859-1

 Hello,


 On 10/17/2011 12:34 PM, neha chatrath wrote:
  Hello,
  I am configuring a 2 node cluster with following configuration:
 
  *[root@MCG1 init.d]# crm configure show
 
  node $id=16738ea4-adae-483f-9d79-
 b0ecce8050f4 mcg2 \
  attributes standby=off
 
  node $id=3d507250-780f-414a-b674-8c8d84e345cd mcg1 \
  attributes standby=off
 
  primitive ClusterIP ocf:heartbeat:IPaddr \
  params ip=192.168.1.204 cidr_netmask=255.255.255.0 nic=eth0:1 \
 
  op monitor interval=40s timeout=20s \
  meta target-role=Started
 
  primitive app1_fencing stonith:suicide \
  op monitor interval=90 \
  meta target-role=Started
 
  primitive myapp1 ocf:heartbeat:Redundancy \
  op monitor interval=60s role=Master timeout=30s on-fail=standby \
  op monitor interval=40s role=Slave timeout=40s on-fail=restart
 
  primitive myapp2 ocf:mcg:Redundancy_myapp2 \
  op monitor interval=60 role=Master timeout=30 on-fail=standby \
  op monitor interval=40 role=Slave timeout=40 on-fail=restart
 
  primitive myapp3 ocf:mcg:red_app3 \
  op monitor interval=60 role=Master timeout=30 on-fail=fence \
  op monitor interval=40 role=Slave timeout=40 on-fail=restart
 
  ms ms_myapp1 myapp1 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
  notify=true
 
  ms ms_myapp2 myapp2 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
  notify=true
 
  ms ms_myapp3 myapp3 \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
  notify=true
 
  colocation myapp1_col inf: ClusterIP ms_myapp1:Master
 
  colocation myapp2_col inf: ClusterIP ms_myapp2:Master
 
  colocation myapp3_col inf: ClusterIP ms_myapp3:Master
 
  order myapp1_order inf: ms_myapp1:promote ClusterIP:start
 
  order myapp2_order inf: ms_myapp2:promote ms_myapp1:start
 
  order myapp3_order inf: ms_myapp3:promote ms_myapp2:start
 
  property $id=cib-bootstrap-options \
  dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
  cluster-infrastructure=Heartbeat \
  stonith-enabled=true \
  no-quorum-policy=ignore
 
  rsc_defaults $id=rsc-options \
  resource-stickiness=100 \
  migration-threshold=3
  *

  I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none of the
  resources (myapp1, myapp2 etc.) gets started even on this node.
  Following is the output of the *crm_mon -f* command:
 
  *Last updated: Mon Oct 17 10:19:22 2011

  Stack: Heartbeat
  Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
  quorum
  Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
  2 Nodes configured, unknown expected votes
  5 Resources configured.
  
  Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)

 The cluster is waiting for a successful fencing event before starting
 all resources ... the only way to be sure the second node runs no resources.

 Since you are using the suicide plugin, this will never happen if Heartbeat
 is not started on that node. If this is only a _test_setup_, go with the ssh
 or even the null stonith plugin ... never use them on production systems!

 Regards,
 Andreas



Re: [Pacemaker] Problem in Stonith configuration

2011-10-18 Thread neha chatrath
Hello,
 1. If a resource fails, the node should reboot (through the fencing mechanism)
 and the resources should re-start on the node.

Why would you want that? This would increase the service downtime
considerably. Why is a local restart not possible ... and even if there
is a good reason for a reboot, why not start the resource on the
other node?
- In our system, there are some primitive and clone resources along with 3
different master-slave resources.
- All the masters and slaves of these resources are co-located, i.e. all the 3
masters run on one node and the 3 slaves on the other node.
- These 3 master-slave resources are tightly coupled. There is a requirement
that a failure of any one of these resources restarts all the resources
in the group.
- All these resources can be shifted to the other node, but subsequently they
should also be restarted, as a lot of data/control plane syncing is being
done between the two nodes.
E.g. if one of the resources running on node1 as a Master fails, then all
these 3 resources are shifted to the other node, i.e. node2 (with the
corresponding slave resources being promoted to master). On node1, these
resources should get re-started as slaves.

We understand that a node restart will increase the downtime, but since we
could not find much on the option of a group restart of master-slave
resources, we are trying the node-restart option.

Thanks and regards
Neha Chatrath

-- Forwarded message --
From: Andreas Kurz andr...@hastexo.com
Date: Tue, Oct 18, 2011 at 1:55 PM
Subject: Re: [Pacemaker] Problem in Stonith configuration
To: pacemaker@oss.clusterlabs.org


Hello,

On 10/18/2011 09:00 AM, neha chatrath wrote:
 Hello,

 Minor updates in the first requirement.
 1. If a resource fails, the node should reboot (through the fencing mechanism)
 and the resources should re-start on the node.

Why would you want that? This would increase the service downtime
considerably. Why is a local restart not possible ... and even if there
is a good reason for a reboot, why not start the resource on the
other node?

 2. If the physical link between the nodes in a cluster fails then that
 node should be isolated (kind of a power down) and the resources should
 continue to run on the other nodes

That is how stonith works, yes.

crm ra list stonith ... gives you a list of all available stonith plugins.

crm ra info stonith:<plugin> ... gives details for a specific plugin.

Using external/ipmi is often a good choice because a lot of servers
already have a BMC with IPMI on board, or they are shipped with a
management card supporting IPMI.

Regards,
Andreas


 On Tue, Oct 18, 2011 at 12:30 PM, neha chatrath nehachatr...@gmail.com wrote:

 Hello,

 Minor updates in the first requirement.
 1. If a resource fails, the node should reboot (through the fencing mechanism) and
 the resources should re-start on the node.

 2. If the physical link between the nodes in a cluster fails then that node
 should be isolated (kind of a power down) and the resources should continue
 to run on the other nodes

 Apologies for the inconvenience.


 Thanks and regards
 Neha Chatrath

 On Tue, Oct 18, 2011 at 12:08 PM, neha chatrath nehachatr...@gmail.com wrote:

 Hello Andreas,

 Thanks for the reply.

 So can you please suggest which Stonith plugin I should use for the
 production release of my software? I have the following system requirements:
 1. If a node in the cluster fails, it should be rebooted and the resources
 should re-start on the node.
 2. If the physical link between the nodes in a cluster fails, then that
 node should be isolated (kind of a power down) and the resources should
 continue to run on the other nodes.

 I have different types of resources, e.g. primitive, master-slave and clone,
 running on my system.

 Thanks and regards
 Neha Chatrath



[Pacemaker] Problem in Stonith configuration

2011-10-17 Thread neha chatrath
Hello,
I am configuring a 2 node cluster with following configuration:

*[root@MCG1 init.d]# crm configure show

node $id=16738ea4-adae-483f-9d79-b0ecce8050f4 mcg2 \
attributes standby=off

node $id=3d507250-780f-414a-b674-8c8d84e345cd mcg1 \
attributes standby=off

primitive ClusterIP ocf:heartbeat:IPaddr \
params ip=192.168.1.204 cidr_netmask=255.255.255.0 nic=eth0:1 \

op monitor interval=40s timeout=20s \
meta target-role=Started

primitive app1_fencing stonith:suicide \
op monitor interval=90 \
meta target-role=Started

primitive myapp1 ocf:heartbeat:Redundancy \
op monitor interval=60s role=Master timeout=30s on-fail=standby \
op monitor interval=40s role=Slave timeout=40s on-fail=restart

primitive myapp2 ocf:mcg:Redundancy_myapp2 \
op monitor interval=60 role=Master timeout=30 on-fail=standby \
op monitor interval=40 role=Slave timeout=40 on-fail=restart

primitive myapp3 ocf:mcg:red_app3 \
op monitor interval=60 role=Master timeout=30 on-fail=fence \
op monitor interval=40 role=Slave timeout=40 on-fail=restart

ms ms_myapp1 myapp1 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true

ms ms_myapp2 myapp2 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true

ms ms_myapp3 myapp3 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1
notify=true

colocation myapp1_col inf: ClusterIP ms_myapp1:Master

colocation myapp2_col inf: ClusterIP ms_myapp2:Master

colocation myapp3_col inf: ClusterIP ms_myapp3:Master

order myapp1_order inf: ms_myapp1:promote ClusterIP:start

order myapp2_order inf: ms_myapp2:promote ms_myapp1:start

order myapp3_order inf: ms_myapp3:promote ms_myapp2:start

property $id=cib-bootstrap-options \
dc-version=1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1 \
cluster-infrastructure=Heartbeat \
stonith-enabled=true \
no-quorum-policy=ignore

rsc_defaults $id=rsc-options \
resource-stickiness=100 \
migration-threshold=3
*
I start the Heartbeat daemon on only one of the nodes, e.g. mcg1. But none of the
resources (myapp1, myapp2 etc.) gets started even on this node.
Following is the output of the *crm_mon -f* command:

*Last updated: Mon Oct 17 10:19:22 2011
Stack: Heartbeat
Current DC: mcg1 (3d507250-780f-414a-b674-8c8d84e345cd)- partition with
quorum
Version: 1.0.11-db98485d06ed3fe0fe236509f023e1bd4a5566f1
2 Nodes configured, unknown expected votes
5 Resources configured.

Node mcg2 (16738ea4-adae-483f-9d79-b0ecce8050f4): UNCLEAN (offline)
Online: [ mcg1 ]
app1_fencing (stonith:suicide): Started mcg1

Migration summary:
* Node mcg1:
*
When I set stonith-enabled to false, all my resources come up.

Can somebody help me with STONITH configuration?

Cheers
Neha Chatrath
  KEEP SMILING
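
A hedged side note on the start-up behaviour shown above: with stonith-enabled=true the DC will not start resources until the UNCLEAN node has been fenced. Besides configuring a fencing device that can actually reach the offline node (see the external/ipmi and external/ssh sketches earlier in this archive), test setups sometimes relax the startup-fencing cluster property so that nodes the cluster has never seen are not fenced at start-up; this is unsafe outside a lab and is an illustration, not advice from the thread:

# test-only: do not fence nodes the cluster has never seen
crm configure property startup-fencing=false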
___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker