Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-07 Thread Jerome Yanga
I just wanted to share that the recommended command did NOT move
the resource to another node.  It simply cleared the Failed Actions
section.
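A minimal sketch of the same cleanup as a small shell function, assuming `crm` and `crm_mon` are on the PATH of a cluster node. `crm resource cleanup` is the command from this thread; `crm_mon -1` prints the cluster status once and exits, so you can confirm the Failed actions entry is gone. The function names and the `DRY_RUN` switch are hypothetical, added only for illustration.

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clear_failed_actions RESOURCE (hypothetical helper)
# Runs the cleanup command from this thread, then prints the cluster
# status once so you can check that "Failed actions:" no longer lists it.
clear_failed_actions() {
    if [ -z "${1:-}" ]; then
        echo "usage: clear_failed_actions RESOURCE" >&2
        return 2
    fi
    run crm resource cleanup "$1" || return
    run crm_mon -1
}

# Example, on a cluster node:
#   clear_failed_actions drbd0
# Preview the commands without touching the cluster:
#   DRY_RUN=1 clear_failed_actions drbd0
```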

Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman
selig...@nevis.columbia.edu wrote:
 On 3/6/12 2:38 PM, Jerome Yanga wrote:

 Do you know by chance if that command you have provided bounces the resource?

 I don't know what you mean by bounce the resource. According to:

 http://www.clusterlabs.org/doc/crm_cli.html

 the command refreshes the resource status. Depending on your configuration, it
 might shift a resource to another node.

 But I am not an expert! I merely knew how to clear up the error message.

 On Tue, Mar 6, 2012 at 10:28 AM, William Seligman
 selig...@nevis.columbia.edu wrote:
 On 3/6/12 1:04 PM, Jerome Yanga wrote:
 crm_mon shows the error below.

 Failed actions:
     drbd0:1_monitor_59000 (node=testserver1.example.com, call=132,
 rc=-2, status=Timed Out): unknown exec error

 I have checked DRBD, and the mirror is connected and UpToDate on both nodes.

 The error above caused the resources to fail over, and it seems to be
 working OK.  However, the Failed actions section has not disappeared.
 How do I clear this error?

 crm resource cleanup drbd0

 --
 Bill Seligman             | Phone: (914) 591-2823
 Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
 PO Box 137                |
 Irvington NY 10533 USA    | http://www.nevis.columbia.edu/~seligman/


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

[Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread Jerome Yanga
crm_mon shows the error below.

Failed actions:
drbd0:1_monitor_59000 (node=testserver1.example.com, call=132,
rc=-2, status=Timed Out): unknown exec error

I have checked DRBD, and the mirror is connected and UpToDate on both nodes.

The error above caused the resources to fail over, and it seems to be
working OK.  However, the Failed actions section has not disappeared.
How do I clear this error?

Regards,
j


Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread Jerome Yanga
Thanks, Bill.

Do you know by chance if that command you have provided bounces the resource?

Regards,
j



Re: [Linux-HA] How do I clear the Failed actions section?

2012-03-06 Thread Jerome Yanga
Understood.  Thanks again, Bill.

Regards,
j


Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

2009-05-13 Thread Jerome Yanga
Thanks, Andrew.

FYI, it seems that crm(live) takes care of this.

jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
Sent: Tuesday, May 12, 2009 6:43 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

Try using crm_uuid to write back the value Rubric thinks Nomen should have.
I forget the name of the file on Rubric that contains it; hostcache
or something like that.
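A sketch of Andrew's suggestion, under two assumptions worth checking on your version: that the old UUID is the `id` of Nomen's `<node>` entry in the CIB still held by Rubric (the CIB dumps elsewhere in this archive show `node id="3a8b681c-..." uname="nomen.esri.com"`), and that `crm_uuid -w UUID` writes a node's UUID when run on that node with heartbeat stopped. The helper names and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# old_uuid_for NODENAME (hypothetical helper)
# Reads CIB XML on stdin and prints the id of the <node> entry whose uname
# is NODENAME. Assumes id= comes before uname=, as in the CIB dumps in this
# archive. NODENAME is used as a sed pattern, so "." matches any character.
old_uuid_for() {
    sed -n 's/.*<node id="\([^"]*\)" uname="'"$1"'".*/\1/p'
}

# restore_uuid UUID (hypothetical helper)
# Run on the rebuilt node with heartbeat stopped; assumes crm_uuid -w
# writes the given value as this node's UUID.
restore_uuid() {
    run crm_uuid -w "$1"
}

# Example:
#   on Rubric:  cibadmin -Q -o nodes | old_uuid_for nomen.esri.com
#   on Nomen:   service heartbeat stop
#               restore_uuid <uuid printed on Rubric>
#               service heartbeat start
```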

On Mon, May 11, 2009 at 6:18 PM, Jerome Yanga jya...@esri.com wrote:
 Andrew,

 Will I be able to change the UUID saved on Rubric so that it reflects the
 new one from the rebuilt Nomen?  If so, which file(s) do I need to modify?

 Thank you.

 jerome

 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
 Sent: Monday, May 11, 2009 12:45 AM
 To: General Linux-HA mailing list
 Subject: Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

 On Sat, May 9, 2009 at 12:56 AM, Jerome Yanga jya...@esri.com wrote:
 Here is the scenario.

 01)  There are two nodes in the Active-Passive cluster--Nomen and Rubric.
 02)  Nomen had a hardware and software failure.
 03)  Rubric took over the resources as expected.
 04)  Due to the failures, Nomen's operating system needed to be rebuilt.
 05)  DRBD was reinstalled on Nomen, and I made sure that its drbd.conf is
 identical to Rubric's.
 06)  Nomen's drbd service was started to sync its block device with
 Rubric's.
 07)  Stopped the drbd service on Nomen.
 08)  Installed Pacemaker on Nomen and verified that its configuration is
 identical to Rubric's.
 09)  Started heartbeat on Nomen, but it will not rejoin the cluster.  Its
 status is stuck on UNCLEAN (offline).

 Am I missing some steps to make Nomen rejoin the cluster?

 The node UUID is probably different from the old value, which would
 confuse the other node.
 Or you forgot the auth_keys file.
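Andrew's second suspect can be checked quickly. This sketch assumes the standard location `/etc/ha.d/authkeys` and that Heartbeat wants the file identical on both nodes and readable only by root (mode 600); `check_authkeys` is a hypothetical helper that compares the local file with a copy fetched from the peer.

```shell
#!/bin/sh
# check_authkeys LOCAL PEER_COPY (hypothetical helper)
# LOCAL is this node's authkeys file (usually /etc/ha.d/authkeys);
# PEER_COPY is a copy fetched from the other node, e.g. with scp.
check_authkeys() {
    if ! cmp -s "$1" "$2"; then
        echo "authkeys differs from the peer copy"
        return 1
    fi
    # find prints the path only when the permission bits are exactly 0600.
    if [ -z "$(find "$1" -prune -perm 600)" ]; then
        echo "authkeys is not mode 600"
        return 1
    fi
    echo "authkeys OK"
}

# Example, on Nomen:
#   scp rubric.esri.com:/etc/ha.d/authkeys /tmp/authkeys.rubric
#   check_authkeys /etc/ha.d/authkeys /tmp/authkeys.rubric
```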


[Linux-HA] A Node cannot rejoin the cluster after a rebuild

2009-05-08 Thread Jerome Yanga
Here is the scenario.

01)  There are two nodes in the Active-Passive cluster--Nomen and Rubric.  
02)  Nomen had a hardware and software failure.  
03)  Rubric took over the resources as expected.  
04)  Due to the failures, Nomen's operating system needed to be rebuilt.
05)  DRBD was reinstalled on Nomen, and I made sure that its drbd.conf is
identical to Rubric's.
06)  Nomen's drbd service was started to sync its block device with
Rubric's.
07)  Stopped the drbd service on Nomen.
08)  Installed Pacemaker on Nomen and verified that its configuration is
identical to Rubric's.
09)  Started heartbeat on Nomen, but it will not rejoin the cluster.  Its status
is stuck on UNCLEAN (offline).

Am I missing some steps to make Nomen rejoin the cluster?

Help.

jerome


Re: [Linux-HA] Cannot get Heartbeat, DRBD and NFS to work together

2009-04-30 Thread Jerome Yanga
Here is my CIB.xml config.

cib.xml:

primitive fs0 ocf:heartbeat:Filesystem \
params fstype=ext3 directory=/data device=/dev/drbd0
primitive VIP ocf:heartbeat:IPaddr \
params ip=10.50.26.250 \
op monitor interval=5s timeout=5s
primitive Emergency_Contact ocf:heartbeat:MailTo \
params email=jya...@esri.com subject="Failover Occurred" \
op monitor interval=3s timeout=3s
primitive drbd0 ocf:heartbeat:drbd \
params drbd_resource=r0 \
op monitor interval=59s role=Master timeout=30s \
op monitor interval=60s role=Slave timeout=30s
group DRBD_Group fs0 VIP Emergency_Contact \
meta collocated=true ordered=true migration-threshold=1 failure-timeout=10s resource-stickiness=10
ms ms-drbd0 drbd0 \
meta clone-max=2 notify=true globally-unique=false target-role=Started
colocation DRBD_Group-on-ms-drbd0 inf: DRBD_Group ms-drbd0:Master
order ms-drbd0-before-DRBD_Group inf: ms-drbd0:promote DRBD_Group:start

I did a bit more testing and here are the facts.

01)  Based on the cib.xml config above, when a node owns the resource group
DRBD_Group, I can start the NFS service manually, but I get the error
below.  Nevertheless, I am able to access the NFS share from another
machine.

# service nfs start
Starting NFS services: [  OK  ]
Starting NFS quotas:   [  OK  ]
Starting NFS daemon:   [  OK  ]
Starting NFS mountd:   [  OK  ]
Starting RPC idmapd: Error: RPC MTAB does not exist. 

02)  When I fail over the resource group DRBD_Group to the other node, I can
start NFS there with the same error and can still access the share from
another machine.

03)  However, if I add NFS to the resource group DRBD_Group (see below),
the share is still accessible, but the group will not fail over because the
nfsserver OCF agent is not able to shut down the NFS service.

To add the NFS resource to DRBD_Group, I add the following via
crm(live) and also add nfs_share to the DRBD_Group line.
primitive nfs_share ocf:heartbeat:nfsserver \
params nfs_init_script=/etc/init.d/nfs \
params nfs_notify_cmd=/sbin/rpc.statd \
params nfs_shared_infodir=/data/varlibnfs \
params nfs_ip=10.50.26.250 \
op monitor interval=30s
...
group DRBD_Group fs0 nfs_share VIP Emergency_Contact \
...

I think there is something wrong with the nfsserver OCF agent in that it cannot
stop the NFS service.  As a result, the DRBD device will not fail over, and
hence the resource group will not fail over.

Please help.

jerome
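One way to test whether the nfsserver agent really fails to stop NFS is to call it by hand, outside the cluster. This sketch relies on the OCF convention that parameters arrive as `OCF_RESKEY_<name>` environment variables and that exit code 0 means success (a `monitor` after a clean stop should return 7, "not running"). The agent path `/usr/lib/ocf/resource.d/heartbeat/nfsserver` is assumed from the `.ocf-binaries` location used elsewhere in this archive, and `nfs_ra` is a hypothetical helper.

```shell
#!/bin/sh
# nfs_ra AGENT ACTION (hypothetical helper)
# Calls an OCF agent with the nfs_share parameters from the config above,
# passed the OCF way (OCF_RESKEY_<param> environment variables), and
# reports the exit code: 0 is success, 7 from monitor means "not running".
nfs_ra() {
    OCF_ROOT=/usr/lib/ocf \
    OCF_RESKEY_nfs_init_script=/etc/init.d/nfs \
    OCF_RESKEY_nfs_notify_cmd=/sbin/rpc.statd \
    OCF_RESKEY_nfs_shared_infodir=/data/varlibnfs \
    OCF_RESKEY_nfs_ip=10.50.26.250 \
        "$1" "$2"
    rc=$?
    echo "$2 exit code: $rc"
    return "$rc"
}

# Example (this really stops NFS, so use a maintenance window):
#   nfs_ra /usr/lib/ocf/resource.d/heartbeat/nfsserver stop
#   nfs_ra /usr/lib/ocf/resource.d/heartbeat/nfsserver monitor
```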




[Linux-HA] Cannot get Heartbeat, DRBD and NFS to work together

2009-04-28 Thread Jerome Yanga
Hi peeps!

I cannot get my High Availability NFS server to work right.  Here is my
configuration.

primitive share_name ocf:heartbeat:nfsserver \
params nfs_init_script=/etc/init.d/nfs \
params nfs_notify_cmd=/sbin/rpc.statd \
params nfs_shared_infodir=/data \
params nfs_ip=10.50.26.250 \
op monitor interval=30s

drbd-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.2-11.1
nfs-utils-1.0.9-40.el5
nfs-utils-lib-1.0.8-7.2.z2

Without heartbeat running, NFS works properly.  When I add NFS as a resource 
into a group, it gets added but it does not seem to work as I cannot get to the 
share from other systems.

I have tried following the site below, but I may have done something wrong 
since the share does not work.  :(

http://www.linux-ha.org/HaNFS

Help.

Thank you in advance.

jerome


RE: [Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-03 Thread Jerome Yanga
Thanks.

jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, April 02, 2009 10:44 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD 
Daemon

Jerome Yanga wrote:
 Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD 
 daemon even if it is one of the resources.
 
 # service heartbeat stop
 Stopping High-Availability services:
[  OK  ]
 # service drbd status
 drbd driver loaded OK; device status:
 version: 8.2.7 (api:88/proto:86-88)
 GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by 
 r...@nomen.esri.com, 2009-03-24 08:29:57
 m:res  csst  ds  p  mounted  fstype
 0:r0   Unconfigured

It stops your drbd resource (device). It just does not unload the
module. That is the expected behaviour.

Regards
Dominik
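To make Dominik's distinction concrete (the DRBD resource stopped, the kernel module still loaded), here is a sketch that reads the `service drbd status` output shown above and reports the two facts separately. It assumes the drbd 8.2.7 output format in this thread: a "drbd driver loaded OK" line followed by one `minor:resource state` line per device. `drbd_summary` is a hypothetical name.

```shell
#!/bin/sh
# drbd_summary (hypothetical helper)
# Reads "service drbd status" output on stdin and reports, separately,
# whether the kernel module is loaded and the state of each DRBD resource.
drbd_summary() {
    awk '
        /drbd driver loaded OK/ { print "module: loaded" }
        # Device lines look like "0:r0   Unconfigured" (minor:resource, then state).
        $1 ~ /^[0-9]+:/ { print "resource " $1 ": " $2 }
    '
}

# Example:
#   service heartbeat stop
#   service drbd status | drbd_summary
#   # "module: loaded" plus "resource 0:r0: Unconfigured" means the
#   # resource was stopped while the module stayed loaded.
```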



RE: [Linux-HA] Re: Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-03 Thread Jerome Yanga
Thanks.

In my situation, DRBD is a resource in my cluster.  Hence, it is managed by 
heartbeat.

jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Friday, April 03, 2009 1:50 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Re: Stopping the Heartbeat daemon does not stop the 
DRBD Daemon

Joe Bill wrote:
 
  Stopping the Heartbeat daemon (service heartbeat stop)
 does not stop the DRBD daemon even if it is one of
 the resources. 
 
 - Heartbeat and DRBD are 2 different products/packages
 
 - Like most services, DRBD doesn't need Heartbeat to run. You can set up and 
 run DRBD volumes without Heartbeat installed, or any cluster supervisor.
 
 - The DRBD daemons provide the communication interface for each network
 volume and are therefore an integral part of the volume management. Without
 the DRBD daemons, you (manually) and Heartbeat (automagically) could not 
 handle the DRBD volumes.

Just to avoid confusion: There is no such thing as a DRBD daemon. DRBD
is a kernel module.

 - If you look carefully at your startup, DRBD daemons start whether or not 
 Heartbeat is started.

That depends on your setup. Maybe in yours it does and it should. In
others it does not and it should not.

Regards
Dominik


[Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon

2009-04-02 Thread Jerome Yanga
Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD 
daemon even if it is one of the resources.

# service heartbeat stop
Stopping High-Availability services:
   [  OK  ]
# service drbd status
drbd driver loaded OK; device status:
version: 8.2.7 (api:88/proto:86-88)
GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by 
r...@nomen.esri.com, 2009-03-24 08:29:57
m:res  csst  ds  p  mounted  fstype
0:r0   Unconfigured

Running the command below stops the DRBD daemon.

service drbd stop


Applications Installed:
===
drbd-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.2-11.1


CIB.xml:

# crm configure show
primitive fs0 ocf:heartbeat:Filesystem \
params fstype=ext3 directory=/data device=/dev/drbd0
primitive VIP ocf:heartbeat:IPaddr \
params ip=10.50.26.250 \
op monitor interval=5s timeout=5s
primitive drbd0 ocf:heartbeat:drbd \
params drbd_resource=r0 \
op monitor interval=59s role=Master timeout=30s \
op monitor interval=60s role=Slave timeout=30s
group DRBD_Group fs0 VIP \
meta collocated=true ordered=true migration-threshold=1 failure-timeout=10s resource-stickiness=10
ms ms-drbd0 drbd0 \
meta clone-max=2 notify=true globally-unique=false target-role=Started
colocation DRBD_Group-on-ms-drbd0 inf: DRBD_Group ms-drbd0:Master
order ms-drbd0-before-DRBD_Group inf: ms-drbd0:promote DRBD_Group:start

Help.

Regards,
jerome


RE: [Linux-HA] MailTo

2009-03-26 Thread Jerome Yanga
I think I had this issue in the past as well.

I just made sure that MAILCMD is assigned a mail app in .ocf-binaries.  Try the 
command below.

sed -i '/MAILCMD/s/=/=\/bin\/mail/g' /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
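A sketch of a more conservative variant of that edit. The sed above inserts `/bin/mail` after every `=` on lines that mention MAILCMD, which works for an empty default but would turn an existing value such as `: ${MAILCMD:=mailx}` into `: ${MAILCMD:=/bin/mailmailx}`. This version assumes the empty default line reads exactly `: ${MAILCMD:=}` (check your file first) and changes only that line, so re-running it is harmless. `fill_mailcmd` is a hypothetical helper.

```shell
#!/bin/sh
# fill_mailcmd FILE (hypothetical helper)
# Sets an empty MAILCMD default to /bin/mail. Assumes the line in
# .ocf-binaries reads exactly ': ${MAILCMD:=}'; any other value is left
# alone, so running it twice is harmless.
fill_mailcmd() {
    sed 's|^: \${MAILCMD:=}$|: ${MAILCMD:=/bin/mail}|' "$1" > "$1.new" &&
        cat "$1.new" > "$1" &&      # overwrite in place, keeping the file's mode
        rm -f "$1.new"
}

# Example (back up first):
#   cp /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries /root/ocf-binaries.bak
#   fill_mailcmd /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
#   grep MAILCMD /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
```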

jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of alexus
Sent: Thursday, March 26, 2009 2:24 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] MailTo

When I go to http://www.linux-ha.org/MailTo
I get "Page not found".

I'm trying to figure out why my MailTo resource isn't working.  I tested
sending email from that server with no problem.


<primitive class="ocf" type="MailTo" provider="heartbeat" id="resource_mailto">
  <instance_attributes id="resource_mailto_instance_attrs">
    <attributes>
      <nvpair name="email" id="a2fb7778-67f9-455e-bd93-72eadc937b7b" value="linux-ha-ale...@xxx.xxx"/>
      <nvpair id="5360309d-cf65-4482-bc5e-6f3edc6c68a3" name="subject" value="Linux-HA: MailTo"/>
    </attributes>
  </instance_attributes>
</primitive>


-- 
http://alexus.org/

[Linux-HA] Manual Resource Migration creates a Constraint

2009-03-13 Thread Jerome Yanga
When I manually migrate a group resource, a constraint is automatically 
created.  Is there a way to avoid this?

Here is the automatically created constraint.

<rsc_location id="cli-standby-DRBD_Group" rsc="DRBD_Group">
  <rule id="cli-standby-rule-DRBD_Group" score="-INFINITY" boolean-op="and">
    <expression id="cli-standby-expr-DRBD_Group" attribute="#uname" operation="eq" value="nomen.esri.com" type="string"/>
  </rule>
</rsc_location>

Here is the command I use to manually migrate the group resource.

crm resource migrate DRBD_Group

This constraint (score -INFINITY on nomen.esri.com) prevents Nomen from taking
over the group resource when Heartbeat is stopped on the other node.

Help.

Regards,
jerome
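The `cli-standby-DRBD_Group` constraint is how `crm resource migrate` records the move, so one approach is to remove it once the group has moved. This sketch assumes the crm shell's `resource unmigrate` subcommand, the counterpart of `migrate`, and uses `cibadmin -Q -o constraints` only to confirm that no `cli-` constraint for the group remains. The helper names and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Print commands instead of running them when DRY_RUN is set (hypothetical switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# clear_migration RESOURCE (hypothetical helper)
# Removes the cli-* location constraint that "crm resource migrate" added.
clear_migration() {
    run crm resource unmigrate "$1"
}

# has_cli_constraint RESOURCE (hypothetical helper)
# Reads constraint XML on stdin; succeeds if a cli-<kind>-RESOURCE
# constraint (such as cli-standby-DRBD_Group) is still present.
has_cli_constraint() {
    grep -q "id=\"cli-[a-z]*-$1\""
}

# Example, once crm_mon shows DRBD_Group running on the other node:
#   clear_migration DRBD_Group
#   cibadmin -Q -o constraints | has_cli_constraint DRBD_Group && echo "still constrained"
```

With resource-stickiness=10 on DRBD_Group, the group should normally stay on its new node after the constraint is removed, because no other location preference pulls it back.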


RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-11 Thread Jerome Yanga
Thank you all.  I have been happy with the functionality of the setup that you 
guys helped build.

For reference, here are the versions that I am running.

drbd-8.2.7-3
drbd-debuginfo-8.2.7-3
drbd-km-2.6.18_128.1.1.el5-8.2.7-3
heartbeat-2.99.2-6.1
heartbeat-common-2.99.2-6.1
heartbeat-debug-2.99.2-6.1
heartbeat-ldirectord-2.99.2-6.1
heartbeat-resources-2.99.2-6.1
kernel-2.6.18-128.1.1.el5
kernel-devel-2.6.18-128.1.1.el5
kernel-headers-2.6.18-128.1.1.el5
libheartbeat2-2.99.2-6.1
libpacemaker3-1.0.2-11.1
pacemaker-1.0.2-11.1
pacemaker-debug-1.0.2-11.1
pacemaker-mgmt-1.99.0-2.1
pacemaker-mgmt-client-1.99.0-2.1
pacemaker-mgmt-debug-1.99.0-2.1

I have tried DRBD 8.3.0 but the DRBD OCF Agent of Pacemaker 1.0.2-11.1 does not 
seem to work well with it.

Regards,
Jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Monday, March 09, 2009 12:05 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

Hi

Jerome Yanga wrote:
 Dominik,

 As usual, you are right on the money.  I should have caught that myself.
 Thank you for catching that for me.  What happened was that I used a
 different server to compile DRBD, and I had assumed that Nomen and Rubric (my
 test nodes) were on the same kernel.
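Since the problem turned out to be a DRBD module built for a different kernel, a quick check is to compare the kernel release embedded in the drbd-km package name with `uname -r`. This sketch assumes the naming seen in this archive, where kernel 2.6.18-128.1.1.el5 yields `drbd-km-2.6.18_128.1.1.el5-8.2.7-3` ('-' becomes '_'); `km_matches_kernel` is a hypothetical helper.

```shell
#!/bin/sh
# km_matches_kernel PKG KERNEL (hypothetical helper)
# Succeeds if the drbd-km package name PKG embeds kernel release KERNEL.
# Assumes the naming seen in this thread: '-' in the kernel release
# becomes '_' in the package name.
km_matches_kernel() {
    mangled=$(printf '%s\n' "$2" | sed 's/-/_/g')
    case $1 in
        drbd-km-"$mangled"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, on each node:
#   for p in $(rpm -qa 'drbd-km*'); do
#       if km_matches_kernel "$p" "$(uname -r)"; then echo "$p: OK"
#       else echo "$p: built for a different kernel than $(uname -r)"; fi
#   done
```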

 Moreover, I had also combined Neil's suggestion with yours, as he had mentioned
 that pacemaker-1.0.1 and drbd-8.2 work together.

 My current issues are as follows:
 1)  I cannot migrate the resource fs0 from Nomen to Rubric.  Running the
 command "crm resource migrate fs0" just puts fs0 into an offline state.  This
 sounds like it needs a config change.  NOTE:  I am planning to add fs0 to a group
 that will be able to migrate between the two nodes (Nomen and Rubric).  Help.
 Please provide the crm(live) syntax, as I have tried the lines below and crm
 complains that the syntax is wrong.

 order ms-drbd0-before-fs0 mandatory: ms-drbd0:promote fs0:start
 colocation fs0-on-ms-drbd0 inf: fs0 ms-drbd0:Master

You need 1.0.2 for that. 1.0.1 packages' crm shell had a bug there.

 2)  Is there documentation on which resources, constraints, and the like I
 can add to the cib.xml via crm(live), and on the syntax for adding them
 via crm(live)?

http://clusterlabs.org/wiki/Documentation

--snip--

 cib.xml:
 
 <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" have-quorum="1" epoch="153" num_updates="0" cib-last-written="Fri Mar  6 12:52:27 2009" dc-uuid="3a8b681c-a14b-4037-a8e6-2d4af2eff88e">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1236213117"/>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com" type="normal"/>
       <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com" type="normal"/>
     </nodes>
     <resources>
       <master id="ms-drbd0">
         <meta_attributes id="ms-drbd0-meta_attributes">
           <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max" value="2"/>
           <nvpair id="ms-drbd0-meta_attributes-notify" name="notify" value="true"/>
           <nvpair id="ms-drbd0-meta_attributes-globally-unique" name="globally-unique" value="false"/>
           <nvpair id="ms-drbd0-meta_attributes-target-role" name="target-role" value="Started"/>
         </meta_attributes>
         <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
           <instance_attributes id="drbd0-instance_attributes">
             <nvpair id="drbd0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/>
           </instance_attributes>
           <operations id="drbd0-ops">
             <op id="drbd0-monitor-59s" interval="59s" name="monitor" role="Master" timeout="30s"/>
             <op id="drbd0-monitor-60s" interval="60s" name="monitor" role="Slave" timeout="30s"/>
           </operations>
         </primitive>
       </master>
       <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
         <instance_attributes id="VIP-instance_attributes">
           <nvpair id="VIP-instance_attributes-ip" name="ip" value="10.50.26.250"/>
         </instance_attributes>
         <operations id="VIP-ops">
           <op id="VIP-monitor-5s" interval="5s" name="monitor" timeout="5s"/>
         </operations>
       </primitive>
       <primitive class="ocf" id="fs0" provider="heartbeat" type="Filesystem">
         <instance_attributes id="fs0-instance_attributes">
           <nvpair id="fs0-instance_attributes-fstype" name="fstype" value="ext3"/>
           <nvpair id="fs0-instance_attributes-directory" name="directory" value="/data"/>
           <nvpair id="fs0-instance_attributes-device" name="device" value="/dev/drbd0"/>
         </instance_attributes>
       </primitive>
     </resources>
     <constraints/>

You don't have any constraints, so

RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-06 Thread Jerome Yanga
Thanks, Neil.  However, the reason I wanted DRBD to start via Pacemaker is
that I want Pacemaker to manage the DRBD process and be able to migrate it
between the nodes.

jerome

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian R. Hellman
Sent: Wednesday, March 04, 2009 4:52 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

DRBD needs to be running before you start Pacemaker; it should be in
Secondary/Secondary mode.  When you stop the service, you are unloading
the DRBD module, hence it cannot start.

Jerome Yanga wrote:
 Hi Neil!

 Yes.  DRBD works outside of Pacemaker.  When I run service drbd start on
 each node, DRBD runs properly and both nodes are Secondary.

 jerome

 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin
 Sent: Wednesday, March 04, 2009 4:00 PM
 To: General Linux-HA mailing list
 Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker


 Does drbd work outside of pacemaker?  I suspect perhaps not from these lines 
 in your log:

 Mar  4 14:27:59 nomen modprobe: FATAL: Module drbd not found.
 Mar  4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) 
 Could not stat(/proc/drbd): No such file or directory do you need to load 
 the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 
 /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' 
 terminated with exit code 20 drbdadm attach r0: exited with code 20
 Mar  4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode 
 after start.

 Try starting DRBD by hand with Pacemaker turned off; it should come up on
 both nodes, with both nodes as Secondary.  If it doesn't, then you have to
 fix DRBD first before trying to add Pacemaker to the mix.

  Neil
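Neil's check can be scripted as a small preflight step. This sketch assumes that `/proc/drbd` exists only once the drbd module is loaded (the "Could not stat(/proc/drbd)" log line above points the same way) and that `drbdadm state r0` prints the local/peer roles, such as `Secondary/Secondary`; check which role subcommand your drbdadm version accepts. `drbd_preflight` is a hypothetical helper.

```shell
#!/bin/sh
# drbd_preflight PROC_FILE ROLES (hypothetical helper)
# PROC_FILE is normally /proc/drbd, which exists only once the drbd module
# is loaded; ROLES is the local/peer role string for the resource.
drbd_preflight() {
    if [ ! -e "$1" ]; then
        echo "drbd module not loaded ($1 missing); try: modprobe drbd"
        return 1
    fi
    if [ "$2" != "Secondary/Secondary" ]; then
        echo "unexpected roles: $2"
        return 1
    fi
    echo "ready: module loaded, both nodes Secondary"
}

# Example, with heartbeat stopped on both nodes:
#   service drbd start
#   drbd_preflight /proc/drbd "$(drbdadm state r0)"
```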

 Jerome Yanga wrote:

 Hi!  I am having issues with getting DRBD to work with Pacemaker.  I can get
 Pacemaker and DRBD to run individually, but not DRBD managed by Pacemaker.  I
 tried following the instructions on the site below, but the resources will not
 go online.

 http://clusterlabs.org/wiki/DRBD_HowTo_1.0

 Below is my configuration.

 Installed applications:
 ===
 kernel-2.6.18-128.el5
 drbd-8.3.0-3
 heartbeat-2.99.2-6.1
 pacemaker-1.0.1-3.1



 drbd.conf:
 ==
 global {
   usage-count no;
 }

 resource r0 {
   protocol C;
   handlers {
     pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
     pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
     local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
     outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
     pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
     out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
   }
   startup {
     wfc-timeout 0;
   }

   disk {
     on-io-error pass_on;
   }
   net {
     max-buffers 2048;
     after-sb-0pri disconnect;
     after-sb-1pri disconnect;
     after-sb-2pri disconnect;
     rr-conflict disconnect;
   }
   syncer {
     rate 100M;
     al-extents 257;
   }
   on nomen.esri.com {
     device /dev/drbd0;
     disk /dev/sda5;
     address 192.168.0.1:7789;
     meta-disk internal;
   }
   on rubric.esri.com {
     device /dev/drbd0;
     disk /dev/sda5;
     address 192.168.0.2:7789;
     meta-disk internal;
   }
 }



 Cib.xml:
 
 <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" have-quorum="1" dc-uuid="a5e95310-f27d-418e-9cb9-42e50310f702" epoch="56" num_updates="0" cib-last-written="Wed Mar  4 14:27:59 2009">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com" type="normal"/>
       <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com" type="normal"/>
     </nodes>
     <resources>
       <master id="ms-drbd0">
         <meta_attributes id="ms-drbd0-meta_attributes">
           <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max" value="2"/>
           <nvpair id="ms-drbd0-meta_attributes-notify" name="notify" value="true"/>
           <nvpair id="ms-drbd0-meta_attributes-globally-unique" name="globally-unique" value="false"/>
           <nvpair name="target-role" id="ms-drbd0-meta_attributes-target-role" value="Started"/>
         </meta_attributes>
         <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
           <instance_attributes id="drbd0-instance_attributes">
             <nvpair id="drbd0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/>
           </instance_attributes>
           <operations id

RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-06 Thread Jerome Yanga
I apologize, Brian.  The gratitude should have been sent to you.  Thanks, 
Brian.  :)

Regards,
jerome

-----Original Message-----
From: Jerome Yanga
Sent: Friday, March 06, 2009 9:35 AM
To: General Linux-HA mailing list
Subject: RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

Thanks, Neil.  However, the reason I wanted DRBD to start via Pacemaker is that 
I want Pacemaker to manage the DRBD process and be able to migrate it between 
the nodes.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian R. Hellman
Sent: Wednesday, March 04, 2009 4:52 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

DRBD needs to be running before you start Pacemaker, and it should be in
Secondary/Secondary mode.  When you stop the service you unload the DRBD
module, so it cannot start.
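Brian's point can be turned into a quick pre-flight check before starting the cluster stack. This is a minimal bash sketch; the function name `drbd_module_loaded` is hypothetical, and the optional argument exists only so the check can be tried against a sample file instead of the real /proc/modules.

```bash
# Hypothetical helper: succeed (return 0) if the drbd kernel module is loaded.
# $1 optionally names the modules file to read (default: /proc/modules).
drbd_module_loaded() {
    local modules_file=${1:-/proc/modules}
    [ -r "$modules_file" ] || return 1
    # The first whitespace-separated field of each line is the module name.
    awk '$1 == "drbd" { found = 1 } END { exit !found }' "$modules_file"
}
```

For example, `drbd_module_loaded || modprobe drbd`. If modprobe itself fails with `FATAL: Module drbd not found`, as in the log in the original post, that usually means no drbd module is installed for the running kernel at all, rather than the module merely having been unloaded.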


RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-06 Thread Jerome Yanga
Neil,

Unfortunately, I did not receive your scripts because they were filtered by the 
mail servers.  However, I tried your suggestion, and Pacemaker 1.0.1 does work 
with DRBD 8.2: I was able to add the DRBD resource as a master/slave resource in 
Pacemaker.  I just couldn't get the resource to migrate, probably because of 
configuration settings that I am missing.  I will send more info in my next 
post.  By the way, could you try resending the scripts as a tar file?

Thank you in advance.

Regards,
Jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin
Sent: Wednesday, March 04, 2009 10:27 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker


OK.  I notice you're running DRBD 8.3.  The scripts distributed with Pacemaker 
1.0.1 only worked with 8.2 (I'm not sure about 1.0.2).  I made a few tiny 
changes to the drbd script so it would run with 8.3, and I've attached the 
changed script.  I also added more logging, at debug level, so you can see 
exactly what is going on in the drbd OCF script.  You probably want to change 
your logging to debug to get all the output while trying to figure this out.

However, the errors I got from the script were very different from yours: I 
was getting errors from drbdadm, not failures to load the driver.

The other thing I did was run the OCF scripts by hand (you have to set a 
bunch of env variables).  I can't find the script I used to test drbd, but I've 
attached one I used for mysql; you should be able to adapt it to your use.  As 
always, remember bash -x is your friend.

Neil
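Neil's tip about running OCF scripts by hand can be sketched as a small bash wrapper. The function name `run_ocf_agent` and the default `OCF_RESOURCE_INSTANCE` value are hypothetical; what it relies on is the OCF convention that resource parameters reach an agent as `OCF_RESKEY_<name>` environment variables, plus `OCF_ROOT` pointing at the OCF tree (`/usr/lib/ocf` on these systems, judging by the paths in this thread). Master/slave agents may also expect `OCF_RESKEY_CRM_meta_*` variables (clone counts and the like) that Pacemaker normally supplies; this sketch does not set them.

```bash
# Hypothetical helper: run an OCF resource agent by hand, outside Pacemaker,
# under `bash -x` so every step the agent takes is traced to stderr.
# Usage: run_ocf_agent AGENT_PATH ACTION [PARAM=VALUE ...]
# Each PARAM=VALUE is exported as OCF_RESKEY_PARAM=VALUE.
# The return value is the agent's exit code.
run_ocf_agent() {
    local agent=$1 action=$2
    shift 2
    (
        # Subshell: the exported variables do not leak into the caller.
        export OCF_ROOT=${OCF_ROOT:-/usr/lib/ocf}
        export OCF_RESOURCE_INSTANCE=${OCF_RESOURCE_INSTANCE:-manual-test}
        for kv in "$@"; do
            export "OCF_RESKEY_${kv%%=*}=${kv#*=}"
        done
        bash -x "$agent" "$action"
    )
}
```

For example: `run_ocf_agent /usr/lib/ocf/resource.d/heartbeat/drbd monitor drbd_resource=r0; echo "rc=$?"`. Running the agent under `bash -x` follows Neil's "bash -x is your friend" advice without editing the agent itself.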

Jerome Yanga wrote:
 Hi Neil!

 Yes.  DRBD works outside of Pacemaker.  When I do a service drbd start on 
 each node, DRBD runs properly and both nodes are Secondary.

 jerome
   
   


RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-06 Thread Jerome Yanga
]: info: unpack_graph: Unpacked transition 
35: 1 actions in 1 synapses
Mar  6 12:56:14 nomen crmd: [14603]: info: do_te_invoke: Processing graph 35 
(ref=pe_calc-dc-1236372974-109) derived from 
/var/lib/heartbeat/pengine/pe-input-75.bz2
Mar  6 12:56:14 nomen crmd: [14603]: info: send_rsc_command: Initiating action 
41: start fs0_start_0 on nomen.esri.com
Mar  6 12:56:14 nomen crmd: [14603]: info: do_lrm_rsc_op: Performing 
key=41:35:0:44aada21-7997-4a4f-ba9a-4ae8a2629a58 op=fs0_start_0 )
Mar  6 12:56:14 nomen lrmd: [14509]: info: rsc:fs0: start
Mar  6 12:56:14 nomen Filesystem[1681]: INFO: Running start for /dev/drbd0 on 
/data
Mar  6 12:56:14 nomen cib: [1678]: info: write_cib_contents: Wrote version 
0.155.0 of the CIB to disk (digest: 0fd876c0a5f2db21a9aa66b3f997194f)
Mar  6 12:56:14 nomen cib: [1678]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
/var/lib/heartbeat/crm/cib.xml.sig)
Mar  6 12:56:14 nomen cib: [14508]: info: Managed write_cib_contents process 
1678 exited with return code 0.
Mar  6 12:56:14 nomen kernel: kjournald starting.  Commit interval 5 seconds
Mar  6 12:56:14 nomen kernel: EXT3 FS on drbd0, internal journal
Mar  6 12:56:14 nomen kernel: EXT3-fs: mounted filesystem with ordered data 
mode.
Mar  6 12:56:14 nomen lrmd: [14509]: info: Managed fs0:start process 1681 
exited with return code 0.
Mar  6 12:56:14 nomen lrmd: [14509]: info: Resource Agent output: []
Mar  6 12:56:14 nomen crmd: [14603]: info: process_lrm_event: LRM operation 
fs0_start_0 (call=47, rc=0, cib-update=115, confirmed=true) complete ok
Mar  6 12:56:15 nomen cib: [14508]: info: cib_process_request: Operation 
complete: op cib_modify for section 'all' (origin=local/crmd/115): ok (rc=0)
Mar  6 12:56:15 nomen crmd: [14603]: info: match_graph_event: Action 
fs0_start_0 (41) confirmed on nomen.esri.com (rc=0)
Mar  6 12:56:15 nomen crmd: [14603]: info: run_graph: 

Mar  6 12:56:15 nomen crmd: [14603]: notice: run_graph: Transition 35 
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/heartbeat/pengine/pe-input-75.bz2): Complete
Mar  6 12:56:15 nomen crmd: [14603]: info: te_graph_trigger: Transition 35 is 
now complete
Mar  6 12:56:15 nomen crmd: [14603]: info: notify_crmd: Transition 35 status: 
done - null
Mar  6 12:56:15 nomen crmd: [14603]: info: do_state_transition: State 
transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS 
cause=C_FSA_INTERNAL origin=notify_crmd ]
Mar  6 12:56:15 nomen haclient: on_event: from message queue: evt:cib_changed
Mar  6 12:56:15 nomen mgmtd: [14526]: info: CIB query: cib
Mar  6 12:56:15 nomen heartbeat: [14466]: WARN: G_CH_dispatch_int: Dispatch 
function for read child took too long to execute: 70 ms (> 50 ms) (GSource: 
0x94add68)
Mar  6 12:56:17 nomen lrmd: [14509]: info: Resource Agent output: []

Regards,
jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Wednesday, March 04, 2009 10:54 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

Hi

Jerome Yanga wrote:
 Hi!  I am having issues getting DRBD to work with Pacemaker.  I can get 
 Pacemaker and DRBD to run individually, but not DRBD managed by Pacemaker.  I 
 tried following the instructions on the site below, but the resources will not 
 go online.

 http://clusterlabs.org/wiki/DRBD_HowTo_1.0

 Below is my configuration.

 Installed applications:
 ===
 kernel-2.6.18-128.el5

copy that


[Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-04 Thread Jerome Yanga
Hi!  I am having issues getting DRBD to work with Pacemaker.  I can get 
Pacemaker and DRBD to run individually, but not DRBD managed by Pacemaker.  I 
tried following the instructions on the site below, but the resources will not 
go online.

http://clusterlabs.org/wiki/DRBD_HowTo_1.0

Below is my configuration.

Installed applications:
===
kernel-2.6.18-128.el5
drbd-8.3.0-3
heartbeat-2.99.2-6.1
pacemaker-1.0.1-3.1



drbd.conf:
==
global {
  usage-count no;
}

resource r0 {
  protocol C;
  handlers {
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
    pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
    local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
    out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
  }
  startup {
    wfc-timeout 0;
  }

  disk {
    on-io-error pass_on;
  }
  net {
    max-buffers 2048;
    after-sb-0pri disconnect;
    after-sb-1pri disconnect;
    after-sb-2pri disconnect;
    rr-conflict disconnect;
  }
  syncer {
    rate 100M;
    al-extents 257;
  }
  on nomen.esri.com {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    192.168.0.1:7789;
    meta-disk  internal;
  }
  on rubric.esri.com {
    device     /dev/drbd0;
    disk       /dev/sda5;
    address    192.168.0.2:7789;
    meta-disk  internal;
  }
}



Cib.xml:

<cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
     have-quorum="1" dc-uuid="a5e95310-f27d-418e-9cb9-42e50310f702" epoch="56"
     num_updates="0" cib-last-written="Wed Mar  4 14:27:59 2009">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
                value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com" type="normal"/>
      <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com" type="normal"/>
    </nodes>
    <resources>
      <master id="ms-drbd0">
        <meta_attributes id="ms-drbd0-meta_attributes">
          <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max" value="2"/>
          <nvpair id="ms-drbd0-meta_attributes-notify" name="notify" value="true"/>
          <nvpair id="ms-drbd0-meta_attributes-globally-unique" name="globally-unique" value="false"/>
          <nvpair name="target-role" id="ms-drbd0-meta_attributes-target-role" value="Started"/>
        </meta_attributes>
        <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
          <instance_attributes id="drbd0-instance_attributes">
            <nvpair id="drbd0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/>
          </instance_attributes>
          <operations id="drbd0-ops">
            <op id="drbd0-monitor-59s" interval="59s" name="monitor" role="Master" timeout="30s"/>
            <op id="drbd0-monitor-60s" interval="60s" name="monitor" role="Slave" timeout="30s"/>
          </operations>
        </primitive>
      </master>
    </resources>
    <constraints/>
  </configuration>
</cib>


/var/log/messages:
==
Mar  4 14:27:58 nomen crm_resource: [30167]: info: Invoked: crm_resource --meta 
-r ms-drbd0 -p target-role -v Started
Mar  4 14:27:58 nomen cib: [29899]: info: cib_process_xpath: Processing 
cib_query op for 
//cib/configuration/resources//*...@id=ms-drbd0]//meta_attributes//nvpa...@name=target-role]
 (/cib/configuration/resources/master/meta_attributes/nvpair[4])
Mar  4 14:27:59 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing 
key=5:5:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_start_0 )
Mar  4 14:27:59 nomen haclient: on_event:evt:cib_changed
Mar  4 14:27:59 nomen lrmd: [29900]: info: rsc:drbd0:0: start
Mar  4 14:27:59 nomen cib: [30168]: info: write_cib_contents: Wrote version 
0.56.0 of the CIB to disk (digest: 2365d9802f1b9c55e0ed87b8ebda5db3)
Mar  4 14:27:59 nomen cib: [30168]: info: retrieveCib: Reading cluster 
configuration from: /var/lib/heartbeat/crm/cib.xml (digest: 
/var/lib/heartbeat/crm/cib.xml.sig)
Mar  4 14:27:59 nomen cib: [29899]: info: Managed write_cib_contents process 
30168 exited with return code 0.
Mar  4 14:27:59 nomen modprobe: FATAL: Module drbd not found.
Mar  4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout)
Mar  4 14:27:59 nomen mgmtd: [29904]: info: CIB query: cib
Mar  4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) 
Could not stat(/proc/drbd): No such file or directory do you need to load the 
module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 
/dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' 
terminated with exit code 20 drbdadm attach r0: exited with code 20
Mar  4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after 
start.
Mar  4 14:27:59 nomen lrmd: [29900]: WARN: Managed drbd0:0:start process 

RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

2009-03-04 Thread Jerome Yanga
Hi Neil!

Yes.  DRBD works outside of Pacemaker.  When I do a service drbd start on 
each node, DRBD runs properly and both nodes are Secondary.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin
Sent: Wednesday, March 04, 2009 4:00 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker


Does drbd work outside of pacemaker?  I suspect perhaps not from these lines in 
your log:

Mar  4 14:27:59 nomen modprobe: FATAL: Module drbd not found.
Mar  4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) 
Could not stat(/proc/drbd): No such file or directory do you need to load the 
module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 
/dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' 
terminated with exit code 20 drbdadm attach r0: exited with code 20
Mar  4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after 
start.

Try starting DRBD by hand with Pacemaker turned off; it should come up on both 
nodes, with both nodes as Secondary.  If it doesn't, then you have to fix DRBD 
first before trying to add Pacemaker to the mix.

 Neil
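Neil's test (both nodes up as Secondary with Pacemaker off) can be checked on each node with a short bash sketch. `drbd_all_secondary` is a hypothetical name; it parses /proc/drbd, where, as far as I know, DRBD 8.3 prints the roles as `ro:` and earlier 8.x releases as `st:`, so both spellings are accepted. Running `drbdadm role r0` by hand is a simpler manual alternative.

```bash
# Hypothetical helper: succeed only if every configured device listed in a
# /proc/drbd-style file reports Secondary/Secondary.
# $1 optionally names the file to read (default: /proc/drbd).
drbd_all_secondary() {
    local proc_file=${1:-/proc/drbd}
    # No readable /proc/drbd usually means the module is not loaded.
    [ -r "$proc_file" ] || return 1
    awk '
        /cs:Unconfigured/ { next }        # skip unused minors
        /^ *[0-9]+: cs:/ {                # one status line per DRBD device
            devices++
            if ($0 !~ /(ro|st):Secondary\/Secondary/) bad++
        }
        END { exit (devices == 0 || bad > 0) }
    ' "$proc_file"
}
```

For example, after `service drbd start` on both nodes with heartbeat stopped, `drbd_all_secondary && echo "DRBD looks ready for Pacemaker"` on each node.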


RE: [Linux-HA] Call cib_create failed (-47): Update does not conform to the configured schema/DTD

2009-02-25 Thread Jerome Yanga
I found the site as soon as I sent it out.

http://clusterlabs.org/wiki/DRBD_HowTo_1.0

'Hope this helps others.  :)

Regards,
Jerome
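The HowTo linked above uses the crm shell rather than raw XML. The -47 schema error is most likely because the XML in the original question is in the older Heartbeat 2.x CIB format (`master_slave`, nvpairs wrapped in `<attributes>`, underscored names such as `target_role` and `clone_max`), while a CIB with `validate-with="pacemaker-1.0"` expects `master`, nvpairs directly inside `meta_attributes`/`instance_attributes`, and hyphenated names such as `target-role` and `clone-max`, as in the pacemaker-1.0 cib.xml posted later in this thread. Below is a bash sketch that prints a roughly equivalent crm shell definition. The function name `print_drbd_ms_config` is hypothetical, and the ids and monitor intervals are copied from that later cib.xml, not from the HowTo.

```bash
# Hypothetical helper: print a crm shell definition of an ocf:heartbeat:drbd
# primitive wrapped in a master/slave ("ms") resource, in Pacemaker 1.0 syntax.
# $1 optionally names the DRBD resource from drbd.conf (default: r0).
print_drbd_ms_config() {
    local drbd_res=${1:-r0}
    cat <<EOF
primitive drbd0 ocf:heartbeat:drbd \\
    params drbd_resource="$drbd_res" \\
    op monitor interval="59s" role="Master" timeout="30s" \\
    op monitor interval="60s" role="Slave" timeout="30s"
ms ms-drbd0 drbd0 \\
    meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
commit
EOF
}
```

The output is meant to be pasted at the `crm(live)configure#` prompt; the final `commit` applies it, so leave that line off if you would rather review the result with `show` first.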



[Linux-HA] Call cib_create failed (-47): Update does not conform to the configured schema/DTD

2009-02-25 Thread Jerome Yanga
I have been trying to add DRBD to my Pacemaker configuration using crm(live), 
but I keep getting the message below:

Call cib_create failed (-47): Update does not conform to the configured 
schema/DTD

Can someone show me the syntax for crm(live) to add my DRBD resource below?

   <master_slave id="r0_data">
     <meta_attributes id="r0_data_meta_attrs">
       <attributes>
         <nvpair id="r0_data_metaattr_target_role" name="target_role" value="started"/>
         <nvpair id="r0_data_metaattr_clone_max" name="clone_max" value="2"/>
         <nvpair id="r0_data_metaattr_clone_node_max" name="clone_node_max" value="1"/>
         <nvpair id="r0_data_metaattr_master_max" name="master_max" value="1"/>
         <nvpair id="r0_data_metaattr_master_node_max" name="master_node_max" value="1"/>
         <nvpair id="r0_data_metaattr_notify" name="notify" value="true"/>
         <nvpair id="r0_data_metaattr_globally_unique" name="globally_unique" value="false"/>
       </attributes>
     </meta_attributes>
     <primitive id="drbd_resource" class="ocf" type="drbd" provider="heartbeat">
       <instance_attributes id="drbd_resource_instance_attrs">
         <attributes>
           <nvpair id="cd3b3992-4492-478d-ad27-eaaf0698ec53" name="drbd_resource" value="drbd0"/>
         </attributes>
       </instance_attributes>
       <meta_attributes id="drbd_resource:0_meta_attrs">
         <attributes>
           <nvpair id="drbd_resource:1_metaattr_target_role" name="target_role" value="stopped"/>
         </attributes>
       </meta_attributes>
       <operations>
         <op id="c2f4dd35-db6c-4d20-ab90-239aa511726f" name="monitor" interval="4s" timeout="5s"
             disabled="false" role="Master" start_delay="0"/>
         <op id="046449e7-247b-42a3-a0bf-d7b9bc0fe010" name="monitor" interval="5s" timeout="5s"
             disabled="false" role="Slave" start_delay="0"/>
       </operations>
     </primitive>
   </master_slave>

Help.

Regards,
Jerome


RE: [Linux-HA] Failover not working as I expected

2009-01-29 Thread Jerome Yanga
I have checked both services on each server and both of them are off.

[r...@rubric ~]# chkconfig --list dirsrv
dirsrv  0:off   1:off   2:off   3:off   4:off   5:off   6:off
[r...@rubric ~]# chkconfig --list dirsrv-admin
dirsrv-admin    0:off   1:off   2:off   3:off   4:off   5:off   6:off

[r...@nomen ~]#  chkconfig --list dirsrv
dirsrv  0:off   1:off   2:off   3:off   4:off   5:off   6:off
[r...@nomen ~]# chkconfig --list dirsrv-admin
dirsrv-admin    0:off   1:off   2:off   3:off   4:off   5:off   6:off

The way I determine if the service bounces when a node rejoins the cluster is 
by running a script called statusfds.sh.  This script contains the following:

#!/bin/bash
service dirsrv status
service dirsrv-admin status

Then I run the following command to monitor the services.

watch -n 1 statusfds.sh

Moreover, even hb_gui shows that the services are bounced/restarted when a node 
joins the cluster.  The status of the resources changes to failed for a 
second and changes back to running on.

Regards,
jerome
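A brief bounce is easy to miss with `watch`, which only shows the current screen. One alternative, sketched in bash: a hypothetical function `log_status_changes` that runs a status command such as statusfds.sh once per poll and prints a timestamped copy of its output only when that output changes. `POLL_INTERVAL` (default 1 second) and `MAX_POLLS` (default 0, meaning poll forever) are illustrative knobs, not part of any existing tool.

```bash
# Hypothetical helper: poll a status command and print its output, prefixed
# by a "== <timestamp>" line, on the first poll and whenever it changes.
# Usage: log_status_changes COMMAND [ARGS ...]
log_status_changes() {
    local interval=${POLL_INTERVAL:-1} max=${MAX_POLLS:-0} n=0 prev="" cur
    while [ "$max" -eq 0 ] || [ "$n" -lt "$max" ]; do
        cur=$("$@" 2>&1)
        if [ "$n" -eq 0 ] || [ "$cur" != "$prev" ]; then
            printf '== %s\n%s\n' "$(date '+%F %T')" "$cur"
            prev=$cur
        fi
        n=$((n + 1))
        sleep "$interval"
    done
}
```

For example, `log_status_changes ./statusfds.sh | tee bounce.log` while restarting heartbeat on the other node. If the status output includes process IDs, a restart shows up as a change even when both snapshots say the service is running.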

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
Sent: Thursday, January 29, 2009 3:51 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

On Tue, Jan 27, 2009 at 22:04, Jerome Yanga jya...@esri.com wrote:
 Dominik,

 Here is the status of the two concerns I needed help on.

 01)  When a node comes back up after a restart of heartbeat, resources get 
 bounced when it rejoins the cluster.
 STATUS:  The resources still get bounced when a node joins the cluster, even 
 after I deleted all the constraints.

You might want to check whether the service is being started automatically
by the OS when it boots.
The cluster will notice this, and the recovery can make it look like the
resource is merely bouncing.
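Andrew's suggestion can be checked mechanically. A minimal bash sketch (the function name `services_enabled_at_boot` is hypothetical) that reads `chkconfig --list` output on stdin and prints every service enabled in at least one runlevel:

```bash
# Hypothetical helper: read `chkconfig --list` style output on stdin and
# print the name of each service that is "on" in at least one runlevel.
services_enabled_at_boot() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^[0-6]:on$/) { print $1; break }
    }'
}
```

For example, `for s in dirsrv dirsrv-admin; do chkconfig --list "$s"; done | services_enabled_at_boot`. Anything it prints could then be disabled with `chkconfig <service> off`, so that only the cluster starts it.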


RE: [Linux-HA] Failover not working as I expected

2009-01-29 Thread Jerome Yanga
Good evening to you, Dominik.  :)

I apologize for being persistent.  I can work around the situations I have 
encountered by writing scripts.  However, I thought there might be something in 
the configuration that I could tweak to make it work.  You have been very 
helpful, and that is greatly appreciated.  In fact, you have resolved all the 
situations I encountered except the one you asked me to file a bug report on, 
which I have done so that the product will get better.  Besides, you would 
probably hate to see this project fall back to MSCS (Microsoft Cluster Service) 
as much as I would.  Oooh...just the thought that the project might resort to a 
Microsoft solution makes me feel like I am losing my freedom (I certainly do not 
want this to happen and will try hard to prevent it).

I have submitted this to Bugzilla as you have recommended.  It is registered as 
Bug 2047.

Thank you for your support.

Regards,
jerome





-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Wednesday, January 28, 2009 11:19 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Good morning Jerome

we should make this a daily thing, shouldn't we?

Jerome Yanga wrote:
 Dominik,
 
 I apologize for leaving resource-stickiness out.  I had it there previously, 
 but during the trial and error I performed in the crm shell, I forgot to 
 re-add it.  Nevertheless, adding it to my cib.xml file does not seem to 
 work.
 
 Here is the chain of events.  This happens on either Nomen or Rubric.
 
 01)  Nomen (one of the two nodes) owns the group resource, called 
 Directory_Server.  In the meantime, Rubric (the other node) is just there 
 waiting for the resources to come to him.  :)
 02)  I stop heartbeat on Nomen and the Directory_Server resource group fails 
 over to Rubric.
 03)  Nomen's status changes from running(dc) to stopped
 04)  After waiting for step #3 to finish its transition, I start heartbeat 
 back up in Nomen.
 05)  Nomen's status changes from stopped to running-standby to running.
 06)  Rubric retains all the resources.  However, all the resources on Rubric 
 bounce/restart when Nomen's status changes from running-standby to 
 running.

With the configuration you posted below, this should not happen. The
configuration looks good for what you want. If you're sure that is what
you do and get, please file a bug about that and include a hb_report.

http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

 Is there a way to prevent the resources in Rubric to bounce/restart when 
 Nomen rejoins the cluster?
 
 Help.
 
 
 
 On the other hand, you pointed me in the right direction regarding the MailTo 
 OCF agent.
 
 This is what the variable looked like in .ocf-binaries when it was not working.
 
 rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
 : ${MAILCMD:=}
 
 I assigned the exact path of the mail command to the variable.  Now, I get 
 emailed every time a failover happens.  Wooot!  Wooot!  :)
 
 rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
 : ${MAILCMD:=/bin/mail}

Good. I think this was on the lists earlier. Apparently a packaging issue.
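The fix Jerome describes (giving MAILCMD an explicit path in .ocf-binaries) could be scripted roughly as below. This is a sketch: `fix_mailcmd` is a hypothetical name, it assumes GNU sed (for `-i`), and since Dominik suspects a packaging issue, a later package update may well put the empty default back.

```bash
# Hypothetical helper: turn ": ${MAILCMD:=}" into ": ${MAILCMD:=/path/to/mail}".
# $1 = file to edit (e.g. .ocf-binaries);
# $2 = mail binary (default: whatever `mail` resolves to on PATH).
fix_mailcmd() {
    local file=$1
    local mail_bin=${2:-$(command -v mail)}
    if [ -z "$mail_bin" ]; then
        echo "fix_mailcmd: no mail binary found" >&2
        return 1
    fi
    cp -p "$file" "$file.bak" || return 1   # keep a backup of the original
    # Only an empty default is rewritten; an already-set path is left alone.
    sed -i 's|^: \${MAILCMD:=}$|: ${MAILCMD:='"$mail_bin"'}|' "$file"
}
```

For example, `fix_mailcmd /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries /bin/mail`, then repeat the `grep MAIL` shown above to confirm the change.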

Regards
Dominik

 Thanks.
 
 
 Below is my current cib.xml file.
 
 <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
      have-quorum="1" dc-uuid="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" epoch="102"
      num_updates="0" cib-last-written="Wed Jan 28 08:32:39 2009">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
                 value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com" type="normal">
         <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
           <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" name="standby" value="off"/>
         </instance_attributes>
       </node>
       <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com" type="normal">
         <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
           <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c" name="standby" value="off"/>
         </instance_attributes>
       </node>
     </nodes>
     <resources>
       <group id="Directory_Server">
         <meta_attributes id="Directory_Server-meta_attributes">
           <nvpair id="Directory_Server-meta_attributes-collocated" name="collocated" value="true"/>
           <nvpair id="Directory_Server-meta_attributes-ordered" name="ordered" value="true"/>
           <nvpair id="Directory_Server-meta_attributes-migration-threshold" name="migration-threshold" value="1"/>
           <nvpair id="Directory_Server-meta_attributes-failure-timeout" name="failure-timeout" value="10s"

RE: [Linux-HA] Failover not working as I expected

2009-01-16 Thread Jerome Yanga
Dominik,

Thank you very much.  Adding resource-stickiness and getting rid of the 
constraint helped a lot.  The resources do not go back to Nomen anymore when 
its heartbeat is started again (the resources stay with Rubric).  However, the 
resources still get bounced once Nomen joins the cluster.  Is there any way to 
keep the resources from bouncing when Nomen rejoins the cluster?

I have also observed another issue.  As you have seen in my cib.xml, I have 
created a group called Directory_Server.  In this group, there are three 
resources, namely:  VIP, ECAS and FDS_Admin.  If I manually turn off any of 
these resources, I would like the group resource, Directory_Server, to failover 
to the other node.  Is there a configuration that will do this?  Currently, if 
one of the three resources goes down, it stays down and the rest continue running.  
All three resources will need to be up and running for our applications to work 
properly.
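A minimal sketch of how this is commonly approached in Pacemaker 1.0, assuming the same XML syntax as the cib.xml Jerome posted (validate-with=pacemaker-1.0). Every member needs a recurring monitor operation, otherwise the cluster never notices that a member was stopped by hand. migration-threshold=1 on the group then bans a failed member from its node after the first failure, which, as far as I understand group colocation, should take the rest of the group along to the other node. The ECAS class and type (lsb/dirsrv) are borrowed from the ecas primitive in the "How to Monitor Services" thread and are assumptions here, as are the op id and timeout.

```xml
<group id="Directory_Server">
  <meta_attributes id="Directory_Server-meta_attributes">
    <!-- ban a member from its current node after its first failure -->
    <nvpair id="Directory_Server-meta_attributes-migration-threshold"
            name="migration-threshold" value="1"/>
  </meta_attributes>
  <!-- assumed class/type; VIP and FDS_Admin would get a monitor op the same way -->
  <primitive id="ECAS" class="lsb" type="dirsrv">
    <operations>
      <!-- hypothetical id and values; this recurring check is what detects a manual stop -->
      <op id="ECAS-monitor-10s" name="monitor" interval="10s" timeout="30s"/>
    </operations>
  </primitive>
</group>
```

Adding failure-timeout lets the ban expire on its own; "crm resource cleanup" clears it by hand.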

To answer your question...

Also due to your rsc_location. The resource is where you configured it
(on nomen), so why move it around?

I added the rsc_location constraint to the configuration because I was trying to 
follow the sample active/passive configuration.

http://linux-ha.org/GettingStartedV2/OneIPAddress

I have been moving resources around because I am testing HA thoroughly before I 
implement it in our production environment.

Regards,
Jerome



-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, January 15, 2009 11:16 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Hi Jerome

 The names of the servers are Nomen and Rubric.  
 
 Let us start with Nomen owning all resources and its status showing 
 running(dc).  When I stop heartbeat on Nomen, Rubric takes over all the 
 resources and its status turns into running(dc).  This is good, as this is 
 what I had hoped it would do.
 
 When I start heartbeat back on Nomen, it takes all the resources away from 
 Rubric.  However, it leaves Rubric in running(dc) status and Nomen's status 
 just states running.  There are two issues here that I see.
 
 1)  I do not want Nomen to take the resources, as this means that the 
 resources will be bounced.

This happens because of your rsc_location constraint. You normally want
your resource to be on nomen, so if the cluster can, it will run it there.

<rsc_location id="fdstest" rsc="Directory_Server">
  <rule id="prefered_fdstest" score="100" boolean_op="or">
    <expression attribute="#uname" id="9e5698e0-8b07-43aa-b852-398fbe6bb909"
                operation="eq" value="nomen.esri.com"/>
  </rule>
</rsc_location>

If you want the resource to stick to its current location even when the
preferred node comes back, look into the meta-attribute
resource-stickiness. Read http://www.linux-ha.org/ScoreCalculation
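As a minimal sketch, assuming Pacemaker 1.0 syntax like the cib.xml Jerome posted: resource-stickiness is a meta attribute, so it can sit in the group's existing meta_attributes set. The rule prefered_fdstest gives nomen.esri.com a score of 100, so a stickiness higher than that should outweigh it and keep the group on whichever node it is running on; for a group, stickiness is additive across its active members, which only adds to the margin. The value 200 and the nvpair id are illustrative assumptions.

```xml
<meta_attributes id="Directory_Server-meta_attributes">
  <!-- illustrative value: anything above the 100 points from rule prefered_fdstest keeps the group in place -->
  <nvpair id="Directory_Server-meta_attributes-resource-stickiness"
          name="resource-stickiness" value="200"/>
</meta_attributes>
```

Putting the same nvpair into an rsc_defaults section instead would make it the default for every resource.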

 2)  I would like to have the Quorum or running(dc) where the resources are.

You can't move the dc role manually. And you do not have to care which
machine is the dc. It is totally fine having resources on a node which
is not the dc.

The current dc stays dc until it is shut down or separated from the
cluster in some manner.

 To continue, when I stop heartbeat on Rubric, the running(dc) status goes 
 over to Nomen. I then start heartbeat on Rubric and all resources as well as 
 the running(dc) status stay with Nomen.  Moreover, the resources are not bounced 
 at all.

Also due to your rsc_location. The resource is where you configured it
(on nomen), so why move it around?

Regards
Dominik
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems



[Linux-HA] How to Monitor Services

2008-12-31 Thread Jerome Yanga
I have tried reading and interpreting the link below but could not get the 
Resource Monitoring to work.

http://linux-ha.org/ClusterInformationBase/Actions

I have installed heartbeat-2.1.3-1.  The following is in my 
/var/lib/heartbeat/crm/cib.xml file.

<primitive id="ecas" class="lsb" type="dirsrv" provider="heartbeat">
  <meta_attributes id="ecas_meta_attrs">
    <attributes>
      <nvpair id="ecas_metaattr_is_managed" name="is_managed" value="true"/>
    </attributes>
  </meta_attributes>
  <operations>
    <op id="3cb67e8a-0c36-43be-a758-be32ff1a377d" name="stop" timeout="3s"
        start_delay="0s" disabled="false" role="Started"/>
    <op id="0aa741d5-3540-4f0a-a998-b842e346e574" name="start" timeout="5s"
        start_delay="0s" disabled="false" role="Started"/>
    <op id="df305ad8-92c7-4c95-bb8f-5646f9049a6f" name="monitor" interval="5s"
        timeout="3s" start_delay="0s" disabled="false" role="Master"
        on_fail="restart" prereq="nothing"/>
    <op id="e4e428cf-b1c7-40af-aa3f-c8f25cded958" name="monitor" interval="10s"
        timeout="3s" start_delay="0s" disabled="false" role="Slave"
        on_fail="restart" prereq="nothing"/>
    <op id="977d0884-b419-4494-ab4c-d1c130e8dee4" name="monitor" interval="6s"
        timeout="3s" role="Started" on_fail="restart" start_delay="0s"
        disabled="false" prereq="nothing"/>
    <op id="9a46191d-cf6e-4243-bd25-6f9ea44116ca" name="monitor" interval="7s"
        timeout="3s" role="Stopped" on_fail="restart" start_delay="0s"
        disabled="false" prereq="nothing"/>
  </operations>
</primitive>

I only have a single node because I just want to test whether Heartbeat restarts 
the service automatically if I shut down the dirsrv service manually.

Here is my /etc/ha.d/ha.cf.

# cat /etc/ha.d/ha.cf
use_logd on
bcast eth0
node server1
crm yes
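A pared-down version of the ecas primitive, as a minimal sketch for heartbeat 2.1.3 (using the same attribute spelling as the posted config, e.g. on_fail). It keeps a single recurring monitor in the default Started role: monitors with role="Master" or role="Slave" only apply to master/slave resources, and provider is only used by the ocf class. For an lsb resource the monitor runs the init script's status action, so the dirsrv script must exit 0 when the service is running and 3 when it is stopped, or the cluster cannot tell that it went down. The op ids and timeouts are illustrative assumptions; 3s is likely too short for a directory server.

```xml
<primitive id="ecas" class="lsb" type="dirsrv">
  <operations>
    <!-- hypothetical ids and timeouts -->
    <op id="ecas-start" name="start" timeout="60s"/>
    <op id="ecas-stop" name="stop" timeout="60s"/>
    <!-- runs "dirsrv status" every 10s and restarts the service if it reports stopped -->
    <op id="ecas-monitor-10s" name="monitor" interval="10s" timeout="30s"
        on_fail="restart"/>
  </operations>
</primitive>
```

Once this is loaded (for example with cibadmin), stopping dirsrv by hand should be followed by a restart within roughly one monitor interval.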

Help.

jerome

