Re: [Linux-HA] How do I clear the Failed actions section?
I just want to share that the recommended command did NOT move the resource to another node. It simply clears the Failed actions section.

Thanks again, Bill.

Regards,
j

On Tue, Mar 6, 2012 at 11:46 AM, William Seligman selig...@nevis.columbia.edu wrote:
> On 3/6/12 2:38 PM, Jerome Yanga wrote:
>> Do you know by chance if that command you have provided bounces the resource?
>
> I don't know what you mean by "bounce the resource". According to:
>
> http://www.clusterlabs.org/doc/crm_cli.html
>
> the command refreshes the resource status. Depending on your configuration, it might shift a resource to another node. But I am not an expert! I merely knew how to clear up the error message.
>
>> On Tue, Mar 6, 2012 at 10:28 AM, William Seligman selig...@nevis.columbia.edu wrote:
>>> On 3/6/12 1:04 PM, Jerome Yanga wrote:
>>>> crm_mon shows the error below.
>>>>
>>>> Failed actions:
>>>>     drbd0:1_monitor_59000 (node=testserver1.example.com, call=132, rc=-2, status=Timed Out): unknown exec error
>>>>
>>>> I have checked DRBD and the mirror is connected and UpToDate on both nodes. The error above caused the resources to fail over, and it seems to be working OK. However, the Failed actions section has not disappeared. How do I clear this error?
>>>
>>> crm resource cleanup drbd0
>>> --
>>> Bill Seligman | Phone: (914) 591-2823
>>> Nevis Labs, Columbia Univ | mailto://selig...@nevis.columbia.edu
>>> PO Box 137 |
>>> Irvington NY 10533 USA | http://www.nevis.columbia.edu/~seligman/

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
[Linux-HA] How do I clear the Failed actions section?
crm_mon shows the error below.

Failed actions:
    drbd0:1_monitor_59000 (node=testserver1.example.com, call=132, rc=-2, status=Timed Out): unknown exec error

I have checked DRBD and the mirror is connected and UpToDate on both nodes. The error above caused the resources to fail over, and it seems to be working OK. However, the Failed actions section has not disappeared. How do I clear this error?

Regards,
j
Re: [Linux-HA] How do I clear the Failed actions section?
Thanks, Bill.

Do you know by chance if that command you have provided bounces the resource?

Regards,
j
Re: [Linux-HA] How do I clear the Failed actions section?
Understood. Thanks again, Bill.

Regards,
j
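The sequence discussed in this thread can be sketched as a small shell helper. This is only a sketch under assumptions: `run` and `clear_failed` are hypothetical helper names, `crm_mon -1` (one-shot status output) is assumed to be available next to the `crm resource cleanup` command quoted in the thread, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: show cluster status, clean up one resource, show status again.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

clear_failed() {
    rsc=$1
    run crm_mon -1                    # one-shot status, including "Failed actions"
    run crm resource cleanup "$rsc"   # the command recommended in the thread
    run crm_mon -1                    # check whether the failed action is still listed
}

clear_failed drbd0
```

Setting `DRY_RUN=0` on a cluster node would run the commands for real. As reported in the thread, the cleanup cleared the Failed actions section without moving the resource, though that may depend on the configuration.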
Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild
Thanks, Andrew.

FYI, it seems that crm(live) takes care of this.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
Sent: Tuesday, May 12, 2009 6:43 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild

Try using crm_uuid to write back the value Rubric thinks Nomen should have. I forget the name of the file on Rubric that contains it; hostcache or something like that.

On Mon, May 11, 2009 at 6:18 PM, Jerome Yanga jya...@esri.com wrote:
> Andrew,
>
> Will I be able to change the UUID saved in Rubric so that it would reflect the new one from the rebuilt Nomen? If so, which file(s) do I need to modify?
>
> Thank you.
>
> jerome
>
> -----Original Message-----
> From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
> Sent: Monday, May 11, 2009 12:45 AM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] A Node cannot rejoin the cluster after a rebuild
>
> On Sat, May 9, 2009 at 12:56 AM, Jerome Yanga jya...@esri.com wrote:
>> Here is the scenario.
>>
>> 01) There are two nodes in the Active-Passive cluster--Nomen and Rubric.
>> 02) Nomen had a hardware and software failure.
>> 03) Rubric took over the resources as expected.
>> 04) Due to the failures, Nomen's operating system needed to be rebuilt.
>> 05) DRBD was reinstalled on Nomen and made sure that its drbd.conf is identical to Rubric's.
>> 06) Nomen's drbd service has been started to sync its block device with Rubric's.
>> 07) Stopped the drbd service on Nomen.
>> 08) Installed Pacemaker on Nomen and verified that its configuration is identical to Rubric's.
>> 09) Started heartbeat on Nomen but it will not rejoin the cluster. Its status is stuck on UNCLEAN (offline).
>>
>> Am I missing some steps to make Nomen rejoin the cluster?
>
> The node uuid is probably different to the old value - which would be confusing the other node.
> Or you forgot the auth_keys file.
[Linux-HA] A Node cannot rejoin the cluster after a rebuild
Here is the scenario.

01) There are two nodes in the Active-Passive cluster--Nomen and Rubric.
02) Nomen had a hardware and software failure.
03) Rubric took over the resources as expected.
04) Due to the failures, Nomen's operating system needed to be rebuilt.
05) DRBD was reinstalled on Nomen, and I made sure that its drbd.conf is identical to Rubric's.
06) Nomen's drbd service was started to sync its block device with Rubric's.
07) Stopped the drbd service on Nomen.
08) Installed Pacemaker on Nomen and verified that its configuration is identical to Rubric's.
09) Started heartbeat on Nomen, but it will not rejoin the cluster. Its status is stuck on UNCLEAN (offline).

Am I missing some steps to make Nomen rejoin the cluster?

Help.

jerome
Re: [Linux-HA] Cannot get Heartbeat, DRBD and NFS to work together
Here is my CIB.xml config.

cib.xml:

primitive fs0 ocf:heartbeat:Filesystem \
        params fstype=ext3 directory=/data device=/dev/drbd0
primitive VIP ocf:heartbeat:IPaddr \
        params ip=10.50.26.250 \
        op monitor interval=5s timeout=5s
primitive Emergency_Contact ocf:heartbeat:MailTo \
        params email=jya...@esri.com subject=Failover Occured \
        op monitor interval=3s timeout=3s
primitive drbd0 ocf:heartbeat:drbd \
        params drbd_resource=r0 \
        op monitor interval=59s role=Master timeout=30s \
        op monitor interval=60s role=Slave timeout=30s
group DRBD_Group fs0 VIP Emergency_Contact \
        meta collocated=true ordered=true migration-threshold=1 failure-timeout=10s resource-stickiness=10
ms ms-drbd0 drbd0 \
        meta clone-max=2 notify=true globally-unique=false target-role=Started
colocation DRBD_Group-on-ms-drbd0 inf: DRBD_Group ms-drbd0:Master
order ms-drbd0-before-DRBD_Group inf: ms-drbd0:promote DRBD_Group:start

I did a bit more testing and here are the facts.

01) With the cib.xml config above, when a node owns the resource group DRBD_Group, I start the NFS service manually and get the error below. Nevertheless, I am able to access the NFS share from another machine.

# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS daemon: [ OK ]
Starting NFS mountd: [ OK ]
Starting RPC idmapd: Error: RPC MTAB does not exist.

02) When I fail over the resource group DRBD_Group to the other node, I can start NFS with the same error and am still able to access the share from another machine.

03) However, if I add NFS into the resource group DRBD_Group (see below), the share is still accessible, but it will not fail over because the OCF agent nfsserver is not able to shut down the NFS service.

To add the NFS resource into DRBD_Group, I add the following via crm(live) and also add nfs_share into the DRBD_Group line.
primitive nfs_share ocf:heartbeat:nfsserver \
        params nfs_init_script=/etc/init.d/nfs \
        params nfs_notify_cmd=/sbin/rpc.statd \
        params nfs_shared_infodir=/data/varlibnfs \
        params nfs_ip=10.50.26.250 \
        op monitor interval=30s
...
group DRBD_Group fs0 nfs_share VIP Emergency_Contact \
...

I think there is something wrong with the OCF agent nfsserver in that it cannot stop the NFS service. As a result, the DRBD device will not fail over. Hence, the resource group will not fail over.

Please help.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Jerome Yanga
Sent: Tuesday, April 28, 2009 5:28 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] Cannot get Heartbeat, DRBD and NFS to work together

Hi peeps!

I cannot get my High Availability NFS server to work right. Here is my configuration.

primitive share_name ocf:heartbeat:nfsserver \
        params nfs_init_script=/etc/init.d/nfs \
        params nfs_notify_cmd=/sbin/rpc.statd \
        params nfs_shared_infodir=/data \
        params nfs_ip=10.50.26.250 \
        op monitor interval=30s

drbd-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.2-11.1
nfs-utils-1.0.9-40.el5
nfs-utils-lib-1.0.8-7.2.z2

Without heartbeat running, NFS works properly. When I add NFS as a resource into a group, it gets added but it does not seem to work, as I cannot get to the share from other systems.

I have tried following the site below, but I may have done something wrong since the share does not work. :(

http://www.linux-ha.org/HaNFS

Help. Thank you in advance.

jerome
[Linux-HA] Cannot get Heartbeat, DRBD and NFS to work together
Hi peeps!

I cannot get my High Availability NFS server to work right. Here is my configuration.

primitive share_name ocf:heartbeat:nfsserver \
        params nfs_init_script=/etc/init.d/nfs \
        params nfs_notify_cmd=/sbin/rpc.statd \
        params nfs_shared_infodir=/data \
        params nfs_ip=10.50.26.250 \
        op monitor interval=30s

drbd-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.2-11.1
nfs-utils-1.0.9-40.el5
nfs-utils-lib-1.0.8-7.2.z2

Without heartbeat running, NFS works properly. When I add NFS as a resource into a group, it gets added but it does not seem to work, as I cannot get to the share from other systems.

I have tried following the site below, but I may have done something wrong since the share does not work. :(

http://www.linux-ha.org/HaNFS

Help. Thank you in advance.

jerome
RE: [Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon
Thanks.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, April 02, 2009 10:44 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon

Jerome Yanga wrote:
> Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD daemon even if it is one of the resources.
>
> # service heartbeat stop
> Stopping High-Availability services: [ OK ]
>
> # service drbd status
> drbd driver loaded OK; device status:
> version: 8.2.7 (api:88/proto:86-88)
> GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by r...@nomen.esri.com, 2009-03-24 08:29:57
> m:res  cs  st  ds  p  mounted  fstype
> 0:r0   Unconfigured

It stops your drbd resource (device). It just does not unload the module. That is the expected behaviour.

Regards
Dominik

> Running the command below stops the DRBD daemon.
>
> service drbd stop
>
> Applications Installed:
> ===
> drbd-8.2.7-3
> heartbeat-2.99.2-6.1
> pacemaker-1.0.2-11.1
>
> CIB.xml:
> # crm configure show
> primitive fs0 ocf:heartbeat:Filesystem \
>         params fstype=ext3 directory=/data device=/dev/drbd0
> primitive VIP ocf:heartbeat:IPaddr \
>         params ip=10.50.26.250 \
>         op monitor interval=5s timeout=5s
> primitive drbd0 ocf:heartbeat:drbd \
>         params drbd_resource=r0 \
>         op monitor interval=59s role=Master timeout=30s \
>         op monitor interval=60s role=Slave timeout=30s
> group DRBD_Group fs0 VIP \
>         meta collocated=true ordered=true migration-threshold=1 failure-timeout=10s resource-stickiness=10
> ms ms-drbd0 drbd0 \
>         meta clone-max=2 notify=true globally-unique=false target-role=Started
> colocation DRBD_Group-on-ms-drbd0 inf: DRBD_Group ms-drbd0:Master
> order ms-drbd0-before-DRBD_Group inf: ms-drbd0:promote DRBD_Group:start
>
> Help.
>
> Regards,
> jerome
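To make Dominik's distinction concrete (module still loaded, resource no longer configured), the status text posted in this thread can be read with a few lines of shell. This is a sketch only: `drbd_res_state` is a hypothetical helper name, and the column spacing of the status header is assumed.

```shell
#!/bin/sh
# Sketch: print the state column for one resource from "service drbd status" output.
# Matches the row whose first field ends in ":<resource>", e.g. "0:r0".
drbd_res_state() {
    awk -v res="$1" '$1 ~ (":" res "$") { print $2 }'
}

# Status text as posted in the thread: the driver is loaded, r0 is not configured.
status='drbd driver loaded OK; device status:
m:res  cs  st  ds  p  mounted  fstype
0:r0   Unconfigured'

printf '%s\n' "$status" | drbd_res_state r0   # prints: Unconfigured
```

On a live node the same function could read from `service drbd status | drbd_res_state r0`. A state of Unconfigured with the driver still loaded is the situation Dominik describes as expected.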
RE: [Linux-HA] Re: Stopping the Heartbeat daemon does not stop the DRBD Daemon
Thanks. In my situation, DRBD is a resource in my cluster. Hence, it is managed by heartbeat.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Friday, April 03, 2009 1:50 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Re: Stopping the Heartbeat daemon does not stop the DRBD Daemon

Joe Bill wrote:
>> Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD daemon even if it is one of the resources.
>
> - Heartbeat and DRBD are 2 different products/packages
>
> - Like most services, DRBD doesn't need Heartbeat to run. You can set up and run DRBD volumes without Heartbeat installed, or any cluster supervisor.
>
> - The DRBD daemons provide the communication interface for each network volume and are therefore an integral part of the volume management. Without the DRBD daemons, you (manually) and Heartbeat (automagically) could not handle the DRBD volumes.

Just to avoid confusion: There is no such thing as a DRBD daemon. DRBD is a kernel module.

> - If you look carefully at your startup, DRBD daemons start whether or not Heartbeat is started.

That depends on your setup. Maybe in yours it does and it should. In others it does not and it should not.

Regards
Dominik
[Linux-HA] Stopping the Heartbeat daemon does not stop the DRBD Daemon
Stopping the Heartbeat daemon (service heartbeat stop) does not stop the DRBD daemon even if it is one of the resources.

# service heartbeat stop
Stopping High-Availability services: [ OK ]

# service drbd status
drbd driver loaded OK; device status:
version: 8.2.7 (api:88/proto:86-88)
GIT-hash: 61b7f4c2fc34fe3d2acf7be6bcc1fc2684708a7d build by r...@nomen.esri.com, 2009-03-24 08:29:57
m:res  cs  st  ds  p  mounted  fstype
0:r0   Unconfigured

Running the command below stops the DRBD daemon.

service drbd stop

Applications Installed:
===
drbd-8.2.7-3
heartbeat-2.99.2-6.1
pacemaker-1.0.2-11.1

CIB.xml:
# crm configure show
primitive fs0 ocf:heartbeat:Filesystem \
        params fstype=ext3 directory=/data device=/dev/drbd0
primitive VIP ocf:heartbeat:IPaddr \
        params ip=10.50.26.250 \
        op monitor interval=5s timeout=5s
primitive drbd0 ocf:heartbeat:drbd \
        params drbd_resource=r0 \
        op monitor interval=59s role=Master timeout=30s \
        op monitor interval=60s role=Slave timeout=30s
group DRBD_Group fs0 VIP \
        meta collocated=true ordered=true migration-threshold=1 failure-timeout=10s resource-stickiness=10
ms ms-drbd0 drbd0 \
        meta clone-max=2 notify=true globally-unique=false target-role=Started
colocation DRBD_Group-on-ms-drbd0 inf: DRBD_Group ms-drbd0:Master
order ms-drbd0-before-DRBD_Group inf: ms-drbd0:promote DRBD_Group:start

Help.

Regards,
jerome
RE: [Linux-HA] MailTo
I think I had this issue in the past as well. I just made sure that MAILCMD is assigned a mail app in .ocf-binaries. Try the command below.

sed -i '/MAILCMD/s/=/=\/bin\/mail/g' /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of alexus
Sent: Thursday, March 26, 2009 2:24 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] MailTo

when I go to http://www.linux-ha.org/MailTo I get: Page not found.

I'm trying to figure out why my mailto isn't working... i tested email leaving that server no problem..

<primitive class="ocf" type="MailTo" provider="heartbeat" id="resource_mailto">
  <instance_attributes id="resource_mailto_instance_attrs">
    <attributes>
      <nvpair name="email" id="a2fb7778-67f9-455e-bd93-72eadc937b7b" value="linux-ha-ale...@xxx.xxx"/>
      <nvpair id="5360309d-cf65-4482-bc5e-6f3edc6c68a3" name="subject" value="Linux-HA: MailTo"/>
    </attributes>
  </instance_attributes>
</primitive>

--
http://alexus.org/
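To see what that sed expression does before touching the real file, it can be tried on a scratch copy. The sketch below is illustrative only: the two sample lines are a hypothetical stand-in for .ocf-binaries (the real file's contents may differ), and the result is written to a variable rather than edited in place with `-i`.

```shell
#!/bin/sh
# Hypothetical stand-in for .ocf-binaries, with an empty MAILCMD default.
tmp=$(mktemp)
printf '%s\n' ': ${AWK:=gawk}' ': ${MAILCMD:=}' > "$tmp"

# Same substitution as in the message above: on lines mentioning MAILCMD,
# every "=" becomes "=/bin/mail". Other lines are left alone.
patched=$(sed '/MAILCMD/s/=/=\/bin\/mail/g' "$tmp")
printf '%s\n' "$patched"

rm -f "$tmp"
```

Because of the `g` flag, the substitution applies to every `=` on a matching line, so it is worth checking the MAILCMD line in the real file first (for example with `grep MAILCMD`) and keeping a backup before running the in-place version.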
[Linux-HA] Manual Resource Migration creates a Constraint
When I manually migrate a group resource, a constraint is automatically created. Is there a way to avoid this?

Here is the automatically created constraint.

<rsc_location id="cli-standby-DRBD_Group" rsc="DRBD_Group">
  <rule id="cli-standby-rule-DRBD_Group" score="-INFINITY" boolean-op="and">
    <expression id="cli-standby-expr-DRBD_Group" attribute="#uname" operation="eq" value="nomen.esri.com" type="string"/>
  </rule>
</rsc_location>

Here is the command I use to manually migrate the group resource.

crm resource migrate DRBD_Group

This constraint prevents a node from taking over the group resource when Heartbeat is stopped on the other node.

Help.

Regards,
jerome
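No reply to this question appears in the archive. One possibility, offered as an assumption rather than an answer confirmed in this thread: the crm shell also provides `crm resource unmigrate`, which is meant to remove the `cli-standby` location constraint that `migrate` leaves behind. The sketch below only prints the commands unless `DRY_RUN=0` is set; `run` and `migrate_then_release` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: migrate a resource, then drop the constraint the migration created.
# DRY_RUN=1 (the default here) prints each command instead of executing it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

migrate_then_release() {
    rsc=$1
    run crm resource migrate "$rsc"     # creates the cli-standby-<rsc> constraint
    # On a real cluster, wait here until the resource is running on the other node.
    run crm resource unmigrate "$rsc"   # assumed to remove that constraint again
}

migrate_then_release DRBD_Group
```

Whether the resource stays put after the constraint is removed would depend on settings such as resource-stickiness, so this is worth trying on a test cluster first.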
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
Thank you all. I have been happy with the functionality of the setup that you guys helped build.

For reference, here are the versions that I am running.

drbd-8.2.7-3
drbd-debuginfo-8.2.7-3
drbd-km-2.6.18_128.1.1.el5-8.2.7-3
heartbeat-2.99.2-6.1
heartbeat-common-2.99.2-6.1
heartbeat-debug-2.99.2-6.1
heartbeat-ldirectord-2.99.2-6.1
heartbeat-resources-2.99.2-6.1
kernel-2.6.18-128.1.1.el5
kernel-devel-2.6.18-128.1.1.el5
kernel-headers-2.6.18-128.1.1.el5
libheartbeat2-2.99.2-6.1
libpacemaker3-1.0.2-11.1
pacemaker-1.0.2-11.1
pacemaker-debug-1.0.2-11.1
pacemaker-mgmt-1.99.0-2.1
pacemaker-mgmt-client-1.99.0-2.1
pacemaker-mgmt-debug-1.99.0-2.1

I have tried DRBD 8.3.0, but the DRBD OCF agent of Pacemaker 1.0.2-11.1 does not seem to work well with it.

Regards,
Jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Monday, March 09, 2009 12:05 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

Hi

Jerome Yanga wrote:
> Dominik,
>
> As usual, you are right on the money. I should have caught that myself. Thank you for catching that for me. What happened was that I used a different server to compile DRBD and I had assumed that Nomen and Rubric (my test nodes) were on the same kernel. Moreover, I had also combined Neil's suggestion with yours, as he had mentioned that pacemaker-1.0.1 and drbd-8.2 work.
>
> My current issues are as follows:
>
> 1) I cannot migrate the resource fs0 from Nomen to Rubric. Running the command crm resource migrate fs0 just puts fs0 into an offline state. This sounds like a config change. NOTE: I am planning to add fs0 into a Group that will be able to migrate between the two nodes (Nomen and Rubric). Help. Please provide the crm(live) syntax, as I have tried the ones below and crm complains that the syntax is wrong.
>
> order ms-drbd0-before-fs0 mandatory: ms-drbd0:promote fs0:start
> colocation fs0-on-ms-drbd0 inf: fs0 ms-drbd0:Master

You need 1.0.2 for that. 1.0.1 packages' crm shell had a bug there.

> 2) Is there documentation for what resources, constraints and the like I can add into the cib.xml via crm(live)? Moreover, their syntax to add them via crm(live)?

http://clusterlabs.org/wiki/Documentation

--snip--

> cib.xml:
>
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" have-quorum="1" epoch="153" num_updates="0" cib-last-written="Fri Mar 6 12:52:27 2009" dc-uuid="3a8b681c-a14b-4037-a8e6-2d4af2eff88e">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
>         <nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1236213117"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com" type="normal"/>
>       <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com" type="normal"/>
>     </nodes>
>     <resources>
>       <master id="ms-drbd0">
>         <meta_attributes id="ms-drbd0-meta_attributes">
>           <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max" value="2"/>
>           <nvpair id="ms-drbd0-meta_attributes-notify" name="notify" value="true"/>
>           <nvpair id="ms-drbd0-meta_attributes-globally-unique" name="globally-unique" value="false"/>
>           <nvpair id="ms-drbd0-meta_attributes-target-role" name="target-role" value="Started"/>
>         </meta_attributes>
>         <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
>           <instance_attributes id="drbd0-instance_attributes">
>             <nvpair id="drbd0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/>
>           </instance_attributes>
>           <operations id="drbd0-ops">
>             <op id="drbd0-monitor-59s" interval="59s" name="monitor" role="Master" timeout="30s"/>
>             <op id="drbd0-monitor-60s" interval="60s" name="monitor" role="Slave" timeout="30s"/>
>           </operations>
>         </primitive>
>       </master>
>       <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
>         <instance_attributes id="VIP-instance_attributes">
>           <nvpair id="VIP-instance_attributes-ip" name="ip" value="10.50.26.250"/>
>         </instance_attributes>
>         <operations id="VIP-ops">
>           <op id="VIP-monitor-5s" interval="5s" name="monitor" timeout="5s"/>
>         </operations>
>       </primitive>
>       <primitive class="ocf" id="fs0" provider="heartbeat" type="Filesystem">
>         <instance_attributes id="fs0-instance_attributes">
>           <nvpair id="fs0-instance_attributes-fstype" name="fstype" value="ext3"/>
>           <nvpair id="fs0-instance_attributes-directory" name="directory" value="/data"/>
>           <nvpair id="fs0-instance_attributes-device" name="device" value="/dev/drbd0"/>
>         </instance_attributes>
>       </primitive>
>     </resources>
>     <constraints/>

You don't have any constraints, so
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
Thanks, Neil. However, the reason why I wanted DRBD to start via Pacemaker is because I want Pacemaker to manage the DRBD process and be able to migrate it between the nodes.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian R. Hellman
Sent: Wednesday, March 04, 2009 4:52 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

DRBD needs to be running prior to starting pacemaker; it should be in secondary/secondary mode. When you stop the service you are unloading the DRBD module, hence it cannot start.

Jerome Yanga wrote:

Hi Neil!

Yes. DRBD works outside of Pacemaker. When I do a service drbd start on each node, drbd runs properly and both are Secondary.

jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin
Sent: Wednesday, March 04, 2009 4:00 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

Does drbd work outside of pacemaker? I suspect perhaps not from these lines in your log:

    Mar 4 14:27:59 nomen modprobe: FATAL: Module drbd not found.
    Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) Could not stat(/proc/drbd): No such file or directory do you need to load the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' terminated with exit code 20 drbdadm attach r0: exited with code 20
    Mar 4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after start.

Try starting drbd by hand with pacemaker turned off; it should come up on both nodes, with both nodes as secondary. If it doesn't, then you have to fix drbd first before trying to add pacemaker to the mix.

Neil

Jerome Yanga wrote:

Hi!

I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD to run individually, but not DRBD managed by Pacemaker. I tried following the instructions on the site below, but the resources will not go online.

http://clusterlabs.org/wiki/DRBD_HowTo_1.0

Below is my configuration.

Installed applications:
===
kernel-2.6.18-128.el5
drbd-8.3.0-3
heartbeat-2.99.2-6.1
pacemaker-1.0.1-3.1

drbd.conf:
==
global {
    usage-count no;
}
resource r0 {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
        pri-lost "echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
    }
    startup {
        wfc-timeout 0;
    }
    disk {
        on-io-error pass_on;
    }
    net {
        max-buffers 2048;
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 100M;
        al-extents 257;
    }
    on nomen.esri.com {
        device /dev/drbd0;
        disk /dev/sda5;
        address 192.168.0.1:7789;
        meta-disk internal;
    }
    on rubric.esri.com {
        device /dev/drbd0;
        disk /dev/sda5;
        address 192.168.0.2:7789;
        meta-disk internal;
    }
}

Cib.xml:

<cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0" have-quorum="1" dc-uuid="a5e95310-f27d-418e-9cb9-42e50310f702" epoch="56" num_updates="0" cib-last-written="Wed Mar 4 14:27:59 2009">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="3a8b681c-a14b-4037-a8e6-2d4af2eff88e" uname="nomen.esri.com" type="normal"/>
      <node id="a5e95310-f27d-418e-9cb9-42e50310f702" uname="rubric.esri.com" type="normal"/>
    </nodes>
    <resources>
      <master id="ms-drbd0">
        <meta_attributes id="ms-drbd0-meta_attributes">
          <nvpair id="ms-drbd0-meta_attributes-clone-max" name="clone-max" value="2"/>
          <nvpair id="ms-drbd0-meta_attributes-notify" name="notify" value="true"/>
          <nvpair id="ms-drbd0-meta_attributes-globally-unique" name="globally-unique" value="false"/>
          <nvpair name="target-role" id="ms-drbd0-meta_attributes-target-role" value="Started"/>
        </meta_attributes>
        <primitive class="ocf" id="drbd0" provider="heartbeat" type="drbd">
          <instance_attributes id="drbd0-instance_attributes">
            <nvpair id="drbd0-instance_attributes-drbd_resource" name="drbd_resource" value="r0"/>
          </instance_attributes>
          <operations id
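The check Neil and Brian describe (DRBD up on both nodes, both Secondary, before Pacemaker is involved) can be sketched as a small shell test. This is a sketch under assumptions: `both_secondary` is a hypothetical helper, the sample status line is illustrative and not taken from the thread, and the role field is assumed to be labelled `st:` in DRBD 8.2 and `ro:` in 8.3.

```shell
#!/bin/sh
# Sketch: succeed only if a /proc/drbd-style status line on stdin
# reports the local and peer roles as Secondary/Secondary.
both_secondary() {
    grep -Eq '(st|ro):Secondary/Secondary'
}

# Illustrative sample line (not from the thread).
sample=' 0: cs:Connected st:Secondary/Secondary ds:UpToDate/UpToDate C r---'

if printf '%s\n' "$sample" | both_secondary; then
    echo "both nodes Secondary"
else
    echo "not Secondary/Secondary: fix DRBD first"
fi
```

On a live node, after `modprobe drbd` and `service drbd start` with Pacemaker stopped, the same function could read the real status with `both_secondary < /proc/drbd`.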
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
I apologize, Brian. The gratitude should have been sent to you. Thanks, Brian. :) Regards, jerome -Original Message- From: Jerome Yanga Sent: Friday, March 06, 2009 9:35 AM To: General Linux-HA mailing list Subject: RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker Thanks, Neil. However, the reason why I wanted DRBD to start via Pacemaker is because I want Pacemaker to manage the DRBD process and be able to migrate it between the nodes. jerome -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Brian R. Hellman Sent: Wednesday, March 04, 2009 4:52 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker DRBD needs to be running prior to starting pacemaker, it should be in secondary/secondary mode. When you stop the service you are unloading the DRBD module, hence it can not start. Jerome Yanga wrote: Hi Neil! Yes. DRBD works outside of Pacemaker. When I do a service drbd start on each node, drbd runs properly and are both Secondary. jerome -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin Sent: Wednesday, March 04, 2009 4:00 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker Does drbd work outside of pacemaker? I suspect perhaps not from these lines in your log: Mar 4 14:27:59 nomen modprobe: FATAL: Module drbd not found. Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) Could not stat(/proc/drbd): No such file or directory do you need to load the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' terminated with exit code 20 drbdadm attach r0: exited with code 20 Mar 4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after start. 
Try starting drbd by hand with pacemaker turned off; it should come up on both nodes, with both nodes as secondary. If it doesn't, then you have to fix drbd first before trying to add pacemaker to the mix. Neil Jerome Yanga wrote: Hi! I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD to run individually but not DRBD managed by Pacemaker. I tried following the instructions on the site below but the resources will not go online. http://clusterlabs.org/wiki/DRBD_HowTo_1.0 Below is my configuration. Installed applications: === kernel-2.6.18-128.el5 drbd-8.3.0-3 heartbeat-2.99.2-6.1 pacemaker-1.0.1-3.1 drbd.conf: == global { usage-count no; } resource r0 { protocol C; handlers { pri-on-incon-degr echo o /proc/sysrq-trigger ; halt -f; pri-lost-after-sb echo o /proc/sysrq-trigger ; halt -f; local-io-error echo o /proc/sysrq-trigger ; halt -f; outdate-peer /usr/lib/heartbeat/drbd-peer-outdater -t 5; pri-lost echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; out-of-sync /usr/lib/drbd/notify-out-of-sync.sh root; } startup { wfc-timeout 0; } disk { on-io-error pass_on; } net { max-buffers 2048; after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 100M; al-extents 257; } on nomen.esri.com { device /dev/drbd0; disk /dev/sda5; address 192.168.0.1:7789; meta-disk internal; } on rubric.esri.com { device /dev/drbd0; disk /dev/sda5; address 192.168.0.2:7789; meta-disk internal; } } Cib.xml: cib admin_epoch=0 validate-with=pacemaker-1.0 crm_feature_set=3.0 have-quorum=1 dc-uuid=a5e95310-f27d-418e-9cb9-42e50310f702 epoch=56 num_updates=0 cib-last-written=Wed Mar 4 14:27:59 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe/ /cluster_property_set /crm_config nodes node id=3a8b681c-a14b-4037-a8e6-2d4af2eff88e 
uname=nomen.esri.com type=normal/ node id=a5e95310-f27d-418e-9cb9-42e50310f702 uname=rubric.esri.com type=normal/ /nodes resources master id=ms-drbd0 meta_attributes id=ms-drbd0-meta_attributes nvpair id=ms-drbd0-meta_attributes-clone-max name=clone-max value=2/ nvpair id=ms-drbd0-meta_attributes-notify name=notify value=true/ nvpair id=ms-drbd0-meta_attributes-globally-unique name=globally-unique value=false / nvpair name=target-role id=ms-drbd0-meta_attributes-target-role value=Started
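The advice earlier in this message (DRBD loaded and Secondary on both nodes before Pacemaker is started) can be turned into a small preflight check. What follows is only a sketch: the function name drbd_both_secondary is made up for illustration, and it assumes that /proc/drbd reports the roles in an st: field (DRBD 8.2) or an ro: field (DRBD 8.3).

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the /proc/drbd text on stdin shows a
# device whose local and peer roles are both Secondary.
# Assumption: DRBD 8.2 prints the roles as "st:Local/Peer", 8.3 as "ro:Local/Peer".
drbd_both_secondary() {
    grep -Eq '(st|ro):Secondary/Secondary'
}

# Usage on a node, with Pacemaker stopped:
#   [ -r /proc/drbd ] || echo "drbd module not loaded (try: modprobe drbd)"
#   drbd_both_secondary < /proc/drbd && echo "Secondary on both nodes"
```

If /proc/drbd is missing altogether, the module is not loaded, which matches the "Could not stat(/proc/drbd)" message quoted in the log above.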
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
Neil,

Unfortunately, I was not able to receive your scripts as they were filtered out by the mail servers. However, I have confirmed your suggestion that Pacemaker 1.0.1 and DRBD 8.2 work together. With Pacemaker 1.0.1 and DRBD 8.2, I was able to add the DRBD resource into a master-slave resource in Pacemaker. I just couldn't get the resource to migrate. This is probably due to configuration settings that I am missing. I will send more info in my next post.

By the way, can you try resending the scripts in a tar file? Thank you in advance.

Regards,
Jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin
Sent: Wednesday, March 04, 2009 10:27 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker

OK. I notice you're running drbd 8.3. The scripts distributed with pacemaker 1.0.1 only worked with 8.2 (I'm not sure about 1.0.2). I made a few tiny changes to the drbd script so it would run with 8.3. I've attached the changed script. I also added more logging so you can see exactly what is going on in the drbd OCF script at debug level. You probably want to change your logging to debug to get all the output while trying to figure this out.

However, the errors I got from the script were very different from yours: I was getting errors from drbdadm, not failures to load the driver.

The other thing I did was run the OCF scripts by hand (you have to set a bunch of env variables). I can't find the script I used to test drbd, but I've attached one I used for mysql; you should be able to adapt it to your use. As always, remember bash -x is your friend.

Neil

Jerome Yanga wrote:
Hi Neil! Yes. DRBD works outside of Pacemaker. When I do a service drbd start on each node, drbd runs properly and both nodes are Secondary.

jerome
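Since the attached scripts did not make it through, here is a rough sketch of what "running the OCF scripts by hand" can look like. It is not Neil's script. It assumes the agent lives under /usr/lib/ocf/resource.d/heartbeat, that OCF_ROOT is /usr/lib/ocf, and that the drbd_resource parameter is passed as OCF_RESKEY_drbd_resource; the helper name ocf_cmd is hypothetical, and it only prints the command instead of running it. The drbd agent may expect further variables (for example the clone and master meta attributes), which are left out here.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that would run an OCF agent
# action by hand. Resource parameters reach the agent as OCF_RESKEY_<name>
# environment variables; OCF_ROOT tells it where its helper files live.
# Assumption: agents are installed under /usr/lib/ocf/resource.d/heartbeat.
ocf_cmd() {
    agent=$1 action=$2 resource=$3
    printf 'OCF_ROOT=/usr/lib/ocf OCF_RESKEY_drbd_resource=%s bash -x /usr/lib/ocf/resource.d/heartbeat/%s %s\n' \
        "$resource" "$agent" "$action"
}

# Example: show the command for a one-off monitor of r0.
ocf_cmd drbd monitor r0
```

Printing the command first makes it easy to review the environment before anything touches the DRBD device; the bash -x prefix follows Neil's tip and traces every line the agent executes.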
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
]: info: unpack_graph: Unpacked transition 35: 1 actions in 1 synapses Mar 6 12:56:14 nomen crmd: [14603]: info: do_te_invoke: Processing graph 35 (ref=pe_calc-dc-1236372974-109) derived from /var/lib/heartbeat/pengine/pe-input-75.bz2 Mar 6 12:56:14 nomen crmd: [14603]: info: send_rsc_command: Initiating action 41: start fs0_start_0 on nomen.esri.com Mar 6 12:56:14 nomen crmd: [14603]: info: do_lrm_rsc_op: Performing key=41:35:0:44aada21-7997-4a4f-ba9a-4ae8a2629a58 op=fs0_start_0 ) Mar 6 12:56:14 nomen lrmd: [14509]: info: rsc:fs0: start Mar 6 12:56:14 nomen Filesystem[1681]: INFO: Running start for /dev/drbd0 on /data Mar 6 12:56:14 nomen cib: [1678]: info: write_cib_contents: Wrote version 0.155.0 of the CIB to disk (digest: 0fd876c0a5f2db21a9aa66b3f997194f) Mar 6 12:56:14 nomen cib: [1678]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig) Mar 6 12:56:14 nomen cib: [14508]: info: Managed write_cib_contents process 1678 exited with return code 0. Mar 6 12:56:14 nomen kernel: kjournald starting. Commit interval 5 seconds Mar 6 12:56:14 nomen kernel: EXT3 FS on drbd0, internal journal Mar 6 12:56:14 nomen kernel: EXT3-fs: mounted filesystem with ordered data mode. Mar 6 12:56:14 nomen lrmd: [14509]: info: Managed fs0:start process 1681 exited with return code 0. 
Mar 6 12:56:14 nomen lrmd: [14509]: info: Resource Agent output: [] Mar 6 12:56:14 nomen crmd: [14603]: info: process_lrm_event: LRM operation fs0_start_0 (call=47, rc=0, cib-update=115, confirmed=true) complete ok Mar 6 12:56:15 nomen cib: [14508]: info: cib_process_request: Operation complete: op cib_modify for section 'all' (origin=local/crmd/115): ok (rc=0) Mar 6 12:56:15 nomen crmd: [14603]: info: match_graph_event: Action fs0_start_0 (41) confirmed on nomen.esri.com (rc=0) Mar 6 12:56:15 nomen crmd: [14603]: info: run_graph: Mar 6 12:56:15 nomen crmd: [14603]: notice: run_graph: Transition 35 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/heartbeat/pengine/pe-input-75.bz2): Complete Mar 6 12:56:15 nomen crmd: [14603]: info: te_graph_trigger: Transition 35 is now complete Mar 6 12:56:15 nomen crmd: [14603]: info: notify_crmd: Transition 35 status: done - null Mar 6 12:56:15 nomen crmd: [14603]: info: do_state_transition: State transition S_TRANSITION_ENGINE - S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ] Mar 6 12:56:15 nomen haclient: on_event: from message queue: evt:cib_changed Mar 6 12:56:15 nomen mgmtd: [14526]: info: CIB query: cib Mar 6 12:56:15 nomen heartbeat: [14466]: WARN: G_CH_dispatch_int: Dispatch function for read child took too long to execute: 70 ms ( 50 ms) (GSource: 0x94add68) Mar 6 12:56:17 nomen lrmd: [14509]: info: Resource Agent output: [] Regards, jerome -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein Sent: Wednesday, March 04, 2009 10:54 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker Hi Jerome Yanga wrote: Hi! I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD run individually but not DRBD managed by Pacemaker. 
I tried following the instruction in the site below but the resources will not go online. http://clusterlabs.org/wiki/DRBD_HowTo_1.0 Below is my configuration. Installed applications: === kernel-2.6.18-128.el5 copy that drbd-8.3.0-3 heartbeat-2.99.2-6.1 pacemaker-1.0.1-3.1 drbd.conf: == global { usage-count no; } resource r0 { protocol C; handlers { pri-on-incon-degr echo o /proc/sysrq-trigger ; halt -f; pri-lost-after-sb echo o /proc/sysrq-trigger ; halt -f; local-io-error echo o /proc/sysrq-trigger ; halt -f; outdate-peer /usr/lib/heartbeat/drbd-peer-outdater -t 5; pri-lost echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; out-of-sync /usr/lib/drbd/notify-out-of-sync.sh root; } startup { wfc-timeout 0; } disk { on-io-error pass_on; } net { max-buffers 2048; after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 100M; al-extents 257; } on nomen.esri.com { device /dev/drbd0; disk /dev/sda5; address192.168.0.1:7789; meta-disk internal; } on rubric.esri.com { device/dev/drbd0; disk /dev/sda5; address 192.168.0.2:7789; meta-disk internal; } } Cib.xml: cib admin_epoch=0 validate-with=pacemaker-1.0 crm_feature_set=3.0 have-quorum=1 dc-uuid=a5 e95310-f27d-418e-9cb9-42e50310f702 epoch=56 num_updates=0 cib-last-written=Wed
[Linux-HA] Having issues with getting DRBD to work with Pacemaker
Hi! I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD to run individually but not DRBD managed by Pacemaker. I tried following the instructions on the site below but the resources will not go online. http://clusterlabs.org/wiki/DRBD_HowTo_1.0 Below is my configuration. Installed applications: === kernel-2.6.18-128.el5 drbd-8.3.0-3 heartbeat-2.99.2-6.1 pacemaker-1.0.1-3.1 drbd.conf: == global { usage-count no; } resource r0 { protocol C; handlers { pri-on-incon-degr echo o /proc/sysrq-trigger ; halt -f; pri-lost-after-sb echo o /proc/sysrq-trigger ; halt -f; local-io-error echo o /proc/sysrq-trigger ; halt -f; outdate-peer /usr/lib/heartbeat/drbd-peer-outdater -t 5; pri-lost echo pri-lost. Have a look at the log files. | mail -s 'DRBD Alert' root; out-of-sync /usr/lib/drbd/notify-out-of-sync.sh root; } startup { wfc-timeout 0; } disk { on-io-error pass_on; } net { max-buffers 2048; after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 100M; al-extents 257; } on nomen.esri.com { device /dev/drbd0; disk /dev/sda5; address 192.168.0.1:7789; meta-disk internal; } on rubric.esri.com { device /dev/drbd0; disk /dev/sda5; address 192.168.0.2:7789; meta-disk internal; } } Cib.xml: cib admin_epoch=0 validate-with=pacemaker-1.0 crm_feature_set=3.0 have-quorum=1 dc-uuid=a5e95310-f27d-418e-9cb9-42e50310f702 epoch=56 num_updates=0 cib-last-written=Wed Mar 4 14:27:59 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe/ /cluster_property_set /crm_config nodes node id=3a8b681c-a14b-4037-a8e6-2d4af2eff88e uname=nomen.esri.com type=normal/ node id=a5e95310-f27d-418e-9cb9-42e50310f702 uname=rubric.esri.com type=normal/ /nodes resources master id=ms-drbd0 meta_attributes id=ms-drbd0-meta_attributes nvpair id=ms-drbd0-meta_attributes-clone-max 
name=clone-max value=2/ nvpair id=ms-drbd0-meta_attributes-notify name=notify value=true/ nvpair id=ms-drbd0-meta_attributes-globally-unique name=globally-unique value=false / nvpair name=target-role id=ms-drbd0-meta_attributes-target-role value=Started/ /meta_attributes primitive class=ocf id=drbd0 provider=heartbeat type=drbd instance_attributes id=drbd0-instance_attributes nvpair id=drbd0-instance_attributes-drbd_resource name=drbd_resource value=r0/ /instance_attributes operations id=drbd0-ops op id=drbd0-monitor-59s interval=59s name=monitor role=Master timeout=30s/ op id=drbd0-monitor-60s interval=60s name=monitor role=Slave timeout=30s/ /operations /primitive /master /resources constraints/ /configuration /cib /var/log/messages: == Mar 4 14:27:58 nomen crm_resource: [30167]: info: Invoked: crm_resource --meta -r ms-drbd0 -p target-role -v Started Mar 4 14:27:58 nomen cib: [29899]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/resources//*...@id=ms-drbd0]//meta_attributes//nvpa...@name=target-role] (/cib/configuration/resources/master/meta_attributes/nvpair[4]) Mar 4 14:27:59 nomen crmd: [29903]: info: do_lrm_rsc_op: Performing key=5:5:0:d4b86e31-ca4a-4033-8437-6486622eb19f op=drbd0:0_start_0 ) Mar 4 14:27:59 nomen haclient: on_event:evt:cib_changed Mar 4 14:27:59 nomen lrmd: [29900]: info: rsc:drbd0:0: start Mar 4 14:27:59 nomen cib: [30168]: info: write_cib_contents: Wrote version 0.56.0 of the CIB to disk (digest: 2365d9802f1b9c55e0ed87b8ebda5db3) Mar 4 14:27:59 nomen cib: [30168]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig) Mar 4 14:27:59 nomen cib: [29899]: info: Managed write_cib_contents process 30168 exited with return code 0. Mar 4 14:27:59 nomen modprobe: FATAL: Module drbd not found. 
Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) Mar 4 14:27:59 nomen mgmtd: [29904]: info: CIB query: cib Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) Could not stat(/proc/drbd): No such file or directory do you need to load the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' terminated with exit code 20 drbdadm attach r0: exited with code 20 Mar 4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after start. Mar 4 14:27:59 nomen lrmd: [29900]: WARN: Managed drbd0:0:start process
RE: [Linux-HA] Having issues with getting DRBD to work with Pacemaker
Hi Neil! Yes. DRBD works outside of Pacemaker. When I do a service drbd start on each node, drbd runs properly and are both Secondary. jerome -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Neil Katin Sent: Wednesday, March 04, 2009 4:00 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] Having issues with getting DRBD to work with Pacemaker Does drbd work outside of pacemaker? I suspect perhaps not from these lines in your log: Mar 4 14:27:59 nomen modprobe: FATAL: Module drbd not found. Mar 4 14:27:59 nomen lrmd: [29900]: info: RA output: (drbd0:0:start:stdout) Could not stat(/proc/drbd): No such file or directory do you need to load the module? try: modprobe drbd Command 'drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device --on-io-error=pass_on' terminated with exit code 20 drbdadm attach r0: exited with code 20 Mar 4 14:27:59 nomen drbd[30169]: ERROR: r0 start: not in Secondary mode after start. Try starting drbd by hand with pacemaker turned off; it should come up on both nodes, with both nodes as secondary. If it doesn't they you have to fix drbd first before trying to add pacemaker to the mix. Neil Jerome Yanga wrote: Hi! I am having issues with getting DRBD to work with Pacemaker. I can get Pacemaker and DRBD run individually but not DRBD managed by Pacemaker. I tried following the instruction in the site below but the resources will not go online. http://clusterlabs.org/wiki/DRBD_HowTo_1.0 Below is my configuration. Installed applications: === kernel-2.6.18-128.el5 drbd-8.3.0-3 heartbeat-2.99.2-6.1 pacemaker-1.0.1-3.1 drbd.conf: == global { usage-count no; } resource r0 { protocol C; handlers { pri-on-incon-degr echo o /proc/sysrq-trigger ; halt -f; pri-lost-after-sb echo o /proc/sysrq-trigger ; halt -f; local-io-error echo o /proc/sysrq-trigger ; halt -f; outdate-peer /usr/lib/heartbeat/drbd-peer-outdater -t 5; pri-lost echo pri-lost. 
Have a look at the log files. | mail -s 'DRBD Alert' root; out-of-sync /usr/lib/drbd/notify-out-of-sync.sh root; } startup { wfc-timeout 0; } disk { on-io-error pass_on; } net { max-buffers 2048; after-sb-0pri disconnect; after-sb-1pri disconnect; after-sb-2pri disconnect; rr-conflict disconnect; } syncer { rate 100M; al-extents 257; } on nomen.esri.com { device /dev/drbd0; disk /dev/sda5; address192.168.0.1:7789; meta-disk internal; } on rubric.esri.com { device/dev/drbd0; disk /dev/sda5; address 192.168.0.2:7789; meta-disk internal; } } Cib.xml: cib admin_epoch=0 validate-with=pacemaker-1.0 crm_feature_set=3.0 have-quorum=1 dc-uuid=a5 e95310-f27d-418e-9cb9-42e50310f702 epoch=56 num_updates=0 cib-last-written=Wed Mar 4 14:27:59 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.0.1-node: 6fc5ce830 2abf145a02891ec41e5a492efbe8efe/ /cluster_property_set /crm_config nodes node id=3a8b681c-a14b-4037-a8e6-2d4af2eff88e uname=nomen.esri.com type=normal/ node id=a5e95310-f27d-418e-9cb9-42e50310f702 uname=rubric.esri.com type=normal/ /nodes resources master id=ms-drbd0 meta_attributes id=ms-drbd0-meta_attributes nvpair id=ms-drbd0-meta_attributes-clone-max name=clone-max value=2/ nvpair id=ms-drbd0-meta_attributes-notify name=notify value=true/ nvpair id=ms-drbd0-meta_attributes-globally-unique name=globally-unique value=false / nvpair name=target-role id=ms-drbd0-meta_attributes-target-role value=Started/ /meta_attributes primitive class=ocf id=drbd0 provider=heartbeat type=drbd instance_attributes id=drbd0-instance_attributes nvpair id=drbd0-instance_attributes-drbd_resource name=drbd_resource value=r0/ /instance_attributes operations id=drbd0-ops op id=drbd0-monitor-59s interval=59s name=monitor role=Master timeout=30s/ op id=drbd0-monitor-60s interval=60s name=monitor role=Slave timeout=30s/ /operations /primitive /master /resources constraints/ /configuration /cib 
/var/log/messages: == Mar 4 14:27:58 nomen crm_resource: [30167]: info: Invoked: crm_resource --meta -r ms-drbd0 -p target-role -v Started Mar 4 14:27:58 nomen cib: [29899]: info: cib_process_xpath: Processing cib_query op for //cib/configuration/resources//*...@id=ms-drbd0]//meta_attributes//nvpa...@name=target-role] (/cib/configuration/resources/master
RE: [Linux-HA] Call cib_create failed (-47): Update does not conform to the configured schema/DTD
I found the site as soon as I sent it out.

http://clusterlabs.org/wiki/DRBD_HowTo_1.0

Hope this helps others. :)

Regards,
Jerome

From: Jerome Yanga
Sent: Wednesday, February 25, 2009 2:40 PM
To: General Linux-HA mailing list
Subject: [Linux-HA] Call cib_create failed (-47): Update does not conform to the configured schema/DTD

I have been trying to add DRBD to my Pacemaker config using crm(live) but I keep getting the message below:

Call cib_create failed (-47): Update does not conform to the configured schema/DTD

Can someone show me the syntax for crm(live) to add my DRBD resource below?

<master_slave id="r0_data">
  <meta_attributes id="r0_data_meta_attrs">
    <attributes>
      <nvpair id="r0_data_metaattr_target_role" name="target_role" value="started"/>
      <nvpair id="r0_data_metaattr_clone_max" name="clone_max" value="2"/>
      <nvpair id="r0_data_metaattr_clone_node_max" name="clone_node_max" value="1"/>
      <nvpair id="r0_data_metaattr_master_max" name="master_max" value="1"/>
      <nvpair id="r0_data_metaattr_master_node_max" name="master_node_max" value="1"/>
      <nvpair id="r0_data_metaattr_notify" name="notify" value="true"/>
      <nvpair id="r0_data_metaattr_globally_unique" name="globally_unique" value="false"/>
    </attributes>
  </meta_attributes>
  <primitive id="drbd_resource" class="ocf" type="drbd" provider="heartbeat">
    <instance_attributes id="drbd_resource_instance_attrs">
      <attributes>
        <nvpair id="cd3b3992-4492-478d-ad27-eaaf0698ec53" name="drbd_resource" value="drbd0"/>
      </attributes>
    </instance_attributes>
    <meta_attributes id="drbd_resource:0_meta_attrs">
      <attributes>
        <nvpair id="drbd_resource:1_metaattr_target_role" name="target_role" value="stopped"/>
      </attributes>
    </meta_attributes>
    <operations>
      <op id="c2f4dd35-db6c-4d20-ab90-239aa511726f" name="monitor" interval="4s" timeout="5s" disabled="false" role="Master" start_delay="0"/>
      <op id="046449e7-247b-42a3-a0bf-d7b9bc0fe010" name="monitor" interval="5s" timeout="5s" disabled="false" role="Slave" start_delay="0"/>
    </operations>
  </primitive>
</master_slave>

Help.

Regards,
Jerome
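For readers who land here with the same error: the XML in the question uses the heartbeat 2.x layout (master_slave, attributes wrappers, underscored names such as clone_max), while the cib.xml posted in the later DRBD thread, which validates with pacemaker-1.0, uses master, no wrapper, and dashed names such as clone-max and target-role. That mismatch is a likely cause of the schema error, though the thread does not confirm it. Below is a sketch of crm shell input in the style of the HowTo linked above. The names drbd0 and ms-drbd0 and the monitor settings are borrowed from that later thread, the helper drbd_crm_config is hypothetical, and it only prints text; nothing is sent to the cluster.

```shell
#!/bin/sh
# Hypothetical helper: print crm shell "configure" input for a DRBD
# master/slave resource. Nothing is sent to the cluster; review the output
# and enter it at the crm(live)configure# prompt.
# Assumption: the resource names drbd0 and ms-drbd0 are illustrative.
drbd_crm_config() {
    drbd_resource=$1
    cat <<EOF
primitive drbd0 ocf:heartbeat:drbd \\
    params drbd_resource=$drbd_resource \\
    op monitor interval=59s role=Master timeout=30s \\
    op monitor interval=60s role=Slave timeout=30s
ms ms-drbd0 drbd0 \\
    meta clone-max=2 notify=true globally-unique=false target-role=Started
commit
EOF
}

drbd_crm_config r0
```

Letting the crm shell generate the XML avoids hand-writing ids and element names against the wrong schema version.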
[Linux-HA] Call cib_create failed (-47): Update does not conform to the configured schema/DTD
I have been trying to add DRBD to my Pacemaker config using crm(live) but I keep getting the message below:

Call cib_create failed (-47): Update does not conform to the configured schema/DTD

Can someone show me the syntax for crm(live) to add my DRBD resource below?

<master_slave id="r0_data">
  <meta_attributes id="r0_data_meta_attrs">
    <attributes>
      <nvpair id="r0_data_metaattr_target_role" name="target_role" value="started"/>
      <nvpair id="r0_data_metaattr_clone_max" name="clone_max" value="2"/>
      <nvpair id="r0_data_metaattr_clone_node_max" name="clone_node_max" value="1"/>
      <nvpair id="r0_data_metaattr_master_max" name="master_max" value="1"/>
      <nvpair id="r0_data_metaattr_master_node_max" name="master_node_max" value="1"/>
      <nvpair id="r0_data_metaattr_notify" name="notify" value="true"/>
      <nvpair id="r0_data_metaattr_globally_unique" name="globally_unique" value="false"/>
    </attributes>
  </meta_attributes>
  <primitive id="drbd_resource" class="ocf" type="drbd" provider="heartbeat">
    <instance_attributes id="drbd_resource_instance_attrs">
      <attributes>
        <nvpair id="cd3b3992-4492-478d-ad27-eaaf0698ec53" name="drbd_resource" value="drbd0"/>
      </attributes>
    </instance_attributes>
    <meta_attributes id="drbd_resource:0_meta_attrs">
      <attributes>
        <nvpair id="drbd_resource:1_metaattr_target_role" name="target_role" value="stopped"/>
      </attributes>
    </meta_attributes>
    <operations>
      <op id="c2f4dd35-db6c-4d20-ab90-239aa511726f" name="monitor" interval="4s" timeout="5s" disabled="false" role="Master" start_delay="0"/>
      <op id="046449e7-247b-42a3-a0bf-d7b9bc0fe010" name="monitor" interval="5s" timeout="5s" disabled="false" role="Slave" start_delay="0"/>
    </operations>
  </primitive>
</master_slave>

Help.

Regards,
Jerome
RE: [Linux-HA] Failover not working as I expected
I have checked both services on each server and both of them are off.

[r...@rubric ~]# chkconfig --list dirsrv
dirsrv 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[r...@rubric ~]# chkconfig --list dirsrv-admin
dirsrv-admin 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[r...@nomen ~]# chkconfig --list dirsrv
dirsrv 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[r...@nomen ~]# chkconfig --list dirsrv-admin
dirsrv-admin 0:off 1:off 2:off 3:off 4:off 5:off 6:off

The way I determine whether the service bounces when a node rejoins the cluster is by running a script called statusfds.sh. This script contains the following:

#!/bin/bash
service dirsrv status
service dirsrv-admin status

Then I run the following command to monitor the services.

watch -n 1 statusfds.sh

Moreover, even hb_gui shows that the services are bounced/restarted when a node joins the cluster. The status of the resources changes to failed for a second and then changes back to running on.

Regards,
jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Andrew Beekhof
Sent: Thursday, January 29, 2009 3:51 AM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

On Tue, Jan 27, 2009 at 22:04, Jerome Yanga jya...@esri.com wrote:
Dominik, Here is the status of the two concerns I needed help on.
01) When a node comes back up after a restart of heartbeat, resources get bounced when it rejoins the cluster.
STATUS: The resources still get bounced when a node joins the cluster even though I had deleted all the constraints.

You might want to check if the service is being started automatically by the OS when it boots. The cluster will notice this, and the recovery can make it look like the resource is merely bouncing.
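Andrew's suggestion (make sure the OS does not start the service at boot) can be checked mechanically rather than by eye. The sketch below assumes the chkconfig --list output format shown in this message; the function name boot_disabled is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `chkconfig --list <service>` output on stdin and
# succeed only if no runlevel is marked "on".
# Caveat: empty input (for example an unknown service) also counts as "off".
boot_disabled() {
    ! grep -q ':on'
}

# Usage on each node:
#   chkconfig --list dirsrv       | boot_disabled || echo "dirsrv starts at boot"
#   chkconfig --list dirsrv-admin | boot_disabled || echo "dirsrv-admin starts at boot"
```

Running it on both nodes for every service the cluster manages rules out the boot-time start as the reason for the apparent bounce.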
RE: [Linux-HA] Failover not working as I expected
Good evening to you, Dominik. :)

I apologize for being persistent. I can work around the situations that I have encountered by creating scripts. However, I just thought that there may be something in the configuration that I can tweak to make it work.

You have been very helpful and that is greatly appreciated. In fact, you have resolved all the situations I encountered, except the one that you had asked me to create a bug report on, which I will do so that the product will be better. Besides, you would probably hate to see this project that I am working on fall back to MSCS (Microsoft Cluster Service) as much as I would. Oooh...just the thought that the project will resort to a Microsoft solution makes me feel like I am losing my freedom (I certainly do not want this to happen and will try hard for this not to happen).

I have submitted this to Bugzilla as you have recommended. It is registered as Bug 2047.

Thank you for your support.

Regards,
jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Wednesday, January 28, 2009 11:19 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Good morning Jerome

we should make this a daily thing, shouldn't we?

Jerome Yanga wrote:
Dominik, I apologize for leaving resource-stickiness out. I had it there previously but due to the trial and error I had performed in the crm shell, I had forgotten to re-add it. Nevertheless, adding it to my cib.xml file does not seem to work. Here is the chain of events. This happens on either Nomen or Rubric.

01) Nomen (one of the two nodes) owns the group resource, called Directory_Server. In the meantime, Rubric (the other node) is just there waiting for the resources to come to him. :)
02) I stop heartbeat on Nomen and the Directory_Server resource group fails over to Rubric.
03) Nomen's status changes from running(dc) to stopped.
04) After waiting for step #3 to finish its transition, I start heartbeat back up on Nomen.
05) Nomen's status changes from stopped to running-standby to running.
06) Rubric retains all the resources. However, all the resources on Rubric bounce/restart when Nomen's status changes from running-standby to running.

With the configuration you posted below, this should not happen. The configuration looks good for what you want. If you're sure that is what you do and get, please file a bug about that and include a hb_report.

http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Is there a way to prevent the resources on Rubric from bouncing/restarting when Nomen rejoins the cluster? Help.

On the other hand, you pointed me in the right direction regarding the MailTo OCF agent. This is what the variable looked like in .ocf-binaries when it was not working.

rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
: ${MAILCMD:=}

I assigned the exact path of the mail command to the variable. Now, I get emailed every time a failover happens. Wooot! Wooot! :)

rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
: ${MAILCMD:=/bin/mail}

Good. I think this was on the lists earlier. Apparently a packaging issue.

Regards
Dominik

Thanks. Below is my current cib.xml file.
cib admin_epoch=0 validate-with=pacemaker-1.0 crm_feature_set=3.0 have-quorum=1 dc-uuid=27f54ec3-b626-4b4f-b8a6-4ed0b768513c epoch=102 num_updates=0 cib-last-written=Wed Jan 28 08:32:39 2009 configuration crm_config cluster_property_set id=cib-bootstrap-options nvpair id=cib-bootstrap-options-dc-version name=dc-version value=1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe/ /cluster_property_set /crm_config nodes node id=5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e uname=nomen.esri.com type=normal instance_attributes id=nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e nvpair id=standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e name=standby value=off/ /instance_attributes /node node id=27f54ec3-b626-4b4f-b8a6-4ed0b768513c uname=rubric.esri.com type=normal instance_attributes id=nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c nvpair id=standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c name=standby value=off/ /instance_attributes /node /nodes resources group id=Directory_Server meta_attributes id=Directory_Server-meta_attributes nvpair id=Directory_Server-meta_attributes-collocated name=collocated value=true/ nvpair id=Directory_Server-meta_attributes-ordered name=ordered value=true/ nvpair id=Directory_Server-meta_attributes-migration-threshold name=migration-threshold value=1/ nvpair id=Directory_Server-meta_attributes-failure-timeout name=failure-timeout value=10s
RE: [Linux-HA] Failover not working as I expected
Dominik,

Thank you very much. Adding resource-stickiness and getting rid of the constraint helped a lot. The resources do not go back to Nomen anymore when its heartbeat is started again (the resources stay with Rubric). However, the resources still get bounced once Nomen joins the cluster. Is there any way to keep the resources from bouncing when Nomen rejoins the cluster?

I have also observed another issue. As you have seen in my cib.xml, I have created a group called Directory_Server. In this group, there are three resources, namely: VIP, ECAS and FDS_Admin. If I manually turn off any of these resources, I would like the group resource, Directory_Server, to fail over to the other node. Is there a configuration that will do this? Currently, if one of the three resources goes down it stays down and the rest continue running. All three resources need to be up and running for our applications to work properly.

To answer your question...

Also due to your rsc_location. The resource is where you configured it (on nomen), so why move it around?

I added rsc_location to the configuration as I was trying to follow the sample ActivePassive configuration.

http://linux-ha.org/GettingStartedV2/OneIPAddress

I have been moving resources around because I am testing HA thoroughly before I implement it in our production environment.

Regards,
Jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Thursday, January 15, 2009 11:16 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Hi Jerome

The names of the servers are as follows: Nomen and Rubric. Let us start when Nomen owns all resources and its status states running(dc). When I stop heartbeat on Nomen, Rubric takes over all the resources and its status turns into running(dc). This is good as this is what I had hoped it would do. When I start heartbeat back on Nomen, it takes all the resources away from Rubric. However, it leaves Rubric in running(dc) status and Nomen's status just states running. There are two issues here that I see.

1) I do not want Nomen to take the resources as this means that the resources will be bounced.

This happens because of your rsc_location constraint. You normally want your resource to be on nomen, so if the cluster can, it will run it there.

<rsc_location id="fdstest" rsc="Directory_Server">
  <rule id="prefered_fdstest" score="100" boolean_op="or">
    <expression attribute="#uname" id="9e5698e0-8b07-43aa-b852-398fbe6bb909" operation="eq" value="nomen.esri.com"/>
  </rule>
</rsc_location>

If you want the resource to stick to its current location even when the preferred node comes back, look into the meta-attribute resource-stickiness. Read http://www.linux-ha.org/ScoreCalculation

2) I would like to have the Quorum or running(dc) where the resources are.

You can't move the dc role manually. And you do not have to bother which machine is the dc. It is totally fine having resources on a node which is not the dc. The current dc stays dc until it is shut down or separated from the cluster in some manner.

To continue, when I stop heartbeat on Rubric, the running(dc) status goes over to Nomen. I then start heartbeat on Rubric and all resources as well as the running(dc) status stay with Nomen. Moreover, the resources are not bounced at all.

Also due to your rsc_location. The resource is where you configured it (on nomen), so why move it around?

Regards
Dominik
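Dominik's explanation boils down to a comparison of two scores: the preferred node gets the rsc_location score (100 in the constraint above), while the node currently running the resource is credited with resource-stickiness. The sketch below illustrates only that comparison. It is a simplification, not the cluster's actual algorithm: the function name placement is hypothetical, other score contributions are ignored, and treating a tie as "move" is an arbitrary choice of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: given the rsc_location score of the preferred node and
# the resource-stickiness credited to the node currently running the
# resource, print whether the resource stays or moves back.
# Simplification: real score calculation has more inputs (see the
# ScoreCalculation page linked above); a tie is reported as "move".
placement() {
    location_score=$1 stickiness=$2
    if [ "$stickiness" -gt "$location_score" ]; then
        echo stay
    else
        echo move
    fi
}

placement 100 0      # no stickiness: the preference for nomen wins
placement 100 200    # stickiness outweighs the preference
```

In other words, either remove the location preference or set a stickiness larger than its score if the resources should stay where they are after a failed node returns.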
[Linux-HA] How to Monitor Services
I have tried reading and interpreting the link below but could not get resource monitoring to work.

http://linux-ha.org/ClusterInformationBase/Actions

I have installed heartbeat-2.1.3-1. The following is in my /var/lib/heartbeat/crm/cib.xml file.

    <primitive id="ecas" class="lsb" type="dirsrv" provider="heartbeat">
      <meta_attributes id="ecas_meta_attrs">
        <attributes>
          <nvpair id="ecas_metaattr_is_managed" name="is_managed" value="true"/>
        </attributes>
      </meta_attributes>
      <operations>
        <op id="3cb67e8a-0c36-43be-a758-be32ff1a377d" name="stop" timeout="3s" start_delay="0s" disabled="false" role="Started"/>
        <op id="0aa741d5-3540-4f0a-a998-b842e346e574" name="start" timeout="5s" start_delay="0s" disabled="false" role="Started"/>
        <op id="df305ad8-92c7-4c95-bb8f-5646f9049a6f" name="monitor" interval="5s" timeout="3s" start_delay="0s" disabled="false" role="Master" on_fail="restart" prereq="nothing"/>
        <op id="e4e428cf-b1c7-40af-aa3f-c8f25cded958" name="monitor" interval="10s" timeout="3s" start_delay="0s" disabled="false" role="Slave" on_fail="restart" prereq="nothing"/>
        <op id="977d0884-b419-4494-ab4c-d1c130e8dee4" name="monitor" interval="6s" timeout="3s" role="Started" on_fail="restart" start_delay="0s" disabled="false" prereq="nothing"/>
        <op id="9a46191d-cf6e-4243-bd25-6f9ea44116ca" name="monitor" interval="7s" timeout="3s" role="Stopped" on_fail="restart" start_delay="0s" disabled="false" prereq="nothing"/>
      </operations>
    </primitive>

I only have a single node, as I just wanted to test whether Heartbeat will start the service automatically if I shut down the dirsrv service manually. Here is my /etc/ha.d/ha.cf.

    # cat /etc/ha.d/ha.cf
    use_logd on
    bcast eth0
    node server1
    crm yes

Help.

jerome
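One thing that may be worth checking in the configuration above: the Master and Slave roles apply to master/slave resources, and the provider attribute is only meaningful for OCF agents, so neither should be needed for a plain lsb primitive. A minimal sketch with a single monitor operation, assuming Heartbeat 2.1.x CIB syntax, is shown below. The op id and the interval and timeout values are illustrative assumptions, not taken from the thread, and this is not a confirmed fix for the problem described.

```xml
<primitive id="ecas" class="lsb" type="dirsrv">
  <operations>
    <!-- Hypothetical id and timings: one recurring monitor for the running service -->
    <op id="ecas_monitor" name="monitor" interval="10s" timeout="30s" on_fail="restart"/>
  </operations>
</primitive>
```

Monitoring an lsb resource relies on the init script's status action, so the dirsrv init script should follow the LSB exit codes (0 while the service is running, 3 once it has been stopped) for a manual shutdown to be detected.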