Good evening to you, Dominik. :)

I apologize for being persistent. I can work around the situations I have encountered by writing scripts. However, I thought there might be something in the configuration that I could tweak to make it work.

You have been very helpful, and that is greatly appreciated. In fact, you have resolved every situation I encountered except the one you asked me to file a bug report on, which I have done so that the product will be better. Besides, you would probably hate to see this project fall back to MSCS (Microsoft Cluster Service) as much as I would. Oooh... just the thought of the project resorting to a Microsoft solution makes me feel like I am losing my freedom. I certainly do not want that to happen and will try hard to prevent it.

I have submitted this to Bugzilla as you have recommended. It is registered as Bug 2047.

Thank you for your support.

Regards,
jerome

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Dominik Klein
Sent: Wednesday, January 28, 2009 11:19 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] Failover not working as I expected

Good morning Jerome

we should make this a daily thing, shouldn't we?

Jerome Yanga wrote:
> Dominik,
>
> I apologize for leaving resource-stickiness out. I had it there previously
> but due to the trial and errors I had performed on the crm shell, I had
> forgotten to re-add it. Nevertheless, adding it to my cib.xml file does not
> seem to work.
>
> Here is the chain of events. This happens on either Nomen or Rubric.
>
> 01) Nomen (one of the two nodes) owns the group resource, called
> Directory_Server. In the meantime, Rubric (the other node) is just there
> waiting for the resources to come to him. :)
> 02) I stop heartbeat on Nomen and the Directory_Server resource group fails
> over to Rubric.
> 03) Nomen's status changes from "running(dc)" to "stopped"
> 04) After waiting for step #3 to finish its transition, I start heartbeat
> back up in Nomen.
> 05) Nomen's status changes from "stopped" to "running-standby" to "running".
> 06) Rubric retains all the resources. However, all the resources on Rubric
> bounces/restarts when Nomen's status changes from "running-standby" to
> "running".

With the configuration you posted below, this should not happen. The
configuration looks good for what you want.

If you're sure that is what you do and get, please file a bug about that
and include a hb_report.

http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

> Is there a way to prevent the resources in Rubric to bounce/restart when
> Nomen rejoins the cluster?
>
> Help.
>
> On the other hand, you pointed me to the right direction regarding the MailTo
> OCFAgent.
>
> This is how the variable looked like in .ocf-binaries when it was not working.
>
> rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
> : ${MAILCMD:=}
>
> I assigned the exact path of the mail command to the variable. Now, I get
> emailed every time a failover happens. Wooot! Wooot! :)
>
> rubric ~]# grep MAIL /usr/lib/ocf/resource.d/heartbeat/.ocf-binaries
> : ${MAILCMD:=/bin/mail}

Good. I think this was on the lists earlier. Apparently a packaging issue.

Regards
Dominik

> Thanks.
>
>
> Below is my current cib.xml file.
>
> <cib admin_epoch="0" validate-with="pacemaker-1.0" crm_feature_set="3.0"
>      have-quorum="1" dc-uuid="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" epoch="102"
>      num_updates="0" cib-last-written="Wed Jan 28 08:32:39 2009">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
>                 value="1.0.1-node: 6fc5ce8302abf145a02891ec41e5a492efbe8efe"/>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e" uname="nomen.esri.com"
>             type="normal">
>         <instance_attributes id="nodes-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e">
>           <nvpair id="standby-5e3e3c2d-55e7-4c51-90be-5c4a1912bf3e"
>                   name="standby" value="off"/>
>         </instance_attributes>
>       </node>
>       <node id="27f54ec3-b626-4b4f-b8a6-4ed0b768513c" uname="rubric.esri.com"
>             type="normal">
>         <instance_attributes id="nodes-27f54ec3-b626-4b4f-b8a6-4ed0b768513c">
>           <nvpair id="standby-27f54ec3-b626-4b4f-b8a6-4ed0b768513c"
>                   name="standby" value="off"/>
>         </instance_attributes>
>       </node>
>     </nodes>
>     <resources>
>       <group id="Directory_Server">
>         <meta_attributes id="Directory_Server-meta_attributes">
>           <nvpair id="Directory_Server-meta_attributes-collocated"
>                   name="collocated" value="true"/>
>           <nvpair id="Directory_Server-meta_attributes-ordered"
>                   name="ordered" value="true"/>
>           <nvpair id="Directory_Server-meta_attributes-migration-threshold"
>                   name="migration-threshold" value="1"/>
>           <nvpair id="Directory_Server-meta_attributes-failure-timeout"
>                   name="failure-timeout" value="10s"/>
>           <nvpair id="Directory_Server-meta_attributes-resource-stickiness"
>                   name="resource-stickiness" value="10"/>
>         </meta_attributes>
>         <primitive class="ocf" id="VIP" provider="heartbeat" type="IPaddr">
>           <instance_attributes id="VIP-instance_attributes">
>             <nvpair id="VIP-instance_attributes-ip" name="ip"
>                     value="10.50.26.250"/>
>           </instance_attributes>
>           <operations id="VIP-ops">
>             <op id="VIP-monitor-5s" interval="5s" name="monitor"
>                 timeout="5s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="ECAS" provider="esri" type="ecas">
>           <operations id="ECAS-ops">
>             <op id="ECAS-monitor-3s" interval="3s" name="monitor"
>                 timeout="3s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="FDS_Admin" provider="esri" type="fdsadm">
>           <operations id="FDS_Admin-ops">
>             <op id="FDS_Admin-monitor-3s" interval="3s" name="monitor"
>                 timeout="3s"/>
>           </operations>
>         </primitive>
>         <primitive class="ocf" id="Emergency_Contact" provider="heartbeat"
>                    type="MailTo">
>           <instance_attributes id="Emergency_Contact-instance_attributes">
>             <nvpair id="Emergency_Contact-instance_attributes-email"
>                     name="email" value="jya...@esri.com"/>
>             <nvpair id="Emergency_Contact-instance_attributes-subject"
>                     name="subject" value="Failover Occured"/>
>           </instance_attributes>
>           <operations id="Emergency_Contact-ops">
>             <op id="Emergency_Contact-monitor-3s" interval="3s"
>                 name="monitor" timeout="3s"/>
>           </operations>
>         </primitive>
>       </group>
>     </resources>
>     <constraints/>
>   </configuration>
> </cib>

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
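
For anyone comparing their own setup with the cib.xml quoted in this thread, the meta attributes on the Directory_Server group (resource-stickiness, migration-threshold, failure-timeout) can be read back with a short script. This is only a sketch under stated assumptions: the helper name `group_meta_attributes` and the trimmed `SAMPLE_CIB` string are made up for illustration, and the code only parses XML text. It does not talk to a running cluster and says nothing about why the resources restart.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Directory_Server group from the cib.xml in this thread
# (primitives and other sections omitted; illustrative only).
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <group id="Directory_Server">
        <meta_attributes id="Directory_Server-meta_attributes">
          <nvpair id="Directory_Server-meta_attributes-migration-threshold"
                  name="migration-threshold" value="1"/>
          <nvpair id="Directory_Server-meta_attributes-failure-timeout"
                  name="failure-timeout" value="10s"/>
          <nvpair id="Directory_Server-meta_attributes-resource-stickiness"
                  name="resource-stickiness" value="10"/>
        </meta_attributes>
      </group>
    </resources>
  </configuration>
</cib>
"""


def group_meta_attributes(cib_xml, group_id):
    """Return {name: value} for the meta attributes of one resource group.

    Hypothetical helper: it only reads the XML text it is given.
    """
    root = ET.fromstring(cib_xml)
    for group in root.iter("group"):
        if group.get("id") == group_id:
            # Collect every nvpair directly under the group's meta_attributes.
            return {
                nv.get("name"): nv.get("value")
                for nv in group.findall("./meta_attributes/nvpair")
            }
    raise KeyError(f"no group with id {group_id!r}")


if __name__ == "__main__":
    meta = group_meta_attributes(SAMPLE_CIB, "Directory_Server")
    for name, value in sorted(meta.items()):
        print(f"{name} = {value}")
```

To check a real configuration, read the file's text and pass it in place of `SAMPLE_CIB`; where that file lives depends on the installation.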