On Thu, Dec 30, 2010 at 10:24:12AM +0100, alexander.kra...@basf.com wrote:
> > On Wed, Dec 29, 2010 at 03:04:18PM +0100, Alexander Krauth wrote:
> > > # HG changeset patch
> > > # User Alexander Krauth <li...@sap.com>
> > > # Date 1293631454 -3600
> > > # Node ID a1f4bf0db5ff8c7c2ebd02e413df5e15201d4a7c
> > > # Parent  69cd9345a879e7764b4457834ded0093274d0322
> > > High: SAPInstance: Fixed monitor_clone function to ensure enqueue failover in case of process (not host) failure
> > >
> > > RAs in versions <= 2.01 used a Heartbeat 2.0 specific feature to distinguish whether they were running in master or slave mode.
> > > This no longer works with Pacemaker.
> > >
> > > Since RA version 2.02 (not in an official release) the monitor_clone function is broken for the case of a local failure of the Standalone Enqueue process.
> > >
> > > This patch follows the requirement that the RA must know by itself whether it is running in master or slave mode.
> > > It also ensures that the slave (Enqueue Replication Server) always gets promoted if the master (Standalone Enqueue Server) fails.
> > >
> > > diff -r 69cd9345a879 -r a1f4bf0db5ff heartbeat/SAPInstance
> > > --- a/heartbeat/SAPInstance Wed Dec 29 14:40:41 2010 +0100
> > > +++ b/heartbeat/SAPInstance Wed Dec 29 15:04:14 2010 +0100
> > > @@ -32,6 +32,10 @@
> > >  # OCF_RESKEY_PRE_STOP_USEREXIT  (optional, lists a script which can be executed before the resource is stopped)
> > >  # OCF_RESKEY_POST_STOP_USEREXIT (optional, lists a script which can be executed after the resource is stopped)
> > >  #
> > > +# TODO: - Option to shutdown sapstartsrv for non-active instances -> that means: do probes only with OS tools (sapinstance_status)
> > > +#       - Option for better standalone enqueue server monitoring, using ensmon (test enque-deque)
> > > +#       - Option for cleanup of abandoned enqueue replication tables
> > > +#
> > >  #######################################################################
> > >  # Initialization:
> > >
> > > @@ -68,7 +72,7 @@
> > >  <?xml version="1.0"?>
> > >  <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
> > >  <resource-agent name="SAPInstance">
> > > -<version>2.11</version>
> > > +<version>2.12</version>
> > >
> > >  <shortdesc lang="en">Manages a SAP instance as an HA resource.</shortdesc>
> > >  <longdesc lang="en">
> > > @@ -708,7 +712,7 @@
> > >  #
> > >  sapinstance_start_clone() {
> > >    sapinstance_init $OCF_RESKEY_ERS_InstanceName
> > > -  ${HA_SBIN_DIR}/crm_master -v 100 -l reboot
> > > +  ${HA_SBIN_DIR}/crm_master -v 50 -l reboot
> > >    sapinstance_start
> > >    return $?
> > >  }
> > > @@ -729,17 +733,38 @@
> > >  # sapinstance_monitor_clone
> > >  #
> > >  sapinstance_monitor_clone() {
> > > -  # Check status of potential master first
> > > +  # First check with the status function (OS tools) if there could be something like a SAP instance running.
> > > +  # As we do not know here whether we are in master or slave state, we do not want to start our monitoring
> > > +  # agents (sapstartsrv) on the wrong host.
> > > +
> > >    sapinstance_init $OCF_RESKEY_InstanceName
> > > -  sapinstance_monitor
> > > +  sapinstance_status
> > >    rc=$?
> > > -  [ $rc -eq $OCF_SUCCESS ] && return $OCF_RUNNING_MASTER
> > > -  [ $rc -ne $OCF_NOT_RUNNING ] && return $OCF_FAILED_MASTER
> > > -
> > > -  # The master isn't running, and there were no errors, try ERS
> > > -  sapinstance_init $OCF_RESKEY_ERS_InstanceName
> > > -  sapinstance_monitor
> > > -  rc=$?
> > > +  if [ $rc -eq $OCF_SUCCESS ]; then
> > > +    sapinstance_monitor
> > > +    rc=$?
> > > +    if [ $rc -eq $OCF_SUCCESS ]; then
> > > +      ${HA_SBIN_DIR}/crm_master -Q -v 100 -l reboot
> > > +      return $OCF_RUNNING_MASTER
> > > +    else
> > > +      ${HA_SBIN_DIR}/crm_master -v 10 -l reboot   # by nature of the SAP enqueue server we have to make sure
> >
> > Shouldn't this be something like '-v -10'? I'm really not
> > sure, but if the master failed then this node may not be
> > capable of running the master.
>
> No, it should stay positive. This is for the rare case that we do not
> have a slave running (slave failed, only one node active, ...).
> In that situation we at least want to try a local restart of the
> master.
> If there is a slave somewhere, it will have a higher value anyway.
>
> > > +                                                  # that we do a failover to the slave (enqueue replication server)
> > > +                                                  # in case the enqueue process has failed. We signal this to the
> > > +                                                  # cluster by setting our master preference to a lower value than the slave.
> > > +      return $OCF_FAILED_MASTER
> > > +    fi
> > > +  else
> > > +    sapinstance_init $OCF_RESKEY_ERS_InstanceName
> > > +    sapinstance_status
> > > +    rc=$?
> > > +    if [ $rc -eq $OCF_SUCCESS ]; then
> > > +      sapinstance_monitor
> > > +      rc=$?
> > > +      if [ $rc -eq $OCF_SUCCESS ]; then
> > > +        ${HA_SBIN_DIR}/crm_master -Q -v 100 -l reboot
> > > +      fi
> > > +    fi
> > > +  fi
> >
> > I got lost in this monitor function. A bit (hopefully) cleaner
> > version attached. Can you please review.
>
> Yes, I reviewed it. Looks fine to me. Please apply your patch
> instead of mine.
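For readers following the thread: below is a minimal sketch of the control flow the quoted patch describes (it is not the "cleaner version" Dejan attached, which is not included here). sapinstance_init, sapinstance_status, sapinstance_monitor and the OCF_* return codes are the helpers and conventions of the real RA; everything else follows the hunk above.

    # Sketch of the monitor_clone logic from the quoted patch.
    # Probe with OS tools first so sapstartsrv is not started on a node
    # whose role (master or slave) is not yet known.
    sapinstance_monitor_clone() {
      sapinstance_init $OCF_RESKEY_InstanceName
      if sapinstance_status; then
        if sapinstance_monitor; then
          # Healthy master: (re)assert the normal master preference.
          ${HA_SBIN_DIR}/crm_master -Q -v 100 -l reboot
          return $OCF_RUNNING_MASTER
        fi
        # Enqueue process failed locally: keep a small positive preference
        # (a local restart is still possible if no slave exists), but stay
        # below the slave's score so the replication server gets promoted.
        ${HA_SBIN_DIR}/crm_master -v 10 -l reboot
        return $OCF_FAILED_MASTER
      fi

      # No master instance here - check whether the ERS (slave) is running.
      sapinstance_init $OCF_RESKEY_ERS_InstanceName
      if sapinstance_status; then
        sapinstance_monitor
        rc=$?
        [ $rc -eq $OCF_SUCCESS ] && ${HA_SBIN_DIR}/crm_master -Q -v 100 -l reboot
        return $rc
      fi
      return $OCF_NOT_RUNNING
    }

Together with the -v 50 set in sapinstance_start_clone and the notify handling further down, the effective ordering is: healthy master or slave 100, freshly started slave 50, failed master 10, and the surviving slave INFINITY during a pre_demote, so the replication server always wins the promotion after an enqueue failure.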
Applied.

> Probably I can then do some testing next week with the final version.

OK.

Thanks,

Dejan

> > Thanks,
> >
> > Dejan
>
> Regards,
> Alex
>
> > >    return $rc
> > >  }
> > > @@ -785,16 +810,25 @@
> > >
> > >
> > >  #
> > > -# sapinstance_notify: After promotion of one master in the cluster, we make sure that all clones reset thier master
> > > -#                     value back to 100. This is because a failed monitor on a master might have degree one clone
> > > -#                     instance to score 10.
> > > +# sapinstance_notify: Handle master scoring - to make sure a slave becomes the next master
> > >  #
> > >  sapinstance_notify() {
> > >    local n_type="$OCF_RESKEY_CRM_meta_notify_type"
> > >    local n_op="$OCF_RESKEY_CRM_meta_notify_operation"
> > >
> > >    if [ "${n_type}_${n_op}" = "post_promote" ]; then
> > > +    # After promotion of one master in the cluster, we make sure that all clones reset their master
> > > +    # value back to 100. This is because a failed monitor on a master might have degraded one clone
> > > +    # instance to score 10.
> > >      ${HA_SBIN_DIR}/crm_master -v 100 -l reboot
> > > +  elif [ "${n_type}_${n_op}" = "pre_demote" ]; then
> > > +    # If we are a slave and a demote event is announced, make sure we have the highest preference to become master.
> > > +    # That covers the case where a slave resource was started after the promote event of an already running master
> > > +    # (e.g. the node of the slave was down).
> > > +    # We also have to make sure to overrule the globally set resource_stickiness or any fail-count factors => INFINITY
> > > +    local n_uname="$OCF_RESKEY_CRM_meta_notify_demote_uname"
> > > +    if [ ${n_uname} != ${HOSTNAME} ]; then
> > > +      ${HA_SBIN_DIR}/crm_master -v INFINITY -l reboot
> > > +    fi
> > >    fi
> > >  }
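Since the scores only show their effect during failover testing, it can help to watch them directly while exercising the enqueue failover. crm_master stores the preference as a transient node attribute named master-<clone instance>, so (assuming the standard Pacemaker command line tools are available; the placeholders are illustrative, not from the patch) something like the following shows the current values per node:

    # One-shot cluster status including node attributes; the master-*
    # attributes are the values set via crm_master in the RA.
    crm_mon -1 -A

    # Query a single score for a given node and clone instance
    # (replace the placeholders with the real node and resource names).
    crm_attribute -N <node> -n master-<clone-instance> -G -l reboot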