Patch to bring eDir88 RA to 0.15 version.

Major improvements to the detection of running local ndsd processes. RA
now follows same logic as official ndsmanage utility, and this removes
the risk of the RA detecting eDir processes incorrectly in failure
scenarios with multiple processes.

Squashed nasty bug where local RA could in certain circumstances connect
to remote ndsd process.

New RA has been in use for several weeks on production system with no
reported issues so far.

Lars, assuming no further changes required, I think this should be a
good version to include in the next heartbeat patch for SLE, as you
proposed a while ago. Code should be getting mature now... and has had
formal testing as well as some live exposure.

Feedback anybody?

Failing which, I'd be glad if somebody could commit for me :)

Thanks

Yan
#!/bin/bash
#
#      eDirectory Resource Agent (RA) for Heartbeat.
#      This script is only compatible with eDirectory 8.8 and later
#
# Copyright (c) 2007 Novell Inc, Yan Fitterer
#                    All Rights Reserved.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like.  Any license provided herein, whether implied or
# otherwise, applies only to this software file.  Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#
#
# OCF parameters:
#  OCF_RESKEY_eDir_config_file      - full filename to instance configuration 
file
#  OCF_RESKEY_eDir_monitor_ldap     - Should we monitor LDAP (0/1 - 1 is true)
#  OCF_RESKEY_eDir_monitor_idm      - Should we monitor IDM (0/1 - 1 is true)
#  OCF_RESKEY_eDir_jvm_initial_heap - Value of the DHOST_INITIAL_HEAP java env 
var
#  OCF_RESKEY_eDir_jvm_max_heap     - Value of the DHOST_MAX_HEAP java env var
#  OCF_RESKEY_eDir_jvm_options      - Value of the DHOST_OPTIONS java env var
###############################################################################

#######################################################################
# Initialization:

. /usr/lib/heartbeat/ocf-shellfuncs
. /opt/novell/eDirectory/bin/ndspath 2>/dev/null >/dev/null

#######################################################################

usage() {
    ME=$(basename "$0")
    cat <<-EOFA

usage: $ME start|stop|status|monitor|validate-all

$ME manages an eDirectory instance as an HA resource.

The 'start' operation starts the instance.
The 'stop' operation stops the instance.
The 'status' operation reports if the instance is running.
The 'monitor' operation reports if the instance is running, and runs additional 
checks.
The 'validate-all' operation checks the validity of the arguments (environment 
variables).
EOFA
}

eDir_meta_data() {
cat <<-EOFB
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="eDir88" version="0.15">
<version>1.0</version>

<longdesc lang="en">
Resource script for managing an eDirectory instance. Manages a single instance
of eDirectory as an HA resource. The "multiple instances" feature or
eDirectory has been added in version 8.8. This script will not work for any
version of eDirectory prior to 8.8. This RA can be used to load multiple
eDirectory instances on the same host.

It is very strongly recommended to put eDir configuration files (as per the
eDir_config_file parameter) on local storage on each node. This is necessary for
this RA to be able to handle situations where the shared storage has become
unavailable. If the eDir configuration file is not available, this RA will fail,
and heartbeat will be unable to manage the resource. Side effects include 
STONITH actions, unmanageable resources, etc...

Setting a high action timeout value is _very_ _strongly_ recommended. eDir
with IDM can take in excess of 10 minutes to start. If heartbeat times out
before eDir has had a chance to start properly, mayhem _WILL ENSUE_.

The LDAP module seems to be one of the very last to start. So this script will
take even longer to start on installations with IDM and LDAP if the monitoring
of IDM and/or LDAP is enabled, as the start command will wait for IDM and LDAP
to be available.
</longdesc>
<shortdesc lang="en">eDirectory resource agent</shortdesc>
<parameters>
<parameter name="eDir_config_file" unique="1" required="1">
<longdesc lang="en">
Path to configuration file for eDirectory instance.
</longdesc>
<shortdesc lang="en">eDir config file</shortdesc>
<content type="string" default="/etc/opt/novell/eDirectory/conf/nds.conf" />
</parameter>
<parameter name="eDir_monitor_ldap" required="0">
<longdesc lang="en">
Should we monitor if LDAP is running for the eDirectory instance?
</longdesc>
<shortdesc lang="en">eDir monitor ldap</shortdesc>
<content type="boolean" default="0" />
</parameter>
<parameter name="eDir_monitor_idm" required="0">
<longdesc lang="en">
Should we monitor if IDM is running for the eDirectory instance?
</longdesc>
<shortdesc lang="en">eDir monitor IDM</shortdesc>
<content type="boolean" default="0" />
</parameter>
<parameter name="eDir_jvm_initial_heap" required="0">
<longdesc lang="en">
Value for the DHOST_INITIAL_HEAP java environment variable. If unset, java 
defaults will be used.
</longdesc>
<shortdesc lang="en">DHOST_INITIAL_HEAP value</shortdesc>
<content type="integer" default="" />
</parameter>
<parameter name="eDir_jvm_max_heap" required="0">
<longdesc lang="en">
Value for the DHOST_MAX_HEAP java environment variable. If unset, java defaults 
will be used.
</longdesc>
<shortdesc lang="en">DHOST_MAX_HEAP value</shortdesc>
<content type="integer" default="" />
</parameter>
<parameter name="eDir_jvm_options" required="0">
<longdesc lang="en">
Value for the DHOST_OPTIONS java environment variable. If unset, original 
values will be used.
</longdesc>
<shortdesc lang="en">DHOST_OPTIONS value</shortdesc>
<content type="string" default="" />
</parameter>
</parameters>

<actions>
<action name="start" timeout="600" />
<action name="stop" timeout="600" />
<action name="monitor" timeout="60" interval="30" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
EOFB
return $OCF_SUCCESS
}

#
# eDir_start: Start eDirectory instance
#

eDir_start() {
    if eDir_status ; then
        ocf_log info "eDirectory is already running ($NDSCONF)."
        return $OCF_SUCCESS
    fi

    # Start eDirectory instance
    if [ -n "$OCF_RESKEY_eDir_jvm_initial_heap" ]; then
        DHOST_JVM_INITIAL_HEAP=$OCF_RESKEY_eDir_jvm_initial_heap
        export DHOST_JVM_INITIAL_HEAP
    fi
    if [ -n "$OCF_RESKEY_eDir_jvm_max_heap" ]; then
        DHOST_JVM_MAX_HEAP=$OCF_RESKEY_eDir_jvm_max_heap
        export DHOST_JVM_MAX_HEAP
    fi
    if [ -n "$OCF_RESKEY_eDir_jvm_options" ]; then
        DHOST_JVM_OPTIONS=$OCF_RESKEY_eDir_jvm_options
        export DHOST_JVM_OPTIONS
    fi

    $NDSMANAGE start --config-file "$NDSCONF" > /dev/null 2>&1
    if [ $? -eq 0 ]; then
        ocf_log info "eDir start command sent for $NDSCONF."
    else
        echo "ERROR: Can't start eDirectory for $NDSCONF."
        return $OCF_ERR_GENERIC
    fi

    CNT=0
    while ! eDir_monitor ; do
        # Apparently, LDAP will only start after all other services
        # Startup time can be in excess of 10 minutes.
        # Leave a very long heartbeat timeout on the start action
        # We're relying on heartbeat to bail us out...
        let CNT=$CNT+1
        ocf_log info "eDirectory start waiting for ${CNT}th retry for $NDSCONF."
        sleep 10
    done
    
    ocf_log info "eDirectory start verified for $NDSCONF."
    
    return $OCF_SUCCESS
}

#
# eDir_stop: Stop eDirectory instance
#            This action is written in such a way that even when run
#            on a node were things are broken (no binaries, no config
#            etc...) it will try to stop any running ndsd processes
#            and report success if none are running.
#

eDir_stop() {
    if ! eDir_status ; then
        return $OCF_SUCCESS
    fi

    if [ -r "$NDSCONF" -a -x $NDSMANAGE ] ; then
        $NDSMANAGE stop --config-file "$NDSCONF" >/dev/null 2>&1
        if eDir_status ; then
                # eDir failed to stop.
            ocf_log err "eDirectory instance failed to stop for $NDSCONF"
        else
            ocf_log info "eDirectory stop verified for $NDSCONF."
            return $OCF_SUCCESS
        fi
    else
        ocf_log err "ndsmanage binary or config file missing ($NDSCONF). STOP 
action failed."
        return $OCF_ERR_GENERIC
    fi
}

#
# eDir_status: is eDirectory instance up ?
#

eDir_status() {
    if [ ! -r "$NDSCONF" ] ; then
        ocf_log err "Config file missing ($NDSCONF)."
        exit $OCF_ERR_GENERIC
    fi 

    # Find how many ndsd processes have open listening sockets
    # with the IP of this eDir instance
    IFACE=$(grep -i "n4u.server.interfaces" $NDSCONF | cut -f2 -d= | tr '@' ':')
    if [ -z "$IFACE" ] ; then
        ocf_log err "Cannot retrieve interfaces from $NDSCONF. eDirectory may 
not be correctly configured."
        exit $OCF_ERR_GENERIC
    fi
    NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")

    if [ "$NDSD_SOCKS" -eq 1 ] ; then
        # Correct ndsd instance is definitely running
        # Further checks are superfluous (I think...)
        return 0
    elif [ "$NDSD_SOCKS" -gt 1 ] ; then
        ocf_log err "More than 1 ndsd listening socket matched. Likely 
misconfiguration of eDirectory."
        exit $OCF_ERR_GENERIC
    fi

    # No listening socket. Make sure we don't have the process running...
    PIDDIR=$(grep -i "n4u.server.vardir" "$NDSCONF" | cut -f2 -d=)
    if [ -z "$PIDDIR" ] ; then
        ocf_log err "Cannot get vardir from nds config ($NDSCONF). Probable 
eDir configuration error."
        exit $OCF_ERR_GENERIC
    fi 
    NDSD_PID=$(cat $PIDDIR/ndsd.pid 2>/dev/null)
    if [ -z "$NDSD_PID" ] ; then
        # PID file unavailable or empty.
        # This will happen if the PIDDIR is not available
        # on this node at this time.
        return 1
    fi

    RC=$(ps -p "$NDSD_PID" | grep -c ndsd)
    if [ "$RC" -gt 0 ] ; then
        # process found but no listening socket. ndsd likely not operational
        ocf_log err "ndsd process found, but no listening socket. Something's 
gone wrong ($NDSCONF)"
        exit $OCF_ERR_GENERIC
    fi

    # Instance is not running, but no other error detected.
    return 1
}


#
# eDir_monitor: Do more in-depth checks to ensure that eDirectory is fully 
functional
#               LDAP and IDM checks are only done if reqested.
#               
#

eDir_monitor() {
    if ! eDir_status ; then
        ocf_log info "eDirectory instance is down ($NDSCONF)"
        return $OCF_NOT_RUNNING
    fi

    # We know the right ndsd is running locally, check health
    $NDSSTAT --config-file "$NDSCONF" >/dev/null 2>&1
    if [ $? -ne 0 ] ; then
        return 1
    fi

    # Monitor IDM first, as it will start before LDAP
    if [ $MONITOR_IDM -eq 1 ]; then
        RET=$($NDSTRACE --config-file "$NDSCONF" -c modules | egrep -i 
'^vrdim.*Running' | awk '{print $1}')
        if [ "$RET" !=  "vrdim" ]; then
            ocf_log err "eDirectory IDM engine isn't running ($NDSCONF)."
            return $OCF_ERR_GENERIC
        fi
    fi
    if [ $MONITOR_LDAP -eq 1 ] ; then
        $NDSNLDAP -c --config-file "$NDSCONF" >/dev/null 2>&1
        if [ $? -ne 0 ]; then
            ocf_log err "eDirectory LDAP server isn't running ($NDSCONF)."
            return $OCF_ERR_GENERIC
        fi
    fi

    ocf_log debug "eDirectory monitor success ($NDSCONF)"
    return $OCF_SUCCESS
}

#
# eDir_validate: Validate environment
#

eDir_validate() {

    declare rc=$OCF_SUCCESS

    # Script must be run as root
    if ! ocf_is_root ; then
        ocf_log err "$0 must be run as root"
        rc=$OCF_ERR_GENERIC
    fi

    # ndsmanage must be available and runnable
    if [ ! -x $NDSMANAGE ] ; then
        ocf_log err "Cannot run $NDSMANAGE"
        rc=$OCF_ERR_INSTALLED
    fi

    # ndsstat must be available and runnable
    if [ ! -x $NDSSTAT ]; then
        ocf_log err "Cannot run $NDSSTAT"
        rc=$OCF_ERR_INSTALLED
    fi

    # Config file must be readable
    if [ ! -r "$NDSCONF" ] ; then
        ocf_log err "eDirectory configuration file [$NDSCONF] is not readable"
        rc=$OCF_ERR_ARGS
    fi

    # monitor_ldap must be unambiguously resolvable to a truth value
    MONITOR_LDAP=$(echo "$MONITOR_LDAP" | tr [A-Z] [a-z])
    case "$MONITOR_LDAP" in
        yes|true|1)
            MONITOR_LDAP=1;;
        no|false|0)
            MONITOR_LDAP=0;;
        *)
            ocf_log err "Configuration parameter eDir_monitor_ldap has invalid 
value [$MONITOR_LDAP]"
            rc=$OCF_ERR_ARGS;;
    esac

    # monitor_idm must be unambiguously resolvable to a truth value
    MONITOR_IDM=$(echo "$MONITOR_IDM" | tr [A-Z] [a-z])
    case "$MONITOR_IDM" in
        yes|true|1)
            MONITOR_IDM=1;;
        no|false|0)
            MONITOR_IDM=0;;
        *)
            ocf_log err "Configuration parameter eDir_monitor_idm has invalid 
value [$MONITOR_IDM]"
            rc=$OCF_ERR_ARGS;;
    esac

    # eDir_jvm_initial_heap must be blank or numeric
    if [ -n "$OCF_RESKEY_eDir_jvm_initial_heap" ] ; then
        if ! ocf_is_decimal "$OCF_RESKEY_eDir_jvm_initial_heap" ; then
            ocf_log err "Configuration parameter eDir_jvm_initial_heap has 
invalid" \
                        "value [$OCF_RESKEY_eDir_jvm_initial_heap]"
            rc=$OCF_ERR_ARGS
        fi
    fi

    # eDir_jvm_max_heap must be blank or numeric
    if [ -n "$OCF_RESKEY_eDir_jvm_max_heap" ] ; then
        if ! ocf_is_decimal "$OCF_RESKEY_eDir_jvm_max_heap" ; then
            ocf_log err "Configuration parameter eDir_jvm_max_heap has invalid" 
\
                         "value [$OCF_RESKEY_eDir_jvm_max_heap]"
            rc=$OCF_ERR_ARGS
        fi
    fi
    if [ $rc -ne $OCF_SUCCESS ] ; then
        ocf_log err "Invalid environment"
    fi
    return $rc
}

#
# Start of main logic
#

ocf_log debug "$0 started with arguments \"[EMAIL PROTECTED]""

NDSBASE=/opt/novell/eDirectory
NDSNLDAP=$NDSBASE/sbin/nldap
NDSMANAGE=$NDSBASE/bin/ndsmanage
NDSSTAT=$NDSBASE/bin/ndsstat
NDSTRACE=$NDSBASE/bin/ndstrace
NDSCONF=${OCF_RESKEY_eDir_config_file:-/etc/opt/novell/eDirectory/conf/nds.conf}
MONITOR_LDAP=${OCF_RESKEY_eDir_monitor_ldap:-0}
MONITOR_IDM=${OCF_RESKEY_eDir_monitor_idm:-0}


# What kind of method was invoked?
case "$1" in
    validate-all) eDir_validate;   exit $?;;
    meta-data)    eDir_meta_data;  exit $OCF_SUCCESS;;
    stop)         eDir_stop;       exit $?;;
    status)       if eDir_status ; then
                      ocf_log info "eDirectory instance is up ($NDSCONF)"
                      exit $OCF_SUCCESS
                  else
                      ocf_log info "eDirectory instance is down ($NDSCONF)"
                      exit $OCF_NOT_RUNNING
                  fi;;
    start)        : skip;;
    monitor)      : skip;;
    usage)        usage; exit $OCF_SUCCESS;;
    *)            ocf_log err "Invalid argument [$1]"
                  usage; exit $OCF_ERR_ARGS;;
esac

# From now on we must have a valid environment to continue.
# stop goes in the list above as it should ideally be able to
# clean up after a start that failed due to bad args

eDir_validate
RC=$?
if [ $RC -ne $OCF_SUCCESS ]; then
    exit $RC
fi

case "$1" in
    start)      eDir_start;;
    monitor)    eDir_monitor;;
esac

exit $?
--- eDir88-0.14 2007-04-27 20:48:09.828125000 +0100
+++ eDir88-0.15 2007-06-02 00:48:30.375000000 +0100
@@ -39,6 +39,7 @@
 # Initialization:
 
 . /usr/lib/heartbeat/ocf-shellfuncs
+. /opt/novell/eDirectory/bin/ndspath 2>/dev/null >/dev/null
 
 #######################################################################
 
@@ -62,7 +63,7 @@
 cat <<-EOFB
 <?xml version="1.0"?>
 <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
-<resource-agent name="eDir88" version="0.14">
+<resource-agent name="eDir88" version="0.15">
 <version>1.0</version>
 
 <longdesc lang="en">
@@ -79,7 +80,7 @@
 and heartbeat will be unable to manage the resource. Side effects include 
 STONITH actions, unmanageable resources, etc...
 
-Setting a high action timeout value is also very strongly recommended. eDir
+Setting a high action timeout value is _very_ _strongly_ recommended. eDir
 with IDM can take in excess of 10 minutes to start. If heartbeat times out
 before eDir has had a chance to start properly, mayhem _WILL ENSUE_.
 
@@ -139,7 +140,7 @@
 <action name="stop" timeout="600" />
 <action name="monitor" timeout="60" interval="30" />
 <action name="meta-data" timeout="5" />
-<action name="validate-all" timeout="5">
+<action name="validate-all" timeout="5" />
 </actions>
 </resource-agent>
 EOFB
@@ -223,27 +224,63 @@
 }
 
 #
-# eDir_status: is eDirectory instance up (do simple ndsstat check)?
+# eDir_status: is eDirectory instance up ?
 #
 
 eDir_status() {
-    if [ -r "$NDSCONF" -a -x $NDSSTAT ] ; then
-        $NDSSTAT --config-file "$NDSCONF" >/dev/null 2>&1
-        if [ $? -eq 0 ] ; then
-            return 0
-        else
-            return 1
-        fi
-    else
-       ocf_log err "ndsstat binary or config file missing ($NDSCONF)."
-       return 1
+    if [ ! -r "$NDSCONF" ] ; then
+        ocf_log err "Config file missing ($NDSCONF)."
+        exit $OCF_ERR_GENERIC
+    fi 
+
+    # Find how many ndsd processes have open listening sockets
+    # with the IP of this eDir instance
+    IFACE=$(grep -i "n4u.server.interfaces" $NDSCONF | cut -f2 -d= | tr '@' 
':')
+    if [ -z "$IFACE" ] ; then
+        ocf_log err "Cannot retrieve interfaces from $NDSCONF. eDirectory may 
not be correctly configured."
+        exit $OCF_ERR_GENERIC
+    fi
+    NDSD_SOCKS=$(netstat -ntlp | grep -ce "$IFACE.*ndsd")
+
+    if [ "$NDSD_SOCKS" -eq 1 ] ; then
+        # Correct ndsd instance is definitely running
+        # Further checks are superfluous (I think...)
+        return 0
+    elif [ "$NDSD_SOCKS" -gt 1 ] ; then
+        ocf_log err "More than 1 ndsd listening socket matched. Likely 
misconfiguration of eDirectory."
+        exit $OCF_ERR_GENERIC
+    fi
+
+    # No listening socket. Make sure we don't have the process running...
+    PIDDIR=$(grep -i "n4u.server.vardir" "$NDSCONF" | cut -f2 -d=)
+    if [ -z "$PIDDIR" ] ; then
+        ocf_log err "Cannot get vardir from nds config ($NDSCONF). Probable 
eDir configuration error."
+        exit $OCF_ERR_GENERIC
+    fi 
+    NDSD_PID=$(cat $PIDDIR/ndsd.pid 2>/dev/null)
+    if [ -z "$NDSD_PID" ] ; then
+        # PID file unavailable or empty.
+        # This will happen if the PIDDIR is not available
+        # on this node at this time.
+        return 1
+    fi
+
+    RC=$(ps -p "$NDSD_PID" | grep -c ndsd)
+    if [ "$RC" -gt 0 ] ; then
+        # process found but no listening socket. ndsd likely not operational
+        ocf_log err "ndsd process found, but no listening socket. Something's 
gone wrong ($NDSCONF)"
+        exit $OCF_ERR_GENERIC
     fi
+
+    # Instance is not running, but no other error detected.
+    return 1
 }
 
 
 #
-# eDir_monitor: Do several checks to ensure that eDirectory is fully functional
+# eDir_monitor: Do more in-depth checks to ensure that eDirectory is fully 
functional
 #               LDAP and IDM checks are only done if reqested.
+#               
 #
 
 eDir_monitor() {
@@ -252,19 +289,25 @@
         return $OCF_NOT_RUNNING
     fi
 
+    # We know the right ndsd is running locally, check health
+    $NDSSTAT --config-file "$NDSCONF" >/dev/null 2>&1
+    if [ $? -ne 0 ] ; then
+        return 1
+    fi
+
     # Monitor IDM first, as it will start before LDAP
     if [ $MONITOR_IDM -eq 1 ]; then
         RET=$($NDSTRACE --config-file "$NDSCONF" -c modules | egrep -i 
'^vrdim.*Running' | awk '{print $1}')
         if [ "$RET" !=  "vrdim" ]; then
             ocf_log err "eDirectory IDM engine isn't running ($NDSCONF)."
-            return $OCF_NOT_RUNNING
+            return $OCF_ERR_GENERIC
         fi
     fi
     if [ $MONITOR_LDAP -eq 1 ] ; then
         $NDSNLDAP -c --config-file "$NDSCONF" >/dev/null 2>&1
         if [ $? -ne 0 ]; then
             ocf_log err "eDirectory LDAP server isn't running ($NDSCONF)."
-            return $OCF_NOT_RUNNING
+            return $OCF_ERR_GENERIC
         fi
     fi
 

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/

Reply via email to