Hi Everyone, 

I'm in the process of configuring a 2 node + DRBD enabled DHCP cluster
using the following packages:

SLES 11 SP1, with Pacemaker 1.1.6, corosync 1.4.2, and drbd 8.3.12.

I know about DHCP's built-in failover, but in my testing it did not hold
up as a robust HA setup, so I began working on this solution instead.
For reference, my current configuration looks like this:

node dhcp-vm01 \
        attributes standby="off"
node dhcp-vm02 \
        attributes standby="on"
primitive DHCPFS ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/var/lib/dhcp" fstype="ext4" \
        meta target-role="Started"
primitive dhcp-cluster ocf:heartbeat:IPaddr2 \
        params ip="xxx.xxx.xxx.xxx" cidr_netmask="32" \
        op monitor interval="10s"
primitive dhcpd_service ocf:heartbeat:dhcpd \
        params dhcpd_config="/etc/dhcpd.conf" \
        dhcpd_interface="eth0" \
        op monitor interval="1min" \
        meta target-role="Started"
primitive dhcpdrbd ocf:linbit:drbd \
        params drbd_resource="dhcpdata" \
        op monitor interval="60s"
ms DHCPData dhcpdrbd \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true"
colocation dhcpd_service-with_cluster_ip inf: dhcpd_service dhcp-cluster
colocation fs_on_drbd inf: DHCPFS DHCPData:Master
order DHCP-after-dhcpfs inf: DHCPFS:promote dhcpd_service:start
order dhcpfs_after_dhcpdata inf: DHCPData:promote DHCPFS:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

The floating IP works without issue, as does the DRBD integration such
that if I put a node into standby, the IP, DRBD master/slave and FS
mounts all transfer correctly. Only the DHCP component itself is
failing, in that it won't start properly from within pacemaker.

I suspect the cause is the resource agent, which I had to write myself
because I could not find an existing dhcpd RA anywhere. I built my own
based on the development guide for resource agents on the wiki. I've
managed to get it to pass all the tests I need in the ocf-tester script:

ocf-tester -n dhcpd -o monitor_client_interface=eth0 /usr/lib/ocf/resource.d/heartbeat/dhcpd
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/dhcpd...
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
/usr/lib/ocf/resource.d/heartbeat/dhcpd passed all tests

Additionally, if I run each of the actions
(start/stop/monitor/validate-all/status/meta-data) at the command line,
they all work without issue, and stop/start the DHCPD process as
expected.

dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
root     12516  0.0  0.1   4344   756 pts/3    S+   17:16   0:00 grep dhcp
dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # /usr/lib/ocf/resource.d/heartbeat/dhcpd start
DEBUG: Validating the dhcpd binary exists.
DEBUG: Validating that we are running in chrooted mode
DEBUG: Chrooted mode is active, testing the chrooted path exists
DEBUG: Checking to see if the /var/lib/dhcp//etc/dhcpd.conf exists and is readable
DEBUG: Validating the dhcpd user exists
DEBUG: Validation complete, everything looks good.
DEBUG: Testing the state of the daemon itself
DEBUG: OCF_NOT_RUNNING: 7
INFO: The dhcpd process is not running
Internet Systems Consortium DHCP Server V3.1-ESV
Copyright 2004-2010 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
WARNING: Host declarations are global.  They are not limited to the scope you declared them in.
Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
Wrote 0 deleted host decls to leases file.
Wrote 0 new dynamic host decls to leases file.
Wrote 0 leases to leases file.
Listening on LPF/eth0/00:0c:29:d7:64:99/SERVERS
Sending on   LPF/eth0/00:0c:29:d7:64:99/SERVERS
Sending on   Socket/fallback/fallback-net
0
INFO: dhcpd [chrooted] has started.
DEBUG: Resource Agent Exit Status 0
DEBUG: default start returned 0
dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
dhcpd    12653  0.0  0.2  26636  1164 ?        Ss   17:16   0:00 dhcpd -cf /etc/dhcpd.conf -chroot /var/lib/dhcp -lf /db/dhcpd.leases -user dhcpd -group nogroup -pf /var/run/dhcpd.pid
root     12658  0.0  0.1   4344   752 pts/3    S+   17:16   0:00 grep dhcp
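
Note that pacemaker's first call on each node is a probe, i.e. a monitor
action with an interval of 0, and the environment it runs in may differ
from an interactive shell. A rough way to imitate that call by hand is
sketched below. It assumes OCF_ROOT is /usr/lib/ocf (taken from the
paths above) and that the agent falls back to its own defaults when no
OCF_RESKEY_* parameters are exported; run_probe is a hypothetical helper
name, not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: run an OCF agent roughly the way a probe would,
# i.e. the "monitor" action with the interval meta attribute set to 0,
# then print the exit code. Export any OCF_RESKEY_* parameters the agent
# needs before calling it.
run_probe() {
    agent="$1"
    # The OCF_ROOT default is an assumption based on the paths shown above.
    OCF_ROOT="${OCF_ROOT:-/usr/lib/ocf}" \
    OCF_RESKEY_CRM_meta_interval=0 \
        "$agent" monitor
    rc=$?
    echo "probe rc=$rc"
    return "$rc"
}
```

Usage would be along the lines of
"run_probe /usr/lib/ocf/resource.d/heartbeat/dhcpd" on each node; for a
resource that is stopped, a probe is expected to come back with 7.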

However, when I try to do the same from within pacemaker, it fails to
start and crm_mon shows the following error:

Failed actions:
    dhcpd_service_monitor_0 (node=dhcp-vm01, call=3, rc=5, status=complete): not installed
    dhcpd_service_monitor_0 (node=dhcp-vm02, call=3, rc=5, status=complete): not installed

After a bit of digging through syslog, I tracked down the following
lines:

Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm01 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm01
Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm02 returned 5 (not installed) instead of the expected value: 7 (not running)
Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm02
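
For reference, the numeric values in these messages are the standard OCF
exit codes. A small decoder, as a sketch only (ocf_rc_name is a made-up
name for illustration and is not part of ocf-shellfuncs):

```shell
#!/bin/sh
# Hypothetical helper: map a numeric OCF exit code to its symbolic name.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "unknown ($1)" ;;
    esac
}
```

So the probe returned OCF_ERR_INSTALLED (5) where OCF_NOT_RUNNING (7)
was expected, and the log shows pacemaker treating that as a hard error
on both nodes.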

That prompted me to take a closer look at the monitor/status and
validate-all functions in my script:

# Validate most critical parameters
dhcpd_validate_all() {
    ocf_log debug "Validating the ${OCF_RESKEY_dhcpd} binary exists."
    check_binary ${OCF_RESKEY_dhcpd}

    if [ ocf_is_probe ] ; then
        ocf_log debug "Validating that we are running in chrooted mode"
        if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
            ocf_log debug "Chrooted mode is active, testing the chrooted path exists"
            if ! test -e "${OCF_RESKEY_dhcpd_chrooted_path}"; then
                ocf_log err "Path ${OCF_RESKEY_dhcpd_chrooted_path} does not exist."
                return $OCF_ERR_INSTALLED
            fi

            ocf_log debug "Checking to see if the ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} exists and is readable"
            if test -n "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}" -a \
                    ! -r "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"; then
                ocf_log err "Configuration file ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} doesn't exist"
                return $OCF_ERR_INSTALLED
            fi
        fi
    else
        ocf_log info "${OCF_RESKEY_dhcpd_chrooted_path} not readable during probe."
        return $OCF_ERR_INSTALLED
    fi

    ocf_log debug "Validating the ${OCF_RESKEY_dhcpd_user} user exists"
    getent passwd ${OCF_RESKEY_dhcpd_user} >/dev/null 2>&1
    if ! test $? -eq 0; then
        ocf_log err "User ${OCF_RESKEY_dhcpd_user} doesn't exist";
        return $OCF_ERR_INSTALLED
    fi

    ocf_log debug "Validation complete, everything looks good."

    return $OCF_SUCCESS
}

# dhcpd_status. Simple check of the status of dhcpd process by pidfile.
dhcpd_status () {
    if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
        ocf_pidfile_status ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
    else
        ocf_pidfile_status ${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
    fi
}

# dhcpd_monitor. For now this only checks the pidfile status; sending a
# request to dhcpd and checking the response is not implemented yet.
dhcpd_monitor() {
    local output

    ocf_log debug "Testing the state of the daemon itself"
    ocf_log debug "OCF_NOT_RUNNING: $OCF_NOT_RUNNING"
    if ! dhcpd_status
    then
        ocf_log info "The dhcpd process is not running"
        return $OCF_NOT_RUNNING
    fi

    return $OCF_SUCCESS
}

I see nothing here that would explain a "not installed" result during
either the validate or the monitor phase.
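
One detail in dhcpd_validate_all that may be worth ruling out: the test
"[ ocf_is_probe ]" does not call the function. It only checks that the
literal string "ocf_is_probe" is non-empty, which is always true, so the
chroot checks would run on every call, including the initial probe, when
the DRBD-backed /var/lib/dhcp may not be mounted yet. The sketch below
shows the difference in isolation. The ocf_is_probe here is a stub
standing in for the real one from ocf-shellfuncs, and IS_PROBE and the
guard_* names are hypothetical.

```shell
#!/bin/sh
# Stub standing in for ocf_is_probe from ocf-shellfuncs. Assumption: the
# real function succeeds only for a monitor action with interval 0.
# IS_PROBE is a hypothetical switch used only by this demo.
ocf_is_probe() {
    [ "${IS_PROBE:-no}" = "yes" ]
}

# As written in the agent: tests a non-empty string and never runs the
# function, so this branch is always taken.
guard_with_brackets() {
    if [ ocf_is_probe ]; then
        echo "probe"
    else
        echo "not-probe"
    fi
}

# Calls the function and branches on its exit status.
guard_with_call() {
    if ocf_is_probe; then
        echo "probe"
    else
        echo "not-probe"
    fi
}
```

If this turns out to be the cause, the check would presumably become
"if ! ocf_is_probe; then" around the chroot tests, with the probe case
skipping them instead of returning $OCF_ERR_INSTALLED. That is untested
here.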

The script is a bit large, so I am attaching it for reference in case
anyone can take a peek and point out anything I am overlooking. It uses
the same concepts as the named RA script, blended with the official RA
developer's guide. It also borrows some code from the main DHCPD init
script that ships with SLES 11.

The script is not yet finalized: some extra monitoring elements are only
partially implemented, and chrooted mode is currently the only mode
supported (why would you run a non-chrooted DHCP server?!). In addition,
acknowledgment of the original authors is not in there yet and will be
added once I get closer to a complete script.

Any help would be appreciated, and if additional details are needed, let
me know and I will fill in any holes I can.
Thanks
Chris

Attachment: dhcpd
Description: application/shellscript
