Re: [Linux-HA] Custom resource agent script assistance

Andreas Kurz Thu, 01 Dec 2011 16:02:05 -0800

Hello Chris,

On 12/01/2011 06:25 PM, Chris Bowlby wrote:
> Hi Everyone, 
> 
> I'm in the process of configuring a two-node, DRBD-enabled DHCP cluster
> using the following packages:
> 
> SLES 11 SP1, with Pacemaker 1.1.6, corosync 1.4.2, and drbd 8.3.12.
> 
> I know about the DHCP server's built-in failover support, but in my
> testing it simply was not viable as a robust HA cluster. As such I
> began working on this solution. For reference, my current configuration
> looks like this:
> 
> node dhcp-vm01 \
>         attributes standby="off"
> node dhcp-vm02 \
>         attributes standby="on"
> primitive DHCPFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd1" directory="/var/lib/dhcp" fstype="ext4" \
>         meta target-role="Started"
> primitive dhcp-cluster ocf:heartbeat:IPaddr2 \
>         params ip="xxx.xxx.xxx.xxx" cidr_netmask="32" \
>         op monitor interval="10s"
> primitive dhcpd_service ocf:heartbeat:dhcpd \
>         params dhcpd_config="/etc/dhcpd.conf" \
>         dhcpd_interface="eth0" \
>         op monitor interval="1min" \
>         meta target-role="Started"
> primitive dhcpdrbd ocf:linbit:drbd \
>         params drbd_resource="dhcpdata" \
>         op monitor interval="60s"
> ms DHCPData dhcpdrbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation dhcpd_service-with_cluster_ip inf: dhcpd_service dhcp-cluster
> colocation fs_on_drbd inf: DHCPFS DHCPData:Master
> order DHCP-after-dhcpfs inf: DHCPFS:promote dhcpd_service:start
> order dhcpfs_after_dhcpdata inf: DHCPData:promote DHCPFS:start


DHCPFS:promote ?? .. DHCPFS is a plain primitive, so that action will
never occur, and dhcpd_service will start whenever it likes ...
typically not when it should ;-)

... and even with that :promote removed, you are still missing a
colocation between dhcpd_service and its file system.

I'd suggest using a group and colocate/order that with DRBD:

group g_dhcp DHCPFS dhcpd_service dhcp-cluster

.. or put the IP before dhcpd_service if the daemon needs to bind to it
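Spelled out in crm shell syntax, that could look roughly like the fragment below. It is only a sketch: the two constraint IDs are made-up names, and it assumes the existing dhcpd_service-with_cluster_ip, fs_on_drbd, DHCP-after-dhcpfs and dhcpfs_after_dhcpdata constraints are deleted first (for example with "crm configure delete <id>"), since a group already keeps its members on one node and starts them in the listed order.

```
group g_dhcp DHCPFS dhcpd_service dhcp-cluster
colocation g_dhcp_on_drbd_master inf: g_dhcp DHCPData:Master
order g_dhcp_after_drbd_promote inf: DHCPData:promote g_dhcp:start
```

The colocation and order lines reuse the DHCPData:Master and DHCPData:promote forms already present in the configuration above; they tie the whole group to the DRBD Master instead of only the file system.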

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
> 
> The floating IP works without issue, as does the DRBD integration: if I
> put a node into standby, the IP, the DRBD master/slave role and the FS
> mount all transfer correctly. Only the DHCP component itself is
> failing, in that it won't start properly from within Pacemaker.
> 
> I suspect it is due to having to write a new script, as I could not
> find an existing DHCPD resource agent (RA) anywhere. I built my own
> based on the resource agent development guide on the wiki, and it
> passes all the tests I need it to pass in the ocf-tester script:
> 
> ocf-tester -n dhcpd -o monitor_client_interface=eth0 /usr/lib/ocf/resource.d/heartbeat/dhcpd
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/dhcpd...
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> /usr/lib/ocf/resource.d/heartbeat/dhcpd passed all tests
> 
> Additionally, if I run each of the various actions
> (start/stop/monitor/validate-all/status/meta-data) at the command line,
> they all work without issue, and start/stop the DHCPD process as
> expected.
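One difference between those command-line runs and what Pacemaker does: the operation that fails in the logs further down is dhcpd_service_monitor_0, a probe (a monitor with a 0 interval). As far as I can tell, the ocf_is_probe helper in ocf-shellfuncs keys off OCF_RESKEY_CRM_meta_interval being 0 for a monitor action, and a plain "dhcpd monitor" from the shell does not set it. A rough way to reproduce a probe by hand is a wrapper like the hypothetical simulate_probe below; the OCF_ROOT value and the instance name are assumptions taken from this setup, and agent parameters are passed as OCF_RESKEY_* NAME=VALUE pairs.

```shell
#!/bin/sh
# simulate_probe AGENT [OCF_RESKEY_name=value ...]
# Hypothetical helper (not part of any package): run an OCF agent's monitor
# action with the environment a Pacemaker probe would provide (interval 0),
# then print and return the agent's exit code.
simulate_probe() {
    agent=$1
    shift
    # OCF_RESKEY_CRM_meta_interval=0 marks the monitor as a probe; a plain
    # "agent monitor" from the shell leaves it unset. The NAME=VALUE pairs in
    # "$@" are handed to env as extra variables for the agent.
    env OCF_ROOT=/usr/lib/ocf \
        OCF_RESOURCE_INSTANCE=dhcpd_service \
        OCF_RESKEY_CRM_meta_interval=0 \
        "$@" "$agent" monitor
    rc=$?
    echo "probe rc=$rc"
    return "$rc"
}

# Example (on a node where /var/lib/dhcp is NOT mounted):
#   simulate_probe /usr/lib/ocf/resource.d/heartbeat/dhcpd \
#       OCF_RESKEY_dhcpd_config=/etc/dhcpd.conf OCF_RESKEY_dhcpd_interface=eth0
```

On a node without the DRBD file system mounted, 7 is the answer Pacemaker expects from a probe; getting 5 there reproduces the failure.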
> 
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
> root     12516  0.0  0.1   4344   756 pts/3    S+   17:16   0:00 grep dhcp
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # /usr/lib/ocf/resource.d/heartbeat/dhcpd start
> DEBUG: Validating the dhcpd binary exists.
> DEBUG: Validating that we are running in chrooted mode
> DEBUG: Chrooted mode is active, testing the chrooted path exists
> DEBUG: Checking to see if the /var/lib/dhcp//etc/dhcpd.conf exists and is readable
> DEBUG: Validating the dhcpd user exists
> DEBUG: Validation complete, everything looks good.
> DEBUG: Testing the state of the daemon itself
> DEBUG: OCF_NOT_RUNNING: 7
> INFO: The dhcpd process is not running
> Internet Systems Consortium DHCP Server V3.1-ESV
> Copyright 2004-2010 Internet Systems Consortium.
> All rights reserved.
> For info, please visit https://www.isc.org/software/dhcp/
> WARNING: Host declarations are global.  They are not limited to the scope you declared them in.
> Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
> Wrote 0 deleted host decls to leases file.
> Wrote 0 new dynamic host decls to leases file.
> Wrote 0 leases to leases file.
> Listening on LPF/eth0/00:0c:29:d7:64:99/SERVERS
> Sending on   LPF/eth0/00:0c:29:d7:64:99/SERVERS
> Sending on   Socket/fallback/fallback-net
> 0
> INFO: dhcpd [chrooted] has started.
> DEBUG: Resource Agent Exit Status 0
> DEBUG: default start returned 0
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
> dhcpd    12653  0.0  0.2  26636  1164 ?        Ss   17:16   0:00 dhcpd -cf /etc/dhcpd.conf -chroot /var/lib/dhcp -lf /db/dhcpd.leases -user dhcpd -group nogroup -pf /var/run/dhcpd.pid
> root     12658  0.0  0.1   4344   752 pts/3    S+   17:16   0:00 grep dhcp
> 
> However, when I try to do the same from within Pacemaker, the resource
> fails to start and crm_mon reports the following errors:
> 
> Failed actions:
>     dhcpd_service_monitor_0 (node=dhcp-vm01, call=3, rc=5, status=complete): not installed
>     dhcpd_service_monitor_0 (node=dhcp-vm02, call=3, rc=5, status=complete): not installed
> 
> After a bit of digging through the syslog entries, I tracked down the
> following lines:
> 
> Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm01 returned 5 (not installed) instead of the expected value: 7 (not running)
> Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm01
> Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm02 returned 5 (not installed) instead of the expected value: 7 (not running)
> Dec  1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm02
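For reading logs like these: monitor_0 is a probe, a one-off monitor Pacemaker runs on each node to find out whether the resource is already active there. On a node where it is inactive, the expected answer is 7 (not running); the 5 (not installed) returned here is treated as a hard error, which is why dhcpd_service is kept off both nodes. The small lookup helper below is hypothetical; it maps the numeric codes to the OCF_* names used in ocf-shellfuncs, with values as defined by the OCF resource agent API.

```shell
#!/bin/sh
# ocf_rc_name CODE
# Hypothetical helper: print the OCF_* name for a resource agent exit code.
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;
        7) echo "OCF_NOT_RUNNING" ;;
        8) echo "OCF_RUNNING_MASTER" ;;
        9) echo "OCF_FAILED_MASTER" ;;
        *) echo "unknown ($1)" ;;
    esac
}

ocf_rc_name 5    # prints: OCF_ERR_INSTALLED
ocf_rc_name 7    # prints: OCF_NOT_RUNNING
```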
> 
> I then took a closer look at the monitor/status and validate-all
> functions in my script:
> 
> # Validate most critical parameters
> dhcpd_validate_all() {
>     ocf_log debug "Validating the ${OCF_RESKEY_dhcpd} binary exists."
>     check_binary ${OCF_RESKEY_dhcpd}
> 
>     if [ ocf_is_probe ] ; then
>         ocf_log debug "Validating that we are running in chrooted mode"
>         if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
>             ocf_log debug "Chrooted mode is active, testing the chrooted path exists"
>             if ! test -e "${OCF_RESKEY_dhcpd_chrooted_path}"; then
>                 ocf_log err "Path ${OCF_RESKEY_dhcpd_chrooted_path} does not exist."
>                 return $OCF_ERR_INSTALLED
>             fi
>
>             ocf_log debug "Checking to see if the ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} exists and is readable"
>             if test -n "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}" -a ! -r "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"; then
>                 ocf_log err "Configuration file ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} doesn't exist"
>                 return $OCF_ERR_INSTALLED
>             fi
>         fi
>     else
>         ocf_log info "${OCF_RESKEY_dhcpd_chrooted_path} not readable during probe."
>         return $OCF_ERR_INSTALLED
>     fi
> 
>     ocf_log debug "Validating the ${OCF_RESKEY_dhcpd_user} user exists"
>     getent passwd ${OCF_RESKEY_dhcpd_user} >/dev/null 2>&1
>     if ! test $? -eq 0; then
>         ocf_log err "User ${OCF_RESKEY_dhcpd_user} doesn't exist";
>         return $OCF_ERR_INSTALLED
>     fi
> 
>     ocf_log debug "Validation complete, everything looks good."
> 
>     return $OCF_SUCCESS
> }
> 
> # dhcpd_status. Simple check of the status of dhcpd process by pidfile.
> dhcpd_status () {
>     if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
>         ocf_pidfile_status ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
>     else
>         ocf_pidfile_status ${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
>     fi
> }
> 
> # dhcpd_monitor. Send a request to dhcpd and check response.
> dhcpd_monitor() {
>     local output
> 
>     ocf_log debug "Testing the state of the daemon itself"
>     ocf_log debug "OCF_NOT_RUNNING: $OCF_NOT_RUNNING"
>     if ! dhcpd_status
>     then
>         ocf_log info "The dhcpd process is not running"
>         return $OCF_NOT_RUNNING
>     fi
> 
>     return $OCF_SUCCESS
> }
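A likely culprit in dhcpd_validate_all(): "if [ ocf_is_probe ]" never calls the function. Inside [ ] the word is just a non-empty string, so the test is always true: the else branch can never run, and the chroot and config-file checks run during probes as well. The manual start above is consistent with this, since "Validating that we are running in chrooted mode" is logged during a plain start. During the initial probes /var/lib/dhcp is presumably not mounted yet (DRBD not promoted), so the config-file check fails with OCF_ERR_INSTALLED, i.e. rc=5. The snippet below only demonstrates the two forms; its ocf_is_probe is a stand-in written to mirror the ocf-shellfuncs helper (monitor action with OCF_RESKEY_CRM_meta_interval=0), not the real one.

```shell
#!/bin/sh
# Stand-in for ocf_is_probe() from ocf-shellfuncs (assumed to mirror it):
# a probe is a monitor action whose interval is 0.
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

check_probe_forms() {
    # Buggy form: [ word ] only tests that "ocf_is_probe" is a non-empty
    # string, so it is always true and the function is never run.
    if [ ocf_is_probe ]; then bracket=yes; else bracket=no; fi
    # Intended form: run the function and branch on its exit status.
    if ocf_is_probe; then call=yes; else call=no; fi
    echo "bracket=$bracket call=$call"
}

# A plain "start" from the command line is not a probe.
__OCF_ACTION=start
OCF_RESKEY_CRM_meta_interval=
check_probe_forms    # prints: bracket=yes call=no
```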
> 
> I see nothing wrong that would tell me it is returning a "not installed"
> state during the validate or the monitoring phases.
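A probe-aware version of the config-file check could look like the sketch below. It calls ocf_is_probe as a command rather than inside [ ], and when the file is missing during a probe it reports OCF_NOT_RUNNING instead of OCF_ERR_INSTALLED, on the reasoning that the file lives on the DRBD-backed file system and is simply not mounted on that node. The function name dhcpd_validate_chroot is hypothetical, the OCF constants and helpers are stubbed so the snippet runs on its own (the real agent gets them from ocf-shellfuncs), and it assumes the agent's dispatcher exits with whatever validation returns.

```shell
#!/bin/sh
# Stubs standing in for ocf-shellfuncs so the sketch runs on its own
# (return code values from the OCF resource agent API).
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7
ocf_log() { echo "$*" >&2; }
ocf_is_probe() {
    [ "$__OCF_ACTION" = "monitor" ] && [ "$OCF_RESKEY_CRM_meta_interval" = "0" ]
}

# dhcpd_validate_chroot (hypothetical): probe-aware check of the config
# file inside the chroot. The chroot sits on the DRBD-backed file system,
# which is only mounted where DHCPFS runs, so a missing file during a probe
# means "not running here" rather than "not installed".
dhcpd_validate_chroot() {
    conf="${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"
    if [ -r "$conf" ]; then
        return $OCF_SUCCESS
    fi
    # Call the function; do not wrap it in [ ].
    if ocf_is_probe; then
        ocf_log info "$conf not readable during probe"
        return $OCF_NOT_RUNNING
    fi
    ocf_log err "Configuration file $conf doesn't exist"
    return $OCF_ERR_INSTALLED
}
```

Outside a probe, a missing file is still reported as OCF_ERR_INSTALLED, so a genuinely broken installation is not hidden.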
> 
> This script is a bit large, so I am attaching it for reference, in case
> anyone can take a peek and point out anything I am overlooking. The
> script itself uses the same "concepts" as the named RA script, blended
> with the official RA developer's guide. It also borrows some code from
> the main DHCPD init script that ships with SLES 11.
> 
> The script is not yet finalized: some extra monitoring elements are
> only partially there and not yet fully worked out, and chrooted mode is
> currently the only supported mode (why would you run a non-chrooted
> DHCP server?!!?). Acknowledgment of the original authors is not in
> there yet either; I will add it once the script is closer to complete.
> 
> Any help would be appreciated, and if additional details are needed, let
> me know and I will fill in any holes I can.
> Thanks
> Chris
> 
> 
> 
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
