Hi Everyone, I'm in the process of configuring a 2 node + DRBD enabled DHCP cluster using the following packages:
SLES 11 SP1, with Pacemaker 1.1.6, corosync 1.4.2, and drbd 8.3.12. I know about DHCP's internal fail-over abilities, but after testing, it simply failed to remain viable as a more robust HA type cluster. As such I began working on this solution. For reference my current configuration looks like this: node dhcp-vm01 \ attributes standby="off" node dhcp-vm02 \ attributes standby="on" primitive DHCPFS ocf:heartbeat:Filesystem \ params device="/dev/drbd1" directory="/var/lib/dhcp" fstype="ext4" \ meta target-role="Started" primitive dhcp-cluster ocf:heartbeat:IPaddr2 \ params ip="xxx.xxx.xxx.xxx" cidr_netmask="32" \ op monitor interval="10s" primitive dhcpd_service ocf:heartbeat:dhcpd \ params dhcpd_config="/etc/dhcpd.conf" \ dhcpd_interface="eth0" \ op monitor interval="1min" \ meta target-role="Started" primitive dhcpdrbd ocf:linbit:drbd \ params drbd_resource="dhcpdata" \ op monitor interval="60s" ms DHCPData dhcpdrbd \ meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" colocation dhcpd_service-with_cluster_ip inf: dhcpd_service dhcp-cluster colocation fs_on_drbd inf: DHCPFS DHCPData:Master order DHCP-after-dhcpfs inf: DHCPFS:promote dhcpd_service:start order dhcpfs_after_dhcpdata inf: DHCPData:promote DHCPFS:start property $id="cib-bootstrap-options" \ dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \ cluster-infrastructure="openais" \ expected-quorum-votes="2" \ stonith-enabled="false" \ no-quorum-policy="ignore" rsc_defaults $id="rsc-options" \ resource-stickiness="100" The floating IP works without issue, as does the DRBD integration such that if I put a node into standby, the IP, DRBD master/slave and FS mounts all transfer correctly. Only the DHCP component itself is failing, in that it wont start properly from within pacemaker. I suspect it is due to having to write a new script as I could not find an existing DHCPD RA agent anywhere. I built my own based off the development guide for resource agents on the wiki. I've managed to get it to complete all the tests I need it to pass in the ocf-tester script: ocf-tester -n dhcpd -o monitor_client_interface=eth0 /usr/lib/ocf/resource.d/heartbeat/dhcpd Beginning tests for /usr/lib/ocf/resource.d/heartbeat/dhcpd... * Your agent does not support the notify action (optional) * Your agent does not support the demote action (optional) * Your agent does not support the promote action (optional) * Your agent does not support master/slave (optional) /usr/lib/ocf/resource.d/heartbeat/dhcpd passed all tests Additionally if I run each of the various options (start/stop/monitor/validate-all/status/meta-data) at the command line, they all work with out issue, and stop/start the DHCPD process as expected. dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp root 12516 0.0 0.1 4344 756 pts/3 S+ 17:16 0:00 grep dhcp dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # /usr/lib/ocf/resource.d/heartbeat/dhcpd start DEBUG: Validating the dhcpd binary exists. DEBUG: Validating that we are running in chrooted mode DEBUG: Chrooted mode is active, testing the chrooted path exists DEBUG: Checking to see if the /var/lib/dhcp//etc/dhcpd.conf exists and is readable DEBUG: Validating the dhcpd user exists DEBUG: Validation complete, everything looks good. DEBUG: Testing the state of the daemon itself DEBUG: OCF_NOT_RUNNING: 7 INFO: The dhcpd process is not running Internet Systems Consortium DHCP Server V3.1-ESV Copyright 2004-2010 Internet Systems Consortium. All rights reserved. For info, please visit https://www.isc.org/software/dhcp/ WARNING: Host declarations are global. They are not limited to the scope you declared them in. Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file Wrote 0 deleted host decls to leases file. Wrote 0 new dynamic host decls to leases file. Wrote 0 leases to leases file. Listening on LPF/eth0/00:0c:29:d7:64:99/SERVERS Sending on LPF/eth0/00:0c:29:d7:64:99/SERVERS Sending on Socket/fallback/fallback-net 0 INFO: dhcpd [chrooted] has started. DEBUG: Resource Agent Exit Status 0 DEBUG: default start returned 0 dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp dhcpd 12653 0.0 0.2 26636 1164 ? Ss 17:16 0:00 dhcpd -cf /etc/dhcpd.conf -chroot /var/lib/dhcp -lf /db/dhcpd.leases -user dhcpd -group nogroup -pf /var/run/dhcpd.pid root 12658 0.0 0.1 4344 752 pts/3 S+ 17:16 0:00 grep dhcp However, when I try to do the same from within pacemaker it fails to properly start up and I get the following error (crm_mon): Failed actions: dhcpd_service_monitor_0 (node=dhcp-vm01, call=3, rc=5, status=complete): not installed dhcpd_service_monitor_0 (node=dhcp-vm02, call=3, rc=5, status=complete): not installed After a bit of digging through the syslog log entries, I've tracked down the following lines: Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm01 returned 5 (not installed) instead of the expected value: 7 (not running) Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm01 Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm02 returned 5 (not installed) instead of the expected value: 7 (not running) Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm02 Of which I then took a closer look at the monitor/status and validate-all functions in my script: # Validate most critical parameters dhcpd_validate_all() { ocf_log debug "Validating the ${OCF_RESKEY_dhcpd} binary exists." check_binary ${OCF_RESKEY_dhcpd} if [ ocf_is_probe ] ; then ocf_log debug "Validating that we are running in chrooted mode" if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then ocf_log debug "Chrooted mode is active, testing the chrooted path exists" if ! test -e "${OCF_RESKEY_dhcpd_chrooted_path}"; then ocf_log err "Path ${OCF_RESKEY_dhcpd_chrooted_path} does not exist." return $OCF_ERR_INSTALLED fi ocf_log debug "Checking to see if the ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} exists and is readable" if test -n "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}" -a ! -r "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"; then ocf_log err "Configuration file ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} doesn't exist" return $OCF_ERR_INSTALLED fi fi else ocf_log info "${OCF_RESKEY_dhcpd_chrooted_path} not readable during probe." return $OCF_ERR_INSTALLED fi ocf_log debug "Validating the ${OCF_RESKEY_dhcpd_user} user exists" getent passwd ${OCF_RESKEY_dhcpd_user} >/dev/null 2>&1 if ! test $? -eq 0; then ocf_log err "User ${OCF_RESKEY_dhcpd_user} doesn't exist"; return $OCF_ERR_INSTALLED fi ocf_log debug "Validation complete, everything looks good." return $OCF_SUCCESS } # dhcpd_status. Simple check of the status of dhcpd process by pidfile. dhcpd_status () { if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then ocf_pidfile_status ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1 else ocf_pidfile_status ${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1 fi } # dhcpd_monitor. Send a request to dhcpd and check response. dhcpd_monitor() { local output ocf_log debug "Testing the state of the daemon itself" ocf_log debug "OCF_NOT_RUNNING: $OCF_NOT_RUNNING" if ! dhcpd_status then ocf_log info "The dhcpd process is not running" return $OCF_NOT_RUNNING fi return $OCF_SUCCESS } I see nothing wrong that would tell me it is returning a "not installed" state during the validate or the monitoring phases. This script is a bit large, and I am attaching it for reference to see if anyone can take a peak and point out anything I am overlooking. The script itself is using the same "concepts" that were defined in the named RA script, and blended with the official RA developers guide. It also borrows some code from the main DHCPD init script that ships with SLES 11. The script is not yet finalized in that some extra monitoring elements are "partially" there, but not yet fully worked, and chrooted mode is currently the only mode supported (why would you run a non-chrooted DHCP server?!!?). In addition acknowledgment of original authors is not yet in there, and will be added once I get closer to a more complete script. Any help would be appreciated, and if additional details are needed, let me know and I will fill in any holes I can. Thanks Chris
dhcpd
Description: application/shellscript
_______________________________________________ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems