Hello Chris,

On 12/01/2011 06:25 PM, Chris Bowlby wrote:
> Hi Everyone,
>
> I'm in the process of configuring a 2 node + DRBD enabled DHCP cluster
> using the following packages:
>
> SLES 11 SP1, with Pacemaker 1.1.6, corosync 1.4.2, and drbd 8.3.12.
>
> I know about DHCP's internal fail-over abilities, but after testing, it
> simply failed to remain viable as a more robust HA type cluster. As such
> I began working on this solution. For reference my current configuration
> looks like this:
>
> node dhcp-vm01 \
>         attributes standby="off"
> node dhcp-vm02 \
>         attributes standby="on"
> primitive DHCPFS ocf:heartbeat:Filesystem \
>         params device="/dev/drbd1" directory="/var/lib/dhcp" fstype="ext4" \
>         meta target-role="Started"
> primitive dhcp-cluster ocf:heartbeat:IPaddr2 \
>         params ip="xxx.xxx.xxx.xxx" cidr_netmask="32" \
>         op monitor interval="10s"
> primitive dhcpd_service ocf:heartbeat:dhcpd \
>         params dhcpd_config="/etc/dhcpd.conf" \
>         dhcpd_interface="eth0" \
>         op monitor interval="1min" \
>         meta target-role="Started"
> primitive dhcpdrbd ocf:linbit:drbd \
>         params drbd_resource="dhcpdata" \
>         op monitor interval="60s"
> ms DHCPData dhcpdrbd \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation dhcpd_service-with_cluster_ip inf: dhcpd_service dhcp-cluster
> colocation fs_on_drbd inf: DHCPFS DHCPData:Master
> order DHCP-after-dhcpfs inf: DHCPFS:promote dhcpd_service:start
> order dhcpfs_after_dhcpdata inf: DHCPData:promote DHCPFS:start
DHCPFS:promote ?? That action will never occur: DHCPFS is a plain
Filesystem primitive, and only master/slave resources get promoted. The
order constraint therefore never applies, so dhcpd_service will start
whenever it likes ... typically not when it should ;-) Remove that :promote.

You are also missing a colocation between dhcpd_service and its file
system. I'd suggest using a group and colocating/ordering that with DRBD:

group g_dhcp DHCPFS dhcpd_service dhcp-cluster

... or put the IP before dhcpd in the group if dhcpd needs to bind to it.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now

> property $id="cib-bootstrap-options" \
>         dc-version="1.1.5-ecb6baaf7fc091b023d6d4ba7e0fce26d32cf5c8" \
>         cluster-infrastructure="openais" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>         resource-stickiness="100"
>
> The floating IP works without issue, as does the DRBD integration such
> that if I put a node into standby, the IP, DRBD master/slave and FS
> mounts all transfer correctly. Only the DHCP component itself is
> failing, in that it wont start properly from within pacemaker.
>
> I suspect it is due to having to write a new script as I could not find
> an existing DHCPD RA agent anywhere. I built my own based off the
> development guide for resource agents on the wiki. I've managed to get
> it to complete all the tests I need it to pass in the ocf-tester script:
>
> ocf-tester -n dhcpd -o monitor_client_interface=eth0 /usr/lib/ocf/resource.d/heartbeat/dhcpd
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/dhcpd...
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> /usr/lib/ocf/resource.d/heartbeat/dhcpd passed all tests
>
> Additionally if I run each of the various options
> (start/stop/monitor/validate-all/status/meta-data) at the command line,
> they all work with out issue, and stop/start the DHCPD process as
> expected.
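Coming back to the constraints for a moment: the group suggestion could be written out roughly as below. This is only a sketch in crm shell syntax. The constraint names g_dhcp_on_drbd and g_dhcp_after_drbd are made up for illustration, and it assumes the four existing colocation/order constraints are deleted first, since the group already implies ordering and colocation between its members.

```
# Group members start in the listed order and always run on the same node.
# Move dhcp-cluster ahead of dhcpd_service if dhcpd has to bind to the IP.
group g_dhcp DHCPFS dhcpd_service dhcp-cluster

# Hypothetical constraint names: run the group where DRBD is Master,
# and start it only after the promote has completed.
colocation g_dhcp_on_drbd inf: g_dhcp DHCPData:Master
order g_dhcp_after_drbd inf: DHCPData:promote g_dhcp:start
```

With the group in place, the promote action is referenced only on DHCPData, which is the one resource that is actually promoted.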
>
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
> root 12516 0.0 0.1 4344 756 pts/3 S+ 17:16 0:00 grep dhcp
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # /usr/lib/ocf/resource.d/heartbeat/dhcpd start
> DEBUG: Validating the dhcpd binary exists.
> DEBUG: Validating that we are running in chrooted mode
> DEBUG: Chrooted mode is active, testing the chrooted path exists
> DEBUG: Checking to see if the /var/lib/dhcp//etc/dhcpd.conf exists and is readable
> DEBUG: Validating the dhcpd user exists
> DEBUG: Validation complete, everything looks good.
> DEBUG: Testing the state of the daemon itself
> DEBUG: OCF_NOT_RUNNING: 7
> INFO: The dhcpd process is not running
> Internet Systems Consortium DHCP Server V3.1-ESV
> Copyright 2004-2010 Internet Systems Consortium.
> All rights reserved.
> For info, please visit https://www.isc.org/software/dhcp/
> WARNING: Host declarations are global. They are not limited to the scope you declared them in.
> Not searching LDAP since ldap-server, ldap-port and ldap-base-dn were not specified in the config file
> Wrote 0 deleted host decls to leases file.
> Wrote 0 new dynamic host decls to leases file.
> Wrote 0 leases to leases file.
> Listening on LPF/eth0/00:0c:29:d7:64:99/SERVERS
> Sending on LPF/eth0/00:0c:29:d7:64:99/SERVERS
> Sending on Socket/fallback/fallback-net
> 0
> INFO: dhcpd [chrooted] has started.
> DEBUG: Resource Agent Exit Status 0
> DEBUG: default start returned 0
> dhcp-vm01:/usr/lib/ocf/resource.d/heartbeat # ps aux | grep dhcp
> dhcpd 12653 0.0 0.2 26636 1164 ? Ss 17:16 0:00 dhcpd -cf /etc/dhcpd.conf -chroot /var/lib/dhcp -lf /db/dhcpd.leases -user dhcpd -group nogroup -pf /var/run/dhcpd.pid
> root 12658 0.0 0.1 4344 752 pts/3 S+ 17:16 0:00 grep dhcp
>
> However, when I try to do the same from within pacemaker it fails to
> properly start up and I get the following error (crm_mon):
>
> Failed actions:
>     dhcpd_service_monitor_0 (node=dhcp-vm01, call=3, rc=5, status=complete): not installed
>     dhcpd_service_monitor_0 (node=dhcp-vm02, call=3, rc=5, status=complete): not installed
>
> After a bit of digging through the syslog log entries, I've tracked down
> the following lines:
>
> Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm01 returned 5 (not installed) instead of the expected value: 7 (not running)
> Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm01
> Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: debug: unpack_rsc_op: dhcpd_service_monitor_0 on dhcp-vm02 returned 5 (not installed) instead of the expected value: 7 (not running)
> Dec 1 16:21:22 dhcp-vm01 pengine: [31978]: notice: unpack_rsc_op: Hard error - dhcpd_service_monitor_0 failed with rc=5: Preventing dhcpd_service from re-starting on dhcp-vm02
>
> Of which I then took a closer look at the monitor/status and
> validate-all functions in my script:
>
> # Validate most critical parameters
> dhcpd_validate_all() {
>     ocf_log debug "Validating the ${OCF_RESKEY_dhcpd} binary exists."
>     check_binary ${OCF_RESKEY_dhcpd}
>
>     if [ ocf_is_probe ] ; then
>         ocf_log debug "Validating that we are running in chrooted mode"
>         if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
>             ocf_log debug "Chrooted mode is active, testing the chrooted path exists"
>             if ! test -e "${OCF_RESKEY_dhcpd_chrooted_path}"; then
>                 ocf_log err "Path ${OCF_RESKEY_dhcpd_chrooted_path} does not exist."
>                 return $OCF_ERR_INSTALLED
>             fi
>
>             ocf_log debug "Checking to see if the ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} exists and is readable"
>             if test -n "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}" -a ! -r "${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"; then
>                 ocf_log err "Configuration file ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config} doesn't exist"
>                 return $OCF_ERR_INSTALLED
>             fi
>         fi
>     else
>         ocf_log info "${OCF_RESKEY_dhcpd_chrooted_path} not readable during probe."
>         return $OCF_ERR_INSTALLED
>     fi
>
>     ocf_log debug "Validating the ${OCF_RESKEY_dhcpd_user} user exists"
>     getent passwd ${OCF_RESKEY_dhcpd_user} >/dev/null 2>&1
>     if ! test $? -eq 0; then
>         ocf_log err "User ${OCF_RESKEY_dhcpd_user} doesn't exist";
>         return $OCF_ERR_INSTALLED
>     fi
>
>     ocf_log debug "Validation complete, everything looks good."
>
>     return $OCF_SUCCESS
> }
>
> # dhcpd_status. Simple check of the status of dhcpd process by pidfile.
> dhcpd_status () {
>     if ocf_is_true ${OCF_RESKEY_dhcpd_chrooted}; then
>         ocf_pidfile_status ${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
>     else
>         ocf_pidfile_status ${OCF_RESKEY_dhcpd_pidfile} >/dev/null 2>&1
>     fi
> }
>
> # dhcpd_monitor. Send a request to dhcpd and check response.
> dhcpd_monitor() {
>     local output
>
>     ocf_log debug "Testing the state of the daemon itself"
>     ocf_log debug "OCF_NOT_RUNNING: $OCF_NOT_RUNNING"
>     if ! dhcpd_status
>     then
>         ocf_log info "The dhcpd process is not running"
>         return $OCF_NOT_RUNNING
>     fi
>
>     return $OCF_SUCCESS
> }
>
> I see nothing wrong that would tell me it is returning a "not installed"
> state during the validate or the monitoring phases.
>
> This script is a bit large, and I am attaching it for reference to see
> if anyone can take a peak and point out anything I am overlooking. The
> script itself is using the same "concepts" that were defined in the
> named RA script, and blended with the official RA developers guide. It
> also borrows some code from the main DHCPD init script that ships with
> SLES 11.
>
> The script is not yet finalized in that some extra monitoring elements
> are "partially" there, but not yet fully worked, and chrooted mode is
> currently the only mode supported (why would you run a non-chrooted DHCP
> server?!!?). In addition acknowledgment of original authors is not yet
> in there, and will be added once I get closer to a more complete script.
>
> Any help would be appreciated, and if additional details are needed, let
> me know and I will fill in any holes I can.
> Thanks
> Chris
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
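One more thing that stands out in the quoted dhcpd_validate_all: "[ ocf_is_probe ]" does not call the function. It tests a non-empty string, so the condition is always true and the else branch is never reached. On a node where /var/lib/dhcp is not mounted, a probe would then fall through to the config file check and return OCF_ERR_INSTALLED (5). That would match the failed actions shown, though I have not verified it against the attached script. A minimal sketch of a probe-tolerant check follows. The helper name dhcpd_check_config is hypothetical, and the return codes and ocf_is_probe are defined here only as stand-ins so the snippet runs on its own; a real agent would get them from ocf-shellfuncs.

```shell
#!/bin/sh
# Stand-ins so the sketch runs outside a cluster. A real agent would source
# ocf-shellfuncs, which provides these return codes and ocf_is_probe.
OCF_SUCCESS=0
OCF_ERR_INSTALLED=5
OCF_NOT_RUNNING=7

# Stand-in: a probe is a monitor operation with interval 0.
ocf_is_probe() {
    [ "${__OCF_ACTION:-}" = "monitor" ] && [ "${OCF_RESKEY_CRM_meta_interval:-0}" = "0" ]
}

# Hypothetical helper (not in the quoted script): check the chrooted config
# file, tolerating a probe on a node where the filesystem is not mounted.
dhcpd_check_config() {
    conf="${OCF_RESKEY_dhcpd_chrooted_path}/${OCF_RESKEY_dhcpd_config}"

    if [ -r "$conf" ]; then
        return $OCF_SUCCESS
    fi

    # Call the function directly; "[ ocf_is_probe ]" would always be true.
    if ocf_is_probe; then
        # Config not visible during a probe: report "not running" (7).
        return $OCF_NOT_RUNNING
    fi

    # Outside a probe, a missing config is a real installation problem (5).
    return $OCF_ERR_INSTALLED
}
```

The idea is that a probe only has to answer "is it running here?", so on the node without the DRBD-backed mount it should say "not running" rather than produce a hard error that bans the resource from that node.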