On Thu, Aug 18, 2011 at 01:24:04PM +0200, Lars Ellenberg wrote:
> On Thu, Aug 18, 2011 at 12:58:21PM +0200, Lars Ellenberg wrote:
> > On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> > > Hi,
> > >
> > > While playing with the conntrackd agent on Debian Squeeze, I found out the
> > > method used in the monitor action is not accurate and can sometimes yield
> > > false results, believing conntrackd is running when it is not.
> >
> > As in when?
> > Could that be resolved?
> >
> > > I am instead checking the existence of the control socket and it has
> > > so far proved more stable.
> >
> > If you "kill -9" conntrackd (or contrackd should crash for some reason),
> > it will leave behind that socket.
> >
> > So testing on the existence of that socket is in no way more reliable
> > than looking through the process table.
> >
> > Maybe there is some "ping" method in the conntrack-tools?
> > Or something that could be used as such?
>
> Ah, my bad...
> should have looked not at the patch only,
> but at the RA in context.
>
> That "check if it accepts queries" is done
> immediately following this test.
>
> So yes, your patch is good. Acked-by lars ;-)
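
For illustration only: the point in the quoted exchange is that a socket file left
behind by a killed daemon still passes a "-S" test, so the socket only becomes
useful in combination with an actual query. A minimal sketch of that combination,
assuming a hypothetical helper name (classify_state) and placeholder state names
that are not part of the resource agent; "true" and "false" stand in for the real
query command:

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the conntrackd resource agent.
# $1        : path to the control socket
# remaining : a command that succeeds only if the daemon answers a query
classify_state() {
    sock=$1
    shift
    if "$@" >/dev/null 2>&1; then
        # The daemon answered, so it is alive regardless of the socket test.
        echo running
    elif [ -S "$sock" ]; then
        # Socket file exists but nothing answers: stale socket or hung daemon.
        echo failed
    else
        # No answer and no socket: treat as cleanly stopped.
        echo not_running
    fi
}

# "true"/"false" are stand-ins for the real query command.
classify_state /nonexistent/conntrackd.ctl true
classify_state /nonexistent/conntrackd.ctl false
```

The socket test alone never yields "running" here; it is only used to tell a
failed instance from a stopped one once the query has already failed.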

Oh, these monologues...

> > > --- conntrackd 2011-08-18 12:12:36.807562142 +0200
> > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd 2011-08-18 12:25:20.000000000 +0200
> > > @@ -111,8 +111,10 @@
> > >
> > >  conntrackd_monitor() {
> > >      rc=$OCF_NOT_RUNNING
> > > -    # It does not write a PID file, so check with pgrep
> > > -    pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > +    # It does not write a PID file, so check the socket exists after
> > > +    # extracting its path from the configuration file
> > > +    local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)

Is "space" really the only allowed white space there?
I guess the regex has to be changed to use [[:space:]]*

    local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)

Maybe rather do it the other way round, see below.

> > > +    [ -S $conntrack_socket ] && rc=$OCF_SUCCESS
> > >      if [ "$rc" -eq "$OCF_SUCCESS" ]; then
> > >          # conntrackd is running
> > >          # now see if it acceppts queries

(untested)

diff --git a/heartbeat/conntrackd b/heartbeat/conntrackd
--- a/heartbeat/conntrackd
+++ b/heartbeat/conntrackd
@@ -110,16 +110,17 @@ conntrackd_set_master_score() {
 }
 
 conntrackd_monitor() {
-    rc=$OCF_NOT_RUNNING
-    # It does not write a PID file, so check with pgrep
-    pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
-    if [ "$rc" -eq "$OCF_SUCCESS" ]; then
-        # conntrackd is running
-        # now see if it acceppts queries
-        if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+    # see if it acceppts queries
+    if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+        local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+        if test -S $conntrack_socket ; then
             rc=$OCF_ERR_GENERIC
-            ocf_log err "conntrackd is running but not responding to queries"
+            ocf_log err "conntrackd control socket exists, but not responding to queries"
+        else
+            rc=$OCF_NOT_RUNNING
         fi
+    else
+        rc=$OCF_SUCCESS
         if conntrackd_is_master; then
             rc=$OCF_RUNNING_MASTER
             # Restore master setting on probes

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/