On Thu, Aug 18, 2011 at 01:24:04PM +0200, Lars Ellenberg wrote:
> On Thu, Aug 18, 2011 at 12:58:21PM +0200, Lars Ellenberg wrote:
> > On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> > > Hi,
> > > 
> > > While playing with the conntrackd agent on Debian Squeeze, I found out
> > > the method used in the monitor action is not accurate and can sometimes
> > > yield false results, believing conntrackd is running when it is not.
> > 
> > As in when?
> > Could that be resolved?
> > 
> > > I am instead checking the existence of the control socket and it has
> > > so far proved more stable.
> > 
> > If you "kill -9" conntrackd (or conntrackd should crash for some reason),
> > it will leave behind that socket.
> > 
> > So testing on the existence of that socket is in no way more reliable
> > than looking through the process table.
> > 
> > Maybe there is some "ping" method in the conntrack-tools?
> > Or something that could be used as such?
> 
> Ah, my bad...
> should have looked not at the patch only,
> but at the RA in context.
> 
> That "check if it accepts queries" is done
> immediately following this test.
> 
> So yes, your patch is good. Acked-by lars ;-)

Oh, these monologues...

> > > --- conntrackd    2011-08-18 12:12:36.807562142 +0200
> > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd  2011-08-18 12:25:20.000000000 +0200
> > > @@ -111,8 +111,10 @@
> > >  
> > >  conntrackd_monitor() {
> > >   rc=$OCF_NOT_RUNNING
> > > - # It does not write a PID file, so check with pgrep
> > > - pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > +        # It does not write a PID file, so check the socket exists after
> > > +        # extracting its path from the configuration file
> > > +        local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)

Is "space" really the only allowed white space there?
I guess the regex has to be changed to use [[:space:]]*

local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)

Maybe rather do it the other way round, see below.
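
The [[:space:]] variant above can be tried against a throwaway file first.
This is only a sketch: the helper name extract_socket_path and the sample
contents are made up for illustration, and the file merely mimics the
UNIX { ... } layout of a conntrackd config. The braces are written as [{]
and [}] here to sidestep any interval-expression ambiguity between awk
implementations; that is a change from the one-liner above.

```shell
# Hypothetical helper: print the Path value found inside the UNIX { } block
# of the config file given as $1.
extract_socket_path() {
	awk '/^[[:space:]]*UNIX[[:space:]]*[{]/,/^[[:space:]]*[}]/ { if ($1 == "Path") { print $2 } }' "$1"
}

sample_conf=$(mktemp)
# Tab-indented on purpose: a regex that only allows spaces would miss this block.
printf 'General {\n\tUNIX {\n\t\tPath /var/run/conntrackd.ctl\n\t\tBacklog 20\n\t}\n}\n' > "$sample_conf"

socket_path=$(extract_socket_path "$sample_conf")
echo "socket path: $socket_path"
rm -f "$sample_conf"
```

Comparing $1 against "Path" lets awk's own field splitting deal with the
leading white space, so the inner check needs no regex at all.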

> > > +        [ -S $conntrack_socket ] && rc=$OCF_SUCCESS
> > >   if [ "$rc" -eq "$OCF_SUCCESS" ]; then
> > >           # conntrackd is running 
> > >           # now see if it acceppts queries
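
One caveat with the unquoted [ -S $conntrack_socket ] in the hunk above: if
awk finds no Path line, the variable is empty, the test collapses to
[ -S ], and a one-argument test is true for any non-empty string. A small
sketch of the difference (the variable name is taken from the patch, the
rest is illustrative):

```shell
conntrack_socket=""   # simulates awk finding no Path line in the config

# Unquoted: expands to `[ -S ]`, a one-argument test on the string "-S",
# which succeeds because the string is non-empty.
if [ -S $conntrack_socket ]; then unquoted=true; else unquoted=false; fi

# Quoted: expands to `[ -S "" ]`, which fails because "" is not a socket.
if [ -S "$conntrack_socket" ]; then quoted=true; else quoted=false; fi

echo "unquoted=$unquoted quoted=$quoted"
```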

(untested)
diff --git a/heartbeat/conntrackd b/heartbeat/conntrackd
--- a/heartbeat/conntrackd
+++ b/heartbeat/conntrackd
@@ -110,16 +110,17 @@ conntrackd_set_master_score() {
 }
 
 conntrackd_monitor() {
-       rc=$OCF_NOT_RUNNING
-       # It does not write a PID file, so check with pgrep
-       pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
-       if [ "$rc" -eq "$OCF_SUCCESS" ]; then
-               # conntrackd is running 
-               # now see if it acceppts queries
-               if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+       # see if it accepts queries
+       if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+               local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+               if test -S "$conntrack_socket" ; then
                        rc=$OCF_ERR_GENERIC
-                       ocf_log err "conntrackd is running but not responding to queries"
+                       ocf_log err "conntrackd control socket exists, but not responding to queries"
+               else
+                       rc=$OCF_NOT_RUNNING
                fi
+       else
+               rc=$OCF_SUCCESS
                if conntrackd_is_master; then
                        rc=$OCF_RUNNING_MASTER
                        # Restore master setting on probes
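
Stripped of the OCF plumbing and the master handling, the reordered monitor
above comes down to three outcomes. The sketch below is standalone and only
illustrative: monitor_decision and its "yes"/"no" flag are hypothetical
stand-ins for the result of the -s query, and the return codes are the usual
OCF values hard-coded so the example runs without the OCF shell functions.

```shell
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# monitor_decision QUERY_OK SOCKET_PATH
# QUERY_OK is "yes" when the daemon answered the query (hypothetical flag);
# SOCKET_PATH is the control socket path taken from the config.
monitor_decision() {
	if [ "$1" = yes ]; then
		return $OCF_SUCCESS        # answers queries: running
	fi
	if [ -S "$2" ]; then
		return $OCF_ERR_GENERIC    # socket left behind, but no answer
	fi
	return $OCF_NOT_RUNNING            # no answer and no socket
}

monitor_decision yes /nonexistent/conntrackd.ctl && echo "answers queries: rc=$?"
monitor_decision no  /nonexistent/conntrackd.ctl || echo "no answer, no socket: rc=$?"
```

Asking the daemon first means the socket check only has to tell "failed"
apart from "cleanly stopped", which is where a stale socket after kill -9
shows up as an error rather than as a running instance.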

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
