Re: [Linux-ha-dev] [patch] conntrackd RA

2011-08-18 Thread Lars Ellenberg
On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> Hi,
> 
> While playing with the conntrackd agent on Debian Squeeze, I found out the
> method used in the monitor action is not accurate and can sometimes yield
> false results, believing conntrackd is running when it is not.

As in when?
Could that be resolved?

> I am instead checking the existence of the control socket and it has
> so far proved more stable.

If you kill -9 conntrackd (or conntrackd should crash for some reason),
it will leave behind that socket.

So testing on the existence of that socket is in no way more reliable
than looking through the process table.

Maybe there is some ping method in the conntrack-tools?
Or something that could be used as such?

If not, maybe try a connect to that socket, using netcat/socat?
Just checking if the socket exists will not detect conntrackd crashes.
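
A minimal sketch of the "try a connect" idea, assuming socat is installed and that the control socket is a stream socket. The helper name socket_accepts_connections is hypothetical and not part of the resource agent. It shows why the file test alone is not enough: a socket left behind after a crash still passes -S, but the connect attempt fails.

```shell
# Hypothetical helper, not part of the conntrackd resource agent.
# Returns 0 if something accepts a connection on the UNIX socket at $1,
# 1 if the path is not a socket or the connect fails (e.g. stale socket),
# 2 if socat is not installed and the check cannot be performed.
socket_accepts_connections() {
	# A leftover socket file passes -S even when no process listens on it,
	# so the file test only rules out the "no socket at all" case.
	[ -S "$1" ] || return 1
	command -v socat >/dev/null 2>&1 || return 2
	# -u: one-way transfer from /dev/null (immediate EOF) to the socket,
	# so socat exits right after the connect attempt.
	socat -u OPEN:/dev/null "UNIX-CONNECT:$1" >/dev/null 2>&1 || return 1
	return 0
}
```

A caller would treat return code 2 as "unknown" and fall back to another check rather than reporting the daemon as stopped.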


> --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> @@ -111,8 +111,10 @@
>  
>  conntrackd_monitor() {
>  	rc=$OCF_NOT_RUNNING
> -	# It does not write a PID file, so check with pgrep
> -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> +# It does not write a PID file, so check the socket exists after
> +# extracting its path from the configuration file
> +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)
> +[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
>  	if [ $rc -eq $OCF_SUCCESS ]; then
>  		# conntrackd is running
>  		# now see if it acceppts queries




 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-ha-dev] [patch] conntrackd RA

2011-08-18 Thread Lars Ellenberg
On Thu, Aug 18, 2011 at 12:58:21PM +0200, Lars Ellenberg wrote:
> On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> > Hi,
> > 
> > While playing with the conntrackd agent on Debian Squeeze, I found out the
> > method used in the monitor action is not accurate and can sometimes yield
> > false results, believing conntrackd is running when it is not.
> 
> As in when?
> Could that be resolved?
> 
> > I am instead checking the existence of the control socket and it has
> > so far proved more stable.
> 
> If you kill -9 conntrackd (or conntrackd should crash for some reason),
> it will leave behind that socket.
> 
> So testing on the existence of that socket is in no way more reliable
> than looking through the process table.
> 
> Maybe there is some ping method in the conntrack-tools?
> Or something that could be used as such?

Ah, my bad...
I should have looked not only at the patch,
but at the RA in context.

The check whether it accepts queries is done
immediately after this test.

So yes, your patch is good. Acked-by lars ;-)




Re: [Linux-ha-dev] [patch] conntrackd RA

2011-08-18 Thread Lars Ellenberg
On Thu, Aug 18, 2011 at 01:24:04PM +0200, Lars Ellenberg wrote:
> Ah, my bad...
> I should have looked not only at the patch,
> but at the RA in context.
> 
> The check whether it accepts queries is done
> immediately after this test.
> 
> So yes, your patch is good. Acked-by lars ;-)

Oh, these monologues...

> > > --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> > > @@ -111,8 +111,10 @@
> > >  
> > >  conntrackd_monitor() {
> > >  	rc=$OCF_NOT_RUNNING
> > > -	# It does not write a PID file, so check with pgrep
> > > -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > +# It does not write a PID file, so check the socket exists after
> > > +# extracting its path from the configuration file
> > > +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)

Is space really the only allowed white space there?
I guess the regex has to be changed to use [[:space:]]*

local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)

Maybe rather do it the other way round, see below.

> > > +[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
> > >  	if [ $rc -eq $OCF_SUCCESS ]; then
> > >  		# conntrackd is running
> > >  		# now see if it acceppts queries

(untested)
diff --git a/heartbeat/conntrackd b/heartbeat/conntrackd
--- a/heartbeat/conntrackd
+++ b/heartbeat/conntrackd
@@ -110,16 +110,17 @@ conntrackd_set_master_score() {
 }
 
 conntrackd_monitor() {
-	rc=$OCF_NOT_RUNNING
-	# It does not write a PID file, so check with pgrep
-	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
-	if [ $rc -eq $OCF_SUCCESS ]; then
-		# conntrackd is running
-		# now see if it acceppts queries
-		if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+	# see if it accepts queries
+	if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+		local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+		if test -S $conntrack_socket ; then
 			rc=$OCF_ERR_GENERIC
-			ocf_log err "conntrackd is running but not responding to queries"
+			ocf_log err "conntrackd control socket exists, but not responding to queries"
+		else
+			rc=$OCF_NOT_RUNNING
 		fi
+	else
+		rc=$OCF_SUCCESS
 		if conntrackd_is_master; then
 			rc=$OCF_RUNNING_MASTER
 			# Restore master setting on probes
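
The reordered logic can be read as a three-way decision. The following is a stand-alone sketch of it, not the resource agent itself: the function name monitor_state is hypothetical, the numeric codes are the standard OCF values that a real agent gets from ocf-shellfuncs, and the master handling is left out.

```shell
# Standard OCF return codes (a real agent gets these from ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stand-alone version of the reordered check.
# $1: control socket path; remaining arguments: the status query command.
monitor_state() {
	sock=$1
	shift
	if "$@" >/dev/null 2>&1; then
		# The daemon answered the query, so it is running.
		return $OCF_SUCCESS
	fi
	if [ -S "$sock" ]; then
		# Socket left behind but no answer: crashed or hung daemon.
		return $OCF_ERR_GENERIC
	fi
	# No answer and no socket: treat as cleanly stopped.
	return $OCF_NOT_RUNNING
}
```

With the query from the patch this would be called as `monitor_state "$conntrack_socket" $OCF_RESKEY_binary -C $OCF_RESKEY_config -s`. Running the query first means the socket is only consulted to tell "stopped" apart from "failed".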



Re: [Linux-ha-dev] [patch] conntrackd RA

2011-08-18 Thread Albéric de Pertat
On Thursday 18 August 2011 13:43:20, Lars Ellenberg wrote:

> > So yes, your patch is good. Acked-by lars ;-)

Oh thanks :)

> Oh, these monologues...
> 
> > > > --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> > > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> > > > @@ -111,8 +111,10 @@
> > > >  
> > > >  conntrackd_monitor() {
> > > >  	rc=$OCF_NOT_RUNNING
> > > > -	# It does not write a PID file, so check with pgrep
> > > > -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > > +# It does not write a PID file, so check the socket exists after
> > > > +# extracting its path from the configuration file
> > > > +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)
> 
> Is space really the only allowed white space there?
> I guess the regex has to be changed to use [[:space:]]*

Well, I don't know about that, but I guess we shouldn't take any chances.
Unfortunately, awk doesn't know about [[:space:]]. I would have used [ \t\n] 
instead but I'm not sure blank lines in the middle of statements are allowed 
so, for the sake of clarity, I replaced them with [ \t] only.
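
A minimal sketch of the extraction with [ \t], wrapped in a hypothetical helper named conntrackd_socket_path so it can be tried against a sample file. Two details are assumptions on top of the patch: "Path" is written as a quoted string, because an unquoted Path is an uninitialized awk variable and compares as empty, and the braces are written as [{] and [}] so that no awk implementation reads them as an interval expression.

```shell
# Hypothetical helper: print the Path value from the UNIX { ... } block
# of the conntrackd configuration file given as $1.
conntrackd_socket_path() {
	# [ \t] instead of [[:space:]] for awk implementations without POSIX
	# character classes; [{] and [}] keep the braces literal.
	awk '/^[ \t]*UNIX[ \t]*[{]/,/^[ \t]*[}]/ {
		if ($1 == "Path") { print $2 }
	}' "$1"
}
```

Called as `conntrack_socket=$(conntrackd_socket_path "$OCF_RESKEY_config")`, it prints nothing when the file has no UNIX block, so the caller should check for an empty result before testing the socket.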
-- 
Albéric de Pertat
ADELUX: http://www.adelux.fr
Tel: 01 40 86 45 81
GPG: http://www.adelux.fr/societe/gpg/alberic.asc
--- conntrackd	2011-08-18 12:12:36.807562142 +0200
+++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 14:14:48.0 +0200
@@ -111,8 +111,10 @@
 
 conntrackd_monitor() {
 	rc=$OCF_NOT_RUNNING
-	# It does not write a PID file, so check with pgrep
-	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
+# It does not write a PID file, so check the socket exists after
+# extracting its path from the configuration file
+local conntrack_socket=$(awk '/^[ \t]*UNIX[ \t]*{/,/^[ \t]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
 	if [ $rc -eq $OCF_SUCCESS ]; then
 		# conntrackd is running
 		# now see if it acceppts queries

