Re: [Linux-ha-dev] [patch] conntrackd RA
On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> Hi,
>
> While playing with the conntrackd agent on Debian Squeeze, I found out
> the method used in the monitor action is not accurate and can sometimes
> yield false results, believing conntrackd is running when it is not.

As in when? Could that be resolved?

> I am instead checking the existence of the control socket and it has
> so far proved more stable.

If you kill -9 conntrackd (or conntrackd should crash for some reason),
it will leave behind that socket.

So testing on the existence of that socket is in no way more reliable
than looking through the process table.

Maybe there is some ping method in the conntrack-tools?
Or something that could be used as such?

If not, maybe try a connect to that socket, using netcat/socat?

Just checking if the socket exists will not detect conntrackd crashes.

> --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> @@ -111,8 +111,10 @@
>  
>  conntrackd_monitor() {
>  	rc=$OCF_NOT_RUNNING
> -	# It does not write a PID file, so check with pgrep
> -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> +# It does not write a PID file, so check the socket exists after
> +# extracting its path from the configuration file
> +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)
> +[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
>  	if [ $rc -eq $OCF_SUCCESS ]; then
>  		# conntrackd is running
>  		# now see if it acceppts queries

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
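A minimal sketch of the connect test suggested above, assuming socat is installed and that the control socket is a stream socket. The function name probe_unix_socket and its exit codes are illustrative and not part of the resource agent. Whether conntrackd tolerates a connection that sends no request has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper, not part of the conntrackd resource agent.
# Exit status:
#   0  something accepted a connection on the socket
#   1  the socket file exists but the connect failed (e.g. stale socket)
#   7  no socket file at that path (same value as OCF_NOT_RUNNING)
probe_unix_socket() {
    sock=$1
    [ -S "$sock" ] || return 7
    # Assumption: socat is installed and the daemon listens on a stream socket.
    # -u with OPEN:/dev/null sends nothing; only the connect itself is tested.
    if socat -u OPEN:/dev/null "UNIX-CONNECT:$sock" >/dev/null 2>&1; then
        return 0
    fi
    return 1
}
```

Called as `probe_unix_socket /var/run/conntrackd.ctl` (the path is an assumption; the real one comes from the configuration file), this separates a socket file left behind by a killed daemon from one that still has a listener. If socat is missing, the function also returns 1, so a real agent would have to check for the tool first.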
Re: [Linux-ha-dev] [patch] conntrackd RA
On Thu, Aug 18, 2011 at 12:58:21PM +0200, Lars Ellenberg wrote:
> On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> > Hi,
> >
> > While playing with the conntrackd agent on Debian Squeeze, I found out
> > the method used in the monitor action is not accurate and can sometimes
> > yield false results, believing conntrackd is running when it is not.
>
> As in when? Could that be resolved?
>
> > I am instead checking the existence of the control socket and it has
> > so far proved more stable.
>
> If you kill -9 conntrackd (or conntrackd should crash for some reason),
> it will leave behind that socket.
>
> So testing on the existence of that socket is in no way more reliable
> than looking through the process table.
>
> Maybe there is some ping method in the conntrack-tools?
> Or something that could be used as such?

Ah, my bad... I should have looked not only at the patch, but at the RA
in context. The check whether it accepts queries is done immediately
after this test.

So yes, your patch is good. Acked-by lars ;-)

> > --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> > @@ -111,8 +111,10 @@
> >  
> >  conntrackd_monitor() {
> >  	rc=$OCF_NOT_RUNNING
> > -	# It does not write a PID file, so check with pgrep
> > -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > +# It does not write a PID file, so check the socket exists after
> > +# extracting its path from the configuration file
> > +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)
> > +[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
> >  	if [ $rc -eq $OCF_SUCCESS ]; then
> >  		# conntrackd is running
> >  		# now see if it acceppts queries

-- 
: Lars Ellenberg
Re: [Linux-ha-dev] [patch] conntrackd RA
On Thu, Aug 18, 2011 at 01:24:04PM +0200, Lars Ellenberg wrote:
> On Thu, Aug 18, 2011 at 12:58:21PM +0200, Lars Ellenberg wrote:
> > On Thu, Aug 18, 2011 at 12:28:44PM +0200, Albéric de Pertat wrote:
> > > Hi,
> > >
> > > While playing with the conntrackd agent on Debian Squeeze, I found out
> > > the method used in the monitor action is not accurate and can sometimes
> > > yield false results, believing conntrackd is running when it is not.
> >
> > As in when? Could that be resolved?
> >
> > > I am instead checking the existence of the control socket and it has
> > > so far proved more stable.
> >
> > If you kill -9 conntrackd (or conntrackd should crash for some reason),
> > it will leave behind that socket.
> >
> > So testing on the existence of that socket is in no way more reliable
> > than looking through the process table.
> >
> > Maybe there is some ping method in the conntrack-tools?
> > Or something that could be used as such?
>
> Ah, my bad... I should have looked not only at the patch, but at the RA
> in context. The check whether it accepts queries is done immediately
> after this test.
>
> So yes, your patch is good. Acked-by lars ;-)

Oh, these monologues...

> > > --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> > > @@ -111,8 +111,10 @@
> > >  
> > >  conntrackd_monitor() {
> > >  	rc=$OCF_NOT_RUNNING
> > > -	# It does not write a PID file, so check with pgrep
> > > -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > +# It does not write a PID file, so check the socket exists after
> > > +# extracting its path from the configuration file
> > > +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)

Is space really the only allowed white space there?
I guess the regex has to be changed to use [[:space:]]*

	local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)

Maybe rather do it the other way round, see below.

> > > +[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
> > >  	if [ $rc -eq $OCF_SUCCESS ]; then
> > >  		# conntrackd is running
> > >  		# now see if it acceppts queries

(untested)

diff --git a/heartbeat/conntrackd b/heartbeat/conntrackd
--- a/heartbeat/conntrackd
+++ b/heartbeat/conntrackd
@@ -110,16 +110,17 @@ conntrackd_set_master_score() {
 }
 
 conntrackd_monitor() {
-	rc=$OCF_NOT_RUNNING
-	# It does not write a PID file, so check with pgrep
-	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
-	if [ $rc -eq $OCF_SUCCESS ]; then
-		# conntrackd is running
-		# now see if it acceppts queries
-		if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+	# see if it acceppts queries
+	if ! $OCF_RESKEY_binary -C $OCF_RESKEY_config -s > /dev/null 2>&1; then
+		local conntrack_socket=$(awk '/^[[:space:]]*UNIX[[:space:]]*{/,/^[[:space:]]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+		if test -S $conntrack_socket ; then
 			rc=$OCF_ERR_GENERIC
-			ocf_log err "conntrackd is running but not responding to queries"
+			ocf_log err "conntrackd control socket exists, but not responding to queries"
+		else
+			rc=$OCF_NOT_RUNNING
 		fi
+	else
+		rc=$OCF_SUCCESS
 		if conntrackd_is_master; then
 			rc=$OCF_RUNNING_MASTER
 			# Restore master setting on probes

-- 
: Lars Ellenberg
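The order of checks in the untested patch above (query first, socket test only when the query fails) can be restated as a small decision function. This is a simplified sketch: monitor_rc and its two arguments are hypothetical stand-ins for the real query and the socket path taken from the configuration file, and the master/slave handling is left out. The numeric values are the standard OCF return codes.

```shell
#!/bin/sh
# Standard OCF return codes, defined locally so the sketch is self-contained.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical helper, not the RA's conntrackd_monitor.
#   $1  exit status of the statistics query (0 means the daemon answered)
#   $2  path of the control socket taken from the config file
monitor_rc() {
    query_status=$1
    socket=$2
    if [ "$query_status" -eq 0 ]; then
        # The daemon answered, so it is running.
        echo "$OCF_SUCCESS"
    elif [ -S "$socket" ]; then
        # No answer, but a socket file exists: hung daemon or stale socket.
        echo "$OCF_ERR_GENERIC"
    else
        # No answer and no socket file: treat as cleanly stopped.
        echo "$OCF_NOT_RUNNING"
    fi
}
```

With this ordering the socket test no longer decides whether the daemon is alive; it only distinguishes "not running" from "failed". A socket left behind after kill -9 would therefore be reported as an error rather than as a clean stop.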
Re: [Linux-ha-dev] [patch] conntrackd RA
On Thursday 18 August 2011 13:43:20, Lars Ellenberg wrote:
> > So yes, your patch is good. Acked-by lars ;-)

Oh thanks :)

> Oh, these monologues...
>
> > > > --- conntrackd	2011-08-18 12:12:36.807562142 +0200
> > > > +++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 12:25:20.0 +0200
> > > > @@ -111,8 +111,10 @@
> > > >  
> > > >  conntrackd_monitor() {
> > > >  	rc=$OCF_NOT_RUNNING
> > > > -	# It does not write a PID file, so check with pgrep
> > > > -	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
> > > > +# It does not write a PID file, so check the socket exists after
> > > > +# extracting its path from the configuration file
> > > > +local conntrack_socket=$(awk '/^ *UNIX *{/,/^ *}/ { if ($0 ~ /^ *Path /) { print $2 } }' $OCF_RESKEY_config)
>
> Is space really the only allowed white space there?
> I guess the regex has to be changed to use [[:space:]]*

Well, I don't know about that, but I guess we shouldn't take any chances.
Unfortunately, awk doesn't know about [[:space:]]. I would have used
[ \t\n] instead, but I'm not sure blank lines in the middle of statements
are allowed, so, for the sake of clarity, I replaced them with [ \t] only.

-- 
Albéric de Pertat
ADELUX: http://www.adelux.fr
Tel: 01 40 86 45 81
GPG: http://www.adelux.fr/societe/gpg/alberic.asc

--- conntrackd	2011-08-18 12:12:36.807562142 +0200
+++ /usr/lib/ocf/resource.d/heartbeat/conntrackd	2011-08-18 14:14:48.0 +0200
@@ -111,8 +111,10 @@
 
 conntrackd_monitor() {
 	rc=$OCF_NOT_RUNNING
-	# It does not write a PID file, so check with pgrep
-	pgrep -f $OCF_RESKEY_binary && rc=$OCF_SUCCESS
+# It does not write a PID file, so check the socket exists after
+# extracting its path from the configuration file
+local conntrack_socket=$(awk '/^[ \t]*UNIX[ \t]*{/,/^[ \t]*}/ { if ($1 == "Path") { print $2 } }' $OCF_RESKEY_config)
+[ -S $conntrack_socket ] && rc=$OCF_SUCCESS
 	if [ $rc -eq $OCF_SUCCESS ]; then
 		# conntrackd is running
 		# now see if it acceppts queries
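For illustration, the [ \t] form of the extraction can be tried on a throwaway configuration file. This is a sketch under assumptions: the helper name conntrackd_socket_path and the sample configuration are made up, and writing the braces as [{] and [}] is a defensive tweak that is not in the patch. POSIX awk does define [[:space:]], but some implementations lack it (older mawk releases, for instance), which would explain the behaviour described above.

```shell
#!/bin/sh
# Illustrative helper (the name is not from the RA): print the control
# socket path found in the UNIX { ... } block of a conntrackd config file.
conntrackd_socket_path() {
    # [ \t] instead of [[:space:]] for awk implementations without POSIX
    # classes; [{] and [}] keep the braces literal in every regex dialect.
    awk '/^[ \t]*UNIX[ \t]*[{]/,/^[ \t]*[}]/ { if ($1 == "Path") print $2 }' "$1"
}

# Demonstration on a throwaway file with a made-up, tab-indented config.
conf=$(mktemp)
printf 'General {\n\tUNIX {\n\t\tPath /var/run/conntrackd.ctl\n\t\tBacklog 20\n\t}\n}\n' > "$conf"

sock=$(conntrackd_socket_path "$conf")
echo "$sock"    # prints /var/run/conntrackd.ctl
rm -f "$conf"

# Quote the variable: with an empty $sock, an unquoted [ -S $sock ] collapses
# to [ -S ], which is true because "-S" is a non-empty string.
[ -S "$sock" ] || echo "no socket at $sock"
```

The quoting matters for the monitor action: if the configuration has no Path entry, the unquoted test in the patch would succeed and report the daemon as running.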