Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-28 Thread Jonathan Petersson
Nah, just different design decisions. Either way, as mentioned, given
that your RA covers the whole daemon better, I think it makes sense to
move ahead with it. I'll give it a spin next week.

On Wed, Oct 27, 2010 at 3:35 PM, Dominik Klein  wrote:
>> Maybe I just don't understand it, but isn't that broken?
>
> Jon and I just talked on IRC and my misunderstanding was that he
> intended to _start_ conntrackd via an init script and have the cluster
> only do the cache committing/flushing and monitoring work.
>
> So from a cluster configuration point of view, conntrackd would only run
> on one node at a time, no clone resources at all.
>
> I want to say "my bad" but I'm not too sure I was all wrong here :)
>
> Dominik


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-27 Thread Dominik Klein
> Maybe I just don't understand it, but isn't that broken?

Jon and I just talked on IRC and my misunderstanding was that he
intended to _start_ conntrackd via an init script and have the cluster
only do the cache committing/flushing and monitoring work.

So from a cluster configuration point of view, conntrackd would only run
on one node at a time, no clone resources at all.

I want to say "my bad" but I'm not too sure I was all wrong here :)

Dominik


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-27 Thread Dominik Klein
> can you please consult with Dominik Klein, who (unaware of your effort)
> incidentally also wrote an RA for conntrackd and posted it on this list
> a few days ago? It would be nice if the two of you could consolidate
> your efforts and come up with an updated patch. Maybe you guys could get
> together in #linux-ha on freenode one of these days?

I rather strongly think that the RA posted here is broken.

Michael and Jonathan suggested simply cloning the resource and having it
run on multiple nodes at the same time. This is certainly necessary,
because otherwise connection states would not be synchronized. But with
that RA: if you only commit and flush the external cache during _start_,
how is the remaining node supposed to keep connections running that were
established through the failed machine? Since it is already started, it
will not commit its caches.

Maybe I just don't understand it, but isn't that broken?

The same goes for the failback scenario, which would not work with that RA either:

On 10/15/2010 01:43 PM, Dominik Klein wrote:
> The main challenge in this RA was the failback part. Say one system goes
> down completely. Then it loses the kernel connection tracking table and
> the external cache. Once it comes back, it will receive updates for new
> connections that are initiated through the master, but it will neither
> be sent the complete tracking table of the current master, nor can it
> request this (that's how I understand conntrackd to work, based on my
> testing; please correct me if I'm wrong :)).
>
> This may be acceptable for short-lived connections and configurations
> where there is no preferred master system, but it does become a problem
> if you have either of those.
>
> So my approach is to send a so-called "bulk update" in two situations:
>
> a) in the notify pre promote call, if the local machine is not the
> machine to be promoted
> This part is responsible for sending the update to a preferred master
> that had previously failed (failback).
> b) in the notify post start call, if the local machine is the master
> This part is responsible for sending the update to a previously failed
> machine that re-joins the cluster but is not to be promoted right away.
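
For reference, the kind of cluster configuration this approach assumes
is a master/slave resource with notifications enabled, roughly like this
(a sketch; the resource and agent names are just placeholders):

primitive p_conntrackd ocf:heartbeat:conntrackd \
  op monitor interval="30s"
ms ms_conntrackd p_conntrackd \
  meta notify="true"

The notify="true" part matters here, because the bulk update is
triggered from the notify operations.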

Regards
Dominik


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-19 Thread Florian Haas
Jonathan,

can you please consult with Dominik Klein, who (unaware of your effort)
incidentally also wrote an RA for conntrackd and posted it on this list
a few days ago? It would be nice if the two of you could consolidate
your efforts and come up with an updated patch. Maybe you guys could get
together in #linux-ha on freenode one of these days?

Cheers,
Florian





Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-19 Thread Jonathan Petersson
Hi again,

Just a quick question for a pointer: I've been troubleshooting the
part about having the RA return OCF_NOT_RUNNING when the resource is
stopped. However, what should I check to verify whether it's stopped
or not?

Thanks

On Tue, Oct 12, 2010 at 3:45 PM, Jonathan Petersson
 wrote:
> Unfortunately not yet; I've been busy with some other projects, but
> I'll try to put some time into it later this week.
>
> On Tue, Oct 12, 2010 at 11:58 AM, Florian Haas  
> wrote:
>> Jonathan, is there an updated revision of this RA available? Have you
>> progressed in making it a master/slave capable RA?
>>
>> Cheers,
>> Florian
>>
>>


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-12 Thread Jonathan Petersson
Unfortunately not yet; I've been busy with some other projects, but
I'll try to put some time into it later this week.

On Tue, Oct 12, 2010 at 11:58 AM, Florian Haas  wrote:
> Jonathan, is there an updated revision of this RA available? Have you
> progressed in making it a master/slave capable RA?
>
> Cheers,
> Florian
>
>


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-10-12 Thread Florian Haas
Jonathan, is there an updated revision of this RA available? Have you
progressed in making it a master/slave capable RA?

Cheers,
Florian





Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-30 Thread Florian Haas
Hello Jonathan,

a few more comments below.

>> - conntrackd_start doesn't produce the correct error code if the binary
>> isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}".
> 
> Added to conntrackd_validate

The way you did it, that change will break probes.

You'll probably want to use "ocf_is_probe || check_binary
${OCF_RESKEY_binary}" instead.

>> - I don't understand how the conntrack daemon is actually started.
>> Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start?
> 
> The OCF resource doesn't really handle whether the daemon should start
> or not (see the earlier discussions in this thread); I would probably
> add the daemon as an LSB resource instead.

No, you missed the point there. It's perfectly fine (and indeed,
expected) for the RA to start and stop the daemon process -- it's just
that you should do that in the start and stop actions, not in monitor
like you originally did.
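
A rough sketch of what I mean (this assumes conntrackd's -d option to
daemonize and -k to shut the daemon down; adjust if you prefer to stop
the process some other way):

conntrackd_start() {
    # Nothing to do if the daemon is already running.
    if ! conntrackd_monitor; then
        ocf_run $CONNTRACKD -d || exit $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}

conntrackd_stop() {
    # If the daemon is no longer running, stopping is a no-op.
    if conntrackd_monitor; then
        ocf_run $CONNTRACKD -k || exit $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}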

>> - Please run your RA through ocf-tester.
> 
> deb-fw1:~# ocf-tester -L -n conntrackd
> /usr/lib/ocf/resource.d/heartbeat/conntrackd
> Using lrmd/lrmadmin for all tests
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd...
> * rc=0: Monitoring a stopped resource should return 7
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * rc=0: Monitoring a stopped resource should return 7
> * rc=0: Monitoring a stopped resource should return 7
> * rc=0: Monitoring a stopped resource should return 7
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests
> 
> Not 100% sure codewise what I should change.

First thing would be to fix your monitor action so that when the
resource is stopped, it properly returns $OCF_NOT_RUNNING.
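
Something like this, for instance (a sketch that keeps your pidof-based
check; the parameter name assumes the suggested renaming to "binary"):

conntrackd_monitor() {
    local conntrackd_pid
    conntrackd_pid=`pidof ${OCF_RESKEY_binary}`

    if [ -z "$conntrackd_pid" ]; then
        # No process found: report "cleanly stopped" and let Pacemaker
        # decide what, if anything, to do about it.
        return $OCF_NOT_RUNNING
    fi

    return $OCF_SUCCESS
}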

> Maybe it would make sense to add master/slave to handle the
> state-transition and have start/stop handle whether the daemon is
> alive or not. However I'm not entirely sure how the crm configuration
> statement for this should look given that the daemon should be started
> on both nodes running in either master or slave mode. Do you have an
> example for this?

Rather simple:

primitive p_conntrackd ocf:heartbeat:Conntrackd \
  op monitor interval=30s
ms ms_conntrackd p_conntrackd

Hope this helps,
Cheers,
Florian





Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-30 Thread Jonathan Petersson
Great, I'll give it a try. Thanks for the input.

On Thu, Sep 30, 2010 at 1:09 PM, Cnut Jansen  wrote:
> On 30.09.2010 09:49, Jonathan Petersson wrote:
>> deb-fw1:~# ocf-tester -L -n conntrackd
>> /usr/lib/ocf/resource.d/heartbeat/conntrackd
>> Using lrmd/lrmadmin for all tests
>> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd...
>> * rc=0: Monitoring a stopped resource should return 7
>> * Your agent does not support the notify action (optional)
>> * Your agent does not support the demote action (optional)
>> * Your agent does not support the promote action (optional)
>> * Your agent does not support master/slave (optional)
>> * rc=0: Monitoring a stopped resource should return 7
>> * rc=0: Monitoring a stopped resource should return 7
>> * rc=0: Monitoring a stopped resource should return 7
>> Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests
>>
>> Not 100% sure codewise what I should change.
>>
>> Maybe it would make sense to add master/slave to handle the
>> state-transition and have start/stop handle whether the daemon is
>> alive or not. However I'm not entirely sure how the crm configuration
>> statement for this should look given that the daemon should be started
>> on both nodes running in either master or slave mode. Do you have an
>> example for this?
>
> Hi,
>
> without having looked at your RA and its required parameters etc. at
> all yet:
> I had similar issues with my Tomcat6 and Apache RAs, even though they
> worked without problems in the cluster. When I finally had the time and
> ambition to make them ocf-tester-compliant (by now they are, without any
> special extra handling for ocf-tester), I could track the problems down
> to various timing issues. For one, back then I still used a start-delay
> for the monitor op in the cluster (because the start action did not yet
> check and wait for e.g. Tomcat to be operational, and the stop action
> did not yet wait for all files to be closed). More importantly,
> ocf-tester did not set e.g. $OCF_RESKEY_CRM_meta_timeout, which my RA's
> monitor action still depended on at the time (it no longer does, but
> back then I was still keen on always exiting cleanly with at least an
> OCF_ERR_*, instead of simply letting it time out and having Pacemaker
> handle that situation (-; ).
> Also - I don't know your RA's parameters, but at least for mine it
> helped, due to some outdated defaults - it might be necessary to tell
> ocf-tester to pass in some OCF parameters so that they are available
> as $OCF_RESKEY_*. You do that with ocf-tester's "-o" option, like
>        -o log_level="debug"
> which makes ocf-tester set an environment variable $OCF_RESKEY_log_level
> with the value "debug".
>
> This was my complete command line for calling ocf-tester back then, a
> long time ago (it still includes my switch for the special ocf-tester
> handling, etc.): (-;
> ocf-tester -n testTom -o log_debug="/tmp/debug-tomcat.log" -o
> monitor_timeoutPerTry="5" -o monitor_precheckDB="1" -o log_level="debug"
> -o stopith_enabled="1" -o
> monitor_url="http://localhost:8080/opencms/opencms/test/cluster.html" -o
> agent_timebuffer="4000" -o ocftester="y" /usr/lib/ocf/resource.d/cj/tomcat
>
> Hope that helps you a little with getting ocf-tester to accept your RA too. (-;
>
> Cnut Jansen
>


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-30 Thread Cnut Jansen
On 30.09.2010 09:49, Jonathan Petersson wrote:
> deb-fw1:~# ocf-tester -L -n conntrackd
> /usr/lib/ocf/resource.d/heartbeat/conntrackd
> Using lrmd/lrmadmin for all tests
> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd...
> * rc=0: Monitoring a stopped resource should return 7
> * Your agent does not support the notify action (optional)
> * Your agent does not support the demote action (optional)
> * Your agent does not support the promote action (optional)
> * Your agent does not support master/slave (optional)
> * rc=0: Monitoring a stopped resource should return 7
> * rc=0: Monitoring a stopped resource should return 7
> * rc=0: Monitoring a stopped resource should return 7
> Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests
> 
> Not 100% sure codewise what I should change.
> 
> Maybe it would make sense to add master/slave to handle the
> state-transition and have start/stop handle whether the daemon is
> alive or not. However I'm not entirely sure how the crm configuration
> statement for this should look given that the daemon should be started
> on both nodes running in either master or slave mode. Do you have an
> example for this?

Hi,

without having looked at your RA and its required parameters etc. at
all yet:
I had similar issues with my Tomcat6 and Apache RAs, even though they
worked without problems in the cluster. When I finally had the time and
ambition to make them ocf-tester-compliant (by now they are, without any
special extra handling for ocf-tester), I could track the problems down
to various timing issues. For one, back then I still used a start-delay
for the monitor op in the cluster (because the start action did not yet
check and wait for e.g. Tomcat to be operational, and the stop action
did not yet wait for all files to be closed). More importantly,
ocf-tester did not set e.g. $OCF_RESKEY_CRM_meta_timeout, which my RA's
monitor action still depended on at the time (it no longer does, but
back then I was still keen on always exiting cleanly with at least an
OCF_ERR_*, instead of simply letting it time out and having Pacemaker
handle that situation (-; ).
Also - I don't know your RA's parameters, but at least for mine it
helped, due to some outdated defaults - it might be necessary to tell
ocf-tester to pass in some OCF parameters so that they are available
as $OCF_RESKEY_*. You do that with ocf-tester's "-o" option, like
       -o log_level="debug"
which makes ocf-tester set an environment variable $OCF_RESKEY_log_level
with the value "debug".

This was my complete command line for calling ocf-tester back then, a
long time ago (it still includes my switch for the special ocf-tester
handling, etc.): (-;
ocf-tester -n testTom -o log_debug="/tmp/debug-tomcat.log" -o
monitor_timeoutPerTry="5" -o monitor_precheckDB="1" -o log_level="debug"
-o stopith_enabled="1" -o
monitor_url="http://localhost:8080/opencms/opencms/test/cluster.html" -o
agent_timebuffer="4000" -o ocftester="y" /usr/lib/ocf/resource.d/cj/tomcat

Hope that helps you a little with getting ocf-tester to accept your RA too. (-;

Cnut Jansen



Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-30 Thread Jonathan Petersson
Updated: http://pastebin.com/VC03VXvi

See answers inline

On Thu, Sep 30, 2010 at 9:18 AM, Florian Haas  wrote:
> Hello Jonathan,
>
> A few other comments:
>
> - As stated before, numeric comparisons ought to be numeric comparisons:
>
>> if [ $? =  $OCF_SUCCESS ]; then
>
> Use -eq.
>
> - You still have a bunch like these around:
>
>>     # shorten kernel conntrack timers to remove the zombie entries.
>>     $CONNTRACKD -t
>>     if [ $? -eq 1 ]
>>     then
>>         return $OCF_ERR_GENERIC
>>     fi

Missed that one.

>
> Use ocf_run.
>
> - conntrackd_start doesn't produce the correct error code if the binary
> isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}".

Added to conntrackd_validate

>
> - What's the "asd" doing here?
>
>>     # flush the internal and the external caches
>>     ocf_run $CONNTRACKD -f || exit "asd".$OCF_ERR_GENERIC
>

Typo from debug, fixed.

> - I don't understand how the conntrack daemon is actually started.
> Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start?

The OCF resource doesn't really handle whether the daemon should start
or not (see the earlier discussions in this thread); I would probably
add the daemon as an LSB resource instead.

>
> - You're still advertising migrate_from and migrate_to through RA
> metadata and the usage message, please remove those.

Fixed.

>
> - All your parameter definitions come with "contect" in place of
> "content", please fix that typo.

Fixed

>
> - Please run your RA through ocf-tester.

deb-fw1:~# ocf-tester -L -n conntrackd
/usr/lib/ocf/resource.d/heartbeat/conntrackd
Using lrmd/lrmadmin for all tests
Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd...
* rc=0: Monitoring a stopped resource should return 7
* Your agent does not support the notify action (optional)
* Your agent does not support the demote action (optional)
* Your agent does not support the promote action (optional)
* Your agent does not support master/slave (optional)
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
* rc=0: Monitoring a stopped resource should return 7
Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests

Not 100% sure codewise what I should change.

Maybe it would make sense to add master/slave to handle the
state-transition and have start/stop handle whether the daemon is
alive or not. However I'm not entirely sure how the crm configuration
statement for this should look given that the daemon should be started
on both nodes running in either master or slave mode. Do you have an
example for this?
>
> Cheers,
> Florian
>
>
>


Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-30 Thread Florian Haas
Hello Jonathan,

A few other comments:

- As stated before, numeric comparisons ought to be numeric comparisons:

> if [ $? =  $OCF_SUCCESS ]; then

Use -eq.

- You still have a bunch like these around:

> # shorten kernel conntrack timers to remove the zombie entries.
> $CONNTRACKD -t
> if [ $? -eq 1 ]
> then
> return $OCF_ERR_GENERIC
> fi

Use ocf_run.
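
That is, roughly:

ocf_run $CONNTRACKD -t || exit $OCF_ERR_GENERIC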

- conntrackd_start doesn't produce the correct error code if the binary
isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}".

- What's the "asd" doing here?

> # flush the internal and the external caches
> ocf_run $CONNTRACKD -f || exit "asd".$OCF_ERR_GENERIC

- I don't understand how the conntrack daemon is actually started.
Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start?

- You're still advertising migrate_from and migrate_to through RA
metadata and the usage message, please remove those.

- All your parameter definitions come with "contect" in place of
"content", please fix that typo.

- Please run your RA through ocf-tester.

Cheers,
Florian






Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-29 Thread Jonathan Petersson
I think I've covered most of the input I've gotten so far. Lacking
internal git access, here's another pastebin. Please let me know if
there are any additional changes that should be made.

http://pastebin.com/QMuyTRWB

On Tue, Sep 28, 2010 at 10:28 PM, Jonathan Petersson
 wrote:
> Great feedback, thanks a bunch. I'll get to making the recommended mods
> asap.
>
> On Tue, Sep 28, 2010 at 10:23 PM, Florian Haas  
> wrote:
>> Hi Jonathan,
>>
>> as this discussion applies to a new RA, let's move it to the -dev list.
>> Very helpful contribution, thanks a lot. A few comments below.
>>
>>> # Fill in some defaults if no values are specified
>>> OCF_RESKEY_bin_default="/usr/sbin/conntrackd"
>>> OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf"
>>> OCF_RESKEY_lck_default="/var/lock/conntrack.lock"
>>
>> Please use self-explanatory parameter names wherever possible; makes
>> life a whole lot easier for your users. I'd suggest "binary", "config",
>> and "lockfile" instead of "bin", "cfg", and "lck".
>>
>>> CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}"
>>
>> This assignment should come _after_ you initialize those variables with
>> their defaults.
>>
>>> : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}}
>>> : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}}
>>> : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}}
>>
>>
>>> meta_data() {
>>>       cat <>
>> This is a pet peeve, but would you mind using EOF for the here document
>> marker to make emacs users happy? :)
>>
>>> 
>>> 
>>> 
>>> 1.0
>>
>> Really minor issue here, but the version numbers should agree.
>>
>>>
>>> 
>>> This is a Conntrackd resource to manage primary and secondary state between 
>>> two firewalls in a cluster.
>>> 
>>> Manages primary/backup conntrackd state
>>>
>>> 
>>> 
>>
>> Nope. That one's definitely not unique.
>>
>>> 
>>> Location of conntrackd binary.
>>> 
>>> Conntrackd bin
>>> 
>>
>> You have a variable defined for the default; it's wise to use it here.
>>
>>> 
>>>
>>> 
>>
>> That one isn't unique either, I think.
>>
>>> 
>>> Location of conntrackd configuration file.
>>> 
>>> Conntrackd config
>>> 
>>
>> As above.
>>
>>> 
>>>
>>> 
>>
>> This one _might_ be unique, though I really don't think so.
>>
>>> 
>>> Location of conntrackd lock-file.
>>> 
>>> Conntrackd lock-file
>>> 
>>
>> Again, as above.
>>
>>>     # Call monitor to verify that conntrackd is running
>>>     conntrackd_monitor
>>>
>>>     if [ $? =  $OCF_SUCCESS ]; then
>>
>> Use -eq instead of = for numeric comparisons.
>>
>>>
>>>       # commit the external cache into the kernel table
>>>       $CONNTRACKD -c
>>>       if [ $? -eq 1 ]
>>>       then
>>>           return $OCF_ERR_GENERIC
>>
>> As a general rule, there is hardly ever any need to _return_ error
>> codes, you could just exit on them. And, whenever you error out, do so
>> after logging an error message with ocf_log err. In your case, it's
>> likely that you want to run the conntrackd command, capture its output,
>> and log it on error. All of which is easily achieved with ocf_run, like so:
>>
>> ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC
>>
>> You can probably apply that to most of your invocations of $CONNTRACKD.
>>
>>> conntrackd_monitor() {
>>>
>>>     # Define conntrackd_pid variable
>>>     local conntrackd_pid=`pidof ${OCF_RESKEY_bin}`
>>
>> Bashism. If you declare the resource agent as Bourne shell compatible
>> (#!/bin/sh), this should be:
>>
>> local conntrackd_pid
>> conntrackd_pid=`pidof ${OCF_RESKEY_bin}`
>>
>> ... or you force bash with #!/bin/bash.
>>
>>>     # Check for conntrackd lock-file
>>>     if [ -f $OCF_RESKEY_lck ]; then
>>>
>>>       # Check for conntrackd pid
>>>       if [ $conntrackd_pid ]; then
>>
>> I think this is another bashism, Bourne shell would expect [ -n
>> "$conntrackd_pid" ].
>>
>>>           # Successfull if lock and pid exists
>>>           return $OCF_SUCCESS
>>>       else
>>>           # Error if pid exists but pid isn't running
>>>           return $OCF_ERR_GENERIC
>>>       fi
>>>     else
>>>       # False if lock and pid missing
>>>       $OCF_NOT_RUNNING
>>>
>>>       # Start conntrackd daemon
>>>       $CONNTRACKD -d
>>
>> No. It's not the monitor action's job to restart anything. If monitor
>> exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that
>> resource. And it happens to be good at that job. :)
>>
>>> conntrackd_validate() {
>>>
>>>     # Check if conntrackd binary exists
>>>     check_binary ${OCF_RESKEY_bin}
>>>
>>>     if [ $? != 0 ]; then
>>>       return $OCF_ERR_ARGS
>>>     fi
>>
>> Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which
>> is the correct error code here) if the binary isn't found.
>>
>>>
>>>     # Check if conntrackd config exists
>>>     if [ ! -f ${OCF_RESKEY_cfg} ]; then
>>>       return $OCF_ERR_ARGS
>>
>> This should be $OCF_ERR_INSTALLED.
>>
>>>     fi
>>>
>>>     return $OCF_SUCCESS
>>> }
>>>
>>> case $__OCF_ACTION in
>>> meta-data)    meta_data
>>>               exit $OCF_SUCCESS
>>>   

Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-28 Thread Jonathan Petersson
Great feedback, thanks a bunch. I'll get to making the recommended mods asap.

On Tue, Sep 28, 2010 at 10:23 PM, Florian Haas  wrote:
> Hi Jonathan,
>
> as this discussion applies to a new RA, let's move it to the -dev list.
> Very helpful contribution, thanks a lot. A few comments below.
>
>> # Fill in some defaults if no values are specified
>> OCF_RESKEY_bin_default="/usr/sbin/conntrackd"
>> OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf"
>> OCF_RESKEY_lck_default="/var/lock/conntrack.lock"
>
> Please use self-explanatory parameter names wherever possible; makes
> life a whole lot easier for your users. I'd suggest "binary", "config",
> and "lockfile" instead of "bin", "cfg", and "lck".
>
>> CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}"
>
> This assignment should come _after_ you initialize those variables with
> their defaults.
>
>> : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}}
>> : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}}
>> : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}}
>
>
>> meta_data() {
>>       cat <
> This is a pet peeve, but would you mind using EOF for the here document
> marker to make emacs users happy? :)
>
>> 
>> 
>> 
>> 1.0
>
> Really minor issue here, but the version numbers should agree.
>
>>
>> 
>> This is a Conntrackd resource to manage primary and secondary state between 
>> two firewalls in a cluster.
>> 
>> Manages primary/backup conntrackd state
>>
>> 
>> 
>
> Nope. That one's definitely not unique.
>
>> 
>> Location of conntrackd binary.
>> 
>> Conntrackd bin
>> 
>
> You have a variable defined for the default; it's wise to use it here.
>
>> 
>>
>> 
>
> That one isn't unique either, I think.
>
>> 
>> Location of conntrackd configuration file.
>> 
>> Conntrackd config
>> 
>
> As above.
>
>> 
>>
>> 
>
> This one _might_ be unique, though I really don't think so.
>
>> 
>> Location of conntrackd lock-file.
>> 
>> Conntrackd lock-file
>> 
>
> Again, as above.
>
>>     # Call monitor to verify that conntrackd is running
>>     conntrackd_monitor
>>
>>     if [ $? =  $OCF_SUCCESS ]; then
>
> Use -eq instead of = for numeric comparisons.
>
>>
>>       # commit the external cache into the kernel table
>>       $CONNTRACKD -c
>>       if [ $? -eq 1 ]
>>       then
>>           return $OCF_ERR_GENERIC
>
> As a general rule, there is hardly ever any need to _return_ error
> codes, you could just exit on them. And, whenever you error out, do so
> after logging an error message with ocf_log err. In your case, it's
> likely that you want to run the conntrackd command, capture its output,
> and log it on error. All of which is easily achieved with ocf_run, like so:
>
> ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC
>
> You can probably apply that to most of your invocations of $CONNTRACKD.
>
>> conntrackd_monitor() {
>>
>>     # Define conntrackd_pid variable
>>     local conntrackd_pid=`pidof ${OCF_RESKEY_bin}`
>
> Bashism. If you declare the resource agent as Bourne shell compatible
> (#!/bin/sh), this should be:
>
> local conntrackd_pid
> conntrackd_pid=`pidof ${OCF_RESKEY_bin}`
>
> ... or you force bash with #!/bin/bash.
>
>>     # Check for conntrackd lock-file
>>     if [ -f $OCF_RESKEY_lck ]; then
>>
>>       # Check for conntrackd pid
>>       if [ $conntrackd_pid ]; then
>
> I think this is another bashism, Bourne shell would expect [ -n
> "$conntrackd_pid" ].
>
>>           # Successfull if lock and pid exists
>>           return $OCF_SUCCESS
>>       else
>>           # Error if pid exists but pid isn't running
>>           return $OCF_ERR_GENERIC
>>       fi
>>     else
>>       # False if lock and pid missing
>>       $OCF_NOT_RUNNING
>>
>>       # Start conntrackd daemon
>>       $CONNTRACKD -d
>
> No. It's not the monitor action's job to restart anything. If monitor
> exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that
> resource. And it happens to be good at that job. :)
>
>> conntrackd_validate() {
>>
>>     # Check if conntrackd binary exists
>>     check_binary ${OCF_RESKEY_bin}
>>
>>     if [ $? != 0 ]; then
>>       return $OCF_ERR_ARGS
>>     fi
>
> Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which
> is the correct error code here) if the binary isn't found.
>
>>
>>     # Check if conntrackd config exists
>>     if [ ! -f ${OCF_RESKEY_cfg} ]; then
>>       return $OCF_ERR_ARGS
>
> This should be $OCF_ERR_INSTALLED.
>
>>     fi
>>
>>     return $OCF_SUCCESS
>> }
>>
>> case $__OCF_ACTION in
>> meta-data)    meta_data
>>               exit $OCF_SUCCESS
>>               ;;
>> start)                conntrackd_start;;
>> stop)         conntrackd_stop;;
>> monitor)      conntrackd_monitor;;
>> migrate_to)   ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to 
>> ${OCF_RESKEY_CRM_meta_migrate_to}."
>>               conntrackd_stop
>>               ;;
>> migrate_from) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to 
>> ${OCF_RESKEY_CRM_meta_migrated_from}."
>>               conntrackd_start
>
> Migrate actions are really

Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack

2010-09-28 Thread Florian Haas
Hi Jonathan,

as this discussion applies to a new RA, let's move it to the -dev list.
Very helpful contribution, thanks a lot. A few comments below.

> # Fill in some defaults if no values are specified
> OCF_RESKEY_bin_default="/usr/sbin/conntrackd"
> OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf"
> OCF_RESKEY_lck_default="/var/lock/conntrack.lock"

Please use self-explanatory parameter names wherever possible; makes
life a whole lot easier for your users. I'd suggest "binary", "config",
and "lockfile" instead of "bin", "cfg", and "lck".

> CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}"

This assignment should come _after_ you initialize those variables with
their defaults.

> : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}}
> : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}}
> : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}}


> meta_data() {
>   cat < 
> 
> 
> 1.0

Really minor issue here, but the version numbers should agree.

> 
> 
> This is a Conntrackd resource to manage primary and secondary state between 
> two firewalls in a cluster.
> 
> Manages primary/backup conntrackd state
> 
> 
> 

Nope. That one's definitely not unique.

> 
> Location of conntrackd binary.
> 
> Conntrackd bin
> 

You have a variable defined for the default; it's wise to use it here.

> 
> 
> 

That one isn't unique either, I think.

> 
> Location of conntrackd configuration file.
> 
> Conntrackd config
> 

As above.

> 
> 
> 

This one _might_ be unique, though I really don't think so.

> 
> Location of conntrackd lock-file.
> 
> Conntrackd lock-file
> 

Again, as above.

> # Call monitor to verify that conntrackd is running
> conntrackd_monitor
> 
> if [ $? =  $OCF_SUCCESS ]; then

Use -eq instead of = for numeric comparisons.

>   
>   # commit the external cache into the kernel table
>   $CONNTRACKD -c
>   if [ $? -eq 1 ]
>   then
>   return $OCF_ERR_GENERIC

As a general rule, there is hardly ever any need to _return_ error
codes, you could just exit on them. And, whenever you error out, do so
after logging an error message with ocf_log err. In your case, it's
likely that you want to run the conntrackd command, capture its output,
and log it on error. All of which is easily achieved with ocf_run, like so:

ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC

You can probably apply that to most of your invocations of $CONNTRACKD.

> conntrackd_monitor() {
> 
> # Define conntrackd_pid variable
> local conntrackd_pid=`pidof ${OCF_RESKEY_bin}`

Bashism. If you declare the resource agent as Bourne shell compatible
(#!/bin/sh), this should be:

local conntrackd_pid
conntrackd_pid=`pidof ${OCF_RESKEY_bin}`

... or you force bash with #!/bin/bash.

> # Check for conntrackd lock-file
> if [ -f $OCF_RESKEY_lck ]; then
>   
>   # Check for conntrackd pid
>   if [ $conntrackd_pid ]; then

I think this is another bashism, Bourne shell would expect [ -n
"$conntrackd_pid" ].

>   # Successfull if lock and pid exists
>   return $OCF_SUCCESS
>   else
>   # Error if pid exists but pid isn't running
>   return $OCF_ERR_GENERIC
>   fi
> else
>   # False if lock and pid missing
>   $OCF_NOT_RUNNING
>   
>   # Start conntrackd daemon
>   $CONNTRACKD -d

No. It's not the monitor action's job to restart anything. If monitor
exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that
resource. And it happens to be good at that job. :)

> conntrackd_validate() {
> 
> # Check if conntrackd binary exists
> check_binary ${OCF_RESKEY_bin}
> 
> if [ $? != 0 ]; then
>   return $OCF_ERR_ARGS
> fi

Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which
is the correct error code here) if the binary isn't found.

> 
> # Check if conntrackd config exists
> if [ ! -f ${OCF_RESKEY_cfg} ]; then
>   return $OCF_ERR_ARGS

This should be $OCF_ERR_INSTALLED.

> fi
> 
> return $OCF_SUCCESS
> }
> 
> case $__OCF_ACTION in
> meta-data)meta_data
>   exit $OCF_SUCCESS
>   ;;
> start)conntrackd_start;;
> stop) conntrackd_stop;;
> monitor)  conntrackd_monitor;;
> migrate_to)   ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to 
> ${OCF_RESKEY_CRM_meta_migrate_to}."
>   conntrackd_stop
>   ;;
> migrate_from) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to 
> ${OCF_RESKEY_CRM_meta_migrated_from}."
>   conntrackd_start

Migrate actions are really not needed here.

>   ;;
> reload)   ocf_log err "Reloading..."
>   conntrackd_start
>   ;;
> validate-all) conntrackd_validate;;

You might want to invoke conntrackd_validate before start, or even
before any action except metadata, help, and usage.
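
For instance, roughly like this (a sketch using your function names;
$OCF_ERR_UNIMPLEMENTED is the standard return code for unknown actions):

case $__OCF_ACTION in
meta-data)      meta_data
                exit $OCF_SUCCESS
                ;;
usage|help)     conntrackd_usage
                exit $OCF_SUCCESS
                ;;
esac

# Everything below this point needs a sane environment.
conntrackd_validate || exit $?

case $__OCF_ACTION in
start)          conntrackd_start;;
stop)           conntrackd_stop;;
monitor)        conntrackd_monitor;;
validate-all)   exit $OCF_SUCCESS;;
*)              conntrackd_usage
                exit $OCF_ERR_UNIMPLEMENTED;;
esac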

> usage|help)   conntrackd_usage
>   exit $OCF_SUCCESS
>   ;;
> *)conntrackd_usage
>