Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Nah just different decisions of design, either way as mentioned given that your RA covers the whole daemon better I think it makes sense to move ahead on it, I'll give it a spin next week. On Wed, Oct 27, 2010 at 3:35 PM, Dominik Klein wrote: >> Maybe I just don't understand it, but isn't that broken? > > Jon and I just talked on IRC and my misunderstanding was that he > intended to _start_ conntrackd via an init script and have the cluster > only do the cache commiting/flushing and monitoring work. > > So from a cluster configuration point of view, conntrackd would only run > on one node at a time, no clone resources at all. > > I want to say "my bad" but I'm not too sure I was all wrong here :) > > Dominik > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
> Maybe I just don't understand it, but isn't that broken? Jon and I just talked on IRC and my misunderstanding was that he intended to _start_ conntrackd via an init script and have the cluster only do the cache commiting/flushing and monitoring work. So from a cluster configuration point of view, conntrackd would only run on one node at a time, no clone resources at all. I want to say "my bad" but I'm not too sure I was all wrong here :) Dominik ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
> can you please consult with Dominik Klein, who (unaware of your effort) > incidentally also wrote an RA for conntrackd and posted it on this list > a few days ago? It would be nice if the two of you could consolidate > your efforts and come up with an updated patch. Maybe you guys could get > together in #linux-ha on freenode one of these days? I rather strongly think that the RA posted here is broken. Michael and Jonathan suggested to just clone the resource and have it run on multiple nodes at the same time. This is certainly necessary because otherwise connection states would not be synchronized. But with that RA: If you just commit and flush external caches during _start_, how is the remaining node supposed to keep connections running that were established through the failed machine? Since it is started already, it will not commit its caches. Maybe I just don't understand it, but isn't that broken? Also the failback story, which would not work with that RA: Am 10/15/2010 01:43 PM, schrieb Dominik Klein: > The main challenge in this RA was the failback part. Say one system goes > down completely. Then it loses the kernel connection tracking table and > the external cache. Once it comes back, it will receive updates for new > connections that are initiated through the master, but it will neither > be sent the complete tracking table of the current master, nor can it > request this (that's how I understand and tested conntrackd works, > please correct me if I'm wrong :)). > > This may be acceptable for short-lived connections and configurations > where there is no preferred master system, but it does become a problem > if you have either of those. > > So my approach is to send a so called "bulk update" in two situations: > > a) in the notify pre promote call, if the local machine is not the > machine to be promoted > This part is responsible for sending the update to a preferred master > that had previously failed (failback). > b) in the notify post start call, if the local machine is the master > This part is responsible for sending the update to a previously failed > machine that re-joins the cluster but is not to be promoted right away. Regards Dominik ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Jonathan, can you please consult with Dominik Klein, who (unaware of your effort) incidentally also wrote an RA for conntrackd and posted it on this list a few days ago? It would be nice if the two of you could consolidate your efforts and come up with an updated patch. Maybe you guys could get together in #linux-ha on freenode one of these days? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Hi again, Just a quick question for a pointer, I've been troubleshooting the part about having the RA return OCF_NOT_RUNNING when the resource is stopped, however what parameter should I verify to check whether it's stopped or not? Thanks On Tue, Oct 12, 2010 at 3:45 PM, Jonathan Petersson wrote: > Unfortunately not yet, I've been busy with some other projects but > I'll try to put some time into it later this week. > > On Tue, Oct 12, 2010 at 11:58 AM, Florian Haas > wrote: >> Jonathan, is there an updated revision of this RA available? Have you >> progressed in making it a master/slave capable RA? >> >> Cheers, >> Florian >> >> >> ___ >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev >> Home Page: http://linux-ha.org/ >> >> > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Unfortunately not yet, I've been busy with some other projects but I'll try to put some time into it later this week. On Tue, Oct 12, 2010 at 11:58 AM, Florian Haas wrote: > Jonathan, is there an updated revision of this RA available? Have you > progressed in making it a master/slave capable RA? > > Cheers, > Florian > > > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ > > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Jonathan, is there an updated revision of this RA available? Have you progressed in making it a master/slave capable RA? Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Hello Jonathan, a few more comments below. >> - conntrackd_start doesn't produce the correct error code if the binary >> isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}". > > Added to conntrackd_validate The way you did it, that change will break probes. You'll probably want to use "ocf_is_probe || check_binary ${OCF_RESKEY_binary}" instead. >> - I don't understand how the conntrack daemon is actually started. >> Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start? > > The OCF resource doesn't really handle whether the daemon should start > or not (see earlier discussions in this on the thread), I would > probably add the daemon as a LSB resource rather. No, you missed the point there. It's perfectly fine (and indeed, expected) of the RA to start and stop the daemon process -- it's just that you should do that in the start and stop options, not in monitor like you originally did. >> - Please run your RA through ocf-tester. > > deb-fw1:~# ocf-tester -L -n conntrackd > /usr/lib/ocf/resource.d/heartbeat/conntrackd > Using lrmd/lrmadmin for all tests > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd... > * rc=0: Monitoring a stopped resource should return 7 > * Your agent does not support the notify action (optional) > * Your agent does not support the demote action (optional) > * Your agent does not support the promote action (optional) > * Your agent does not support master/slave (optional) > * rc=0: Monitoring a stopped resource should return 7 > * rc=0: Monitoring a stopped resource should return 7 > * rc=0: Monitoring a stopped resource should return 7 > Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests > > Not 100% sure codewise what I should change. First thing would be to fix your monitor action so that when the resource is stopped, it properly returns $OCF_NOT_RUNNING. > Maybe it would make sense to add master/slave to handle the > state-transition and have start/stop handle whether the daemon is > alive or not. However I'm not entirely sure how the crm configuration > statement for this should look given that the daemon should be started > on both nodes running in either master or slave mode, do you have an > example for this? Rather simple: primitive p_conntrackd ocf:heartbeat:Conntrackd \ op monitor interval=30s ms ms_conntrackd p_conntrackd Hope this helps, Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Great, I'll give it a try thanks for the input On Thu, Sep 30, 2010 at 1:09 PM, Cnut Jansen wrote: > Am 30.09.2010 09:49, schrieb Jonathan Petersson: >> deb-fw1:~# ocf-tester -L -n conntrackd >> /usr/lib/ocf/resource.d/heartbeat/conntrackd >> Using lrmd/lrmadmin for all tests >> Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd... >> * rc=0: Monitoring a stopped resource should return 7 >> * Your agent does not support the notify action (optional) >> * Your agent does not support the demote action (optional) >> * Your agent does not support the promote action (optional) >> * Your agent does not support master/slave (optional) >> * rc=0: Monitoring a stopped resource should return 7 >> * rc=0: Monitoring a stopped resource should return 7 >> * rc=0: Monitoring a stopped resource should return 7 >> Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests >> >> Not 100% sure codewise what I should change. >> >> Maybe it would make sense to add master/slave to handle the >> state-transition and have start/stop handle whether the daemon is >> alive or not. However I'm not entirely sure how the crm configuration >> statement for this should look given that the daemon should be started >> on both nodes running in either master or slave mode, do you have an >> example for this? > > Hi, > > without having had a look onto your RA and its necessary parameters etc. > at all yet: > I had similiar issues with my Tomcat6- and Apache-RAs, though they > worked without problems in the cluster. When I had the time and ambition > to finally also make them ocf-tester-compliant (meanwhile they are > anyway, without special additional handlers for the ocf-tester), I could > track problems down to various timing issues. Not only that back at that > time I still used start-delay for the monitor-op in the cluster (because > back then I didn't check and wait for i.e. Tomcat being operational in > start-action respectively not anymore and all files being closed in > stop-action yet), but also exspecially that ocf-tester didn't set i.e. a > $OCF_RESKEY_CRM_meta_timeout, on which my RA back then still depended on > for its monitor-action (now it's not anymore, but back then I was still > eager about allways cleanly exiting with at least an OCF_ERR_* too, > instead of just simply letting it time out and having Pacemaker handle > that situation (-; ). > Also - since I don't know about your RA's parameters, but at least for > mine it helped, due to some not-up-to-date-anymore-defaults - it might > be necessary to tell ocf-tester to pass over some OCF-parameters, to > have them available as $OCF_RESKEY_*. You do that with the "-o"-option > to ocf-tester, like > -o log_level="debug" > to have ocf-tester set an env.-var. $OCF_RESKEY_log_level with the value > "debug". > > This was my complete command for calling ocf-tester back then, long time > ago (i.e. still includes my switch for special ocf-tester-handler, > etc.): (-; > ocf-tester -n testTom -o log_debug="/tmp/debug-tomcat.log" -o > monitor_timeoutPerTry="5" -o monitor_precheckDB="1" -o log_level="debug" > -o stopith_enabled="1" -o > monitor_url="http://localhost:8080/opencms/opencms/test/cluster.html"; -o > agent_timebuffer="4000" -o ocftester="y" /usr/lib/ocf/resource.d/cj/tomcat > > Hope that helps you a little on making ocf-tester accept your RA too. (-; > > Cnut Jansen > > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Am 30.09.2010 09:49, schrieb Jonathan Petersson: > deb-fw1:~# ocf-tester -L -n conntrackd > /usr/lib/ocf/resource.d/heartbeat/conntrackd > Using lrmd/lrmadmin for all tests > Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd... > * rc=0: Monitoring a stopped resource should return 7 > * Your agent does not support the notify action (optional) > * Your agent does not support the demote action (optional) > * Your agent does not support the promote action (optional) > * Your agent does not support master/slave (optional) > * rc=0: Monitoring a stopped resource should return 7 > * rc=0: Monitoring a stopped resource should return 7 > * rc=0: Monitoring a stopped resource should return 7 > Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests > > Not 100% sure codewise what I should change. > > Maybe it would make sense to add master/slave to handle the > state-transition and have start/stop handle whether the daemon is > alive or not. However I'm not entirely sure how the crm configuration > statement for this should look given that the daemon should be started > on both nodes running in either master or slave mode, do you have an > example for this? Hi, without having had a look onto your RA and its necessary parameters etc. at all yet: I had similiar issues with my Tomcat6- and Apache-RAs, though they worked without problems in the cluster. When I had the time and ambition to finally also make them ocf-tester-compliant (meanwhile they are anyway, without special additional handlers for the ocf-tester), I could track problems down to various timing issues. Not only that back at that time I still used start-delay for the monitor-op in the cluster (because back then I didn't check and wait for i.e. Tomcat being operational in start-action respectively not anymore and all files being closed in stop-action yet), but also exspecially that ocf-tester didn't set i.e. a $OCF_RESKEY_CRM_meta_timeout, on which my RA back then still depended on for its monitor-action (now it's not anymore, but back then I was still eager about allways cleanly exiting with at least an OCF_ERR_* too, instead of just simply letting it time out and having Pacemaker handle that situation (-; ). Also - since I don't know about your RA's parameters, but at least for mine it helped, due to some not-up-to-date-anymore-defaults - it might be necessary to tell ocf-tester to pass over some OCF-parameters, to have them available as $OCF_RESKEY_*. You do that with the "-o"-option to ocf-tester, like -o log_level="debug" to have ocf-tester set an env.-var. $OCF_RESKEY_log_level with the value "debug". This was my complete command for calling ocf-tester back then, long time ago (i.e. still includes my switch for special ocf-tester-handler, etc.): (-; ocf-tester -n testTom -o log_debug="/tmp/debug-tomcat.log" -o monitor_timeoutPerTry="5" -o monitor_precheckDB="1" -o log_level="debug" -o stopith_enabled="1" -o monitor_url="http://localhost:8080/opencms/opencms/test/cluster.html"; -o agent_timebuffer="4000" -o ocftester="y" /usr/lib/ocf/resource.d/cj/tomcat Hope that helps you a little on making ocf-tester accept your RA too. (-; Cnut Jansen ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Updated: http://pastebin.com/VC03VXvi See answers inline On Thu, Sep 30, 2010 at 9:18 AM, Florian Haas wrote: > Hello Jonathan, > > A few other comments: > > - As stated before, numeric comparisons ought to be numeric comparisons: > >> if [ $? = $OCF_SUCCESS ]; then > > Use -eq. > > - You still have a bunch like these around: > >> # shorten kernel conntrack timers to remove the zombie entries. >> $CONNTRACKD -t >> if [ $? -eq 1 ] >> then >> return $OCF_ERR_GENERIC >> fi Missed that one. > > Use ocf_run. > > - conntrackd_start doesn't produce the correct error code if the binary > isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}". Added to conntrackd_validate > > - What's the "asd" doing here? > >> # flush the internal and the external caches >> ocf_run $CONNTRACKD -f || exit "asd".$OCF_ERR_GENERIC > Typo from debug, fixed. > - I don't understand how the conntrack daemon is actually started. > Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start? The OCF resource doesn't really handle whether the daemon should start or not (see earlier discussions in this on the thread), I would probably add the daemon as a LSB resource rather. > > - You're still advertising migrate_from and migrate_to through RA > metadata and the usage message, please remove those. Fixed. > > - All your parameter definitions come with "contect" in place of > "content", please fix that typo. Fixed > > - Please run your RA through ocf-tester. deb-fw1:~# ocf-tester -L -n conntrackd /usr/lib/ocf/resource.d/heartbeat/conntrackd Using lrmd/lrmadmin for all tests Beginning tests for /usr/lib/ocf/resource.d/heartbeat/conntrackd... * rc=0: Monitoring a stopped resource should return 7 * Your agent does not support the notify action (optional) * Your agent does not support the demote action (optional) * Your agent does not support the promote action (optional) * Your agent does not support master/slave (optional) * rc=0: Monitoring a stopped resource should return 7 * rc=0: Monitoring a stopped resource should return 7 * rc=0: Monitoring a stopped resource should return 7 Tests failed: /usr/lib/ocf/resource.d/heartbeat/conntrackd failed 4 tests Not 100% sure codewise what I should change. Maybe it would make sense to add master/slave to handle the state-transition and have start/stop handle whether the daemon is alive or not. However I'm not entirely sure how the crm configuration statement for this should look given that the daemon should be started on both nodes running in either master or slave mode, do you have an example for this? > > Cheers, > Florian > > > > ___ > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev > Home Page: http://linux-ha.org/ > > ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Hello Jonathan, A few other comments: - As stated before, numeric comparisons ought to be numeric comparisons: > if [ $? = $OCF_SUCCESS ]; then Use -eq. - You still have a bunch like these around: > # shorten kernel conntrack timers to remove the zombie entries. > $CONNTRACKD -t > if [ $? -eq 1 ] > then > return $OCF_ERR_GENERIC > fi Use ocf_run. - conntrackd_start doesn't produce the correct error code if the binary isn't installed. Easily fixed with "check_binary ${OCF_RESKEY_binary}". - What's the "asd" doing here? > # flush the internal and the external caches > ocf_run $CONNTRACKD -f || exit "asd".$OCF_ERR_GENERIC - I don't understand how the conntrack daemon is actually started. Shouldn't there be some invocation of $CONNTRACKD -d in conntrackd_start? - You're still advertising migrate_from and migrate_to through RA metadata and the usage message, please remove those. - All your parameter definitions come with "contect" in place of "content", please fix that typo. - Please run your RA through ocf-tester. Cheers, Florian signature.asc Description: OpenPGP digital signature ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
I think I've covered most of the input I've gotten so far, in lack of git access internally here's another pastebin. Please let me know if there's any additional changes that should be made. http://pastebin.com/QMuyTRWB On Tue, Sep 28, 2010 at 10:28 PM, Jonathan Petersson wrote: > Great feedback, thanks a bunch, I'll get to it making the mods recommended > asap. > > On Tue, Sep 28, 2010 at 10:23 PM, Florian Haas > wrote: >> Hi Jonathan, >> >> as this discussion applies to a new RA, let's move it to the -dev list. >> Very helpful contribution, thanks a lot. A few comments below. >> >>> # Fill in some defaults if no values are specified >>> OCF_RESKEY_bin_default="/usr/sbin/conntrackd" >>> OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf" >>> OCF_RESKEY_lck_default="/var/lock/conntrack.lock" >> >> Please use self-explanatory parameter names wherever possible; makes >> life a whole lot easier for your users. I'd suggest "binary", "config", >> and "lockfile" instead of "bin", "cfg", and "lck". >> >>> CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}" >> >> This assignment should come _after_ you initialize those variables with >> their defaults. >> >>> : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}} >>> : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}} >>> : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}} >> >> >>> meta_data() { >>> cat <> >> This is a pet peeve, but would you mind using EOF for the here document >> marker to make emacs users happy? :) >> >>> >>> >>> >>> 1.0 >> >> Really minor issue here, but the version numbers should agree. >> >>> >>> >>> This is a Conntrackd resource to manage primary and secondary state between >>> two firewalls in a cluster. >>> >>> Manages primary/backup conntrackd state >>> >>> >>> >> >> Nope. That one's definitely not unique. >> >>> >>> Location of conntrackd binary. >>> >>> Conntrackd bin >>> >> >> You have a variable defined for the default; it's wise to use it here. >> >>> >>> >>> >> >> That one isn't unique either, I think. >> >>> >>> Location of conntrackd configuration file. >>> >>> Conntrackd config >>> >> >> As above. >> >>> >>> >>> >> >> This one _might_ be unique, though I really don't think so. >> >>> >>> Location of conntrackd lock-file. >>> >>> Conntrackd lock-file >>> >> >> Again, as above. >> >>> # Call monitor to verify that conntrackd is running >>> conntrackd_monitor >>> >>> if [ $? = $OCF_SUCCESS ]; then >> >> Use -eq instead of = for numeric comparisons. >> >>> >>> # commit the external cache into the kernel table >>> $CONNTRACKD -c >>> if [ $? -eq 1 ] >>> then >>> return $OCF_ERR_GENERIC >> >> As a general rule, there is hardly ever any need to _return_ error >> codes, you could just exit on them. And, whenever you error out, do so >> after logging an error message with ocf_log err. In your case, it's >> likely that you want to run the conntrackd command, capture its output, >> and log it on error. All of which is easily achieved with ocf_run, like so: >> >> ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC >> >> You can probably apply that to most of your invocations of $CONNTRACKD. >> >>> conntrackd_monitor() { >>> >>> # Define conntrackd_pid variable >>> local conntrackd_pid=`pidof ${OCF_RESKEY_bin}` >> >> Bashism. If you decĺare the resource agent as bourne shell compatible >> (#!/bin/sh), this should be: >> >> local conntrackd_pid >> conntrackd_pid=`pidof ${OCF_RESKEY_bin}` >> >> ... or you force bash with #!/bin/bash. >> >>> # Check for conntrackd lock-file >>> if [ -f $OCF_RESKEY_lck ]; then >>> >>> # Check for conntrackd pid >>> if [ $conntrackd_pid ]; then >> >> I think this is another bashism, Bourne shell would expect [ -n >> "$conntrackd_pid" ]. >> >>> # Successfull if lock and pid exists >>> return $OCF_SUCCESS >>> else >>> # Error if pid exists but pid isn't running >>> return $OCF_ERR_GENERIC >>> fi >>> else >>> # False if lock and pid missing >>> $OCF_NOT_RUNNING >>> >>> # Start conntrackd daemon >>> $CONNTRACKD -d >> >> No. It's not the monitor action's job to restart anything. If monitor >> exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that >> resource. And it happens to be good at that job. :) >> >>> conntrackd_validate() { >>> >>> # Check if conntrackd binary exists >>> check_binary ${OCF_RESKEY_bin} >>> >>> if [ $? != 0 ]; then >>> return $OCF_ERR_ARGS >>> fi >> >> Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which >> is the correct error code here) if the binary isn't found. >> >>> >>> # Check if conntrackd config exists >>> if [ ! -f ${OCF_RESKEY_cfg} ]; then >>> return $OCF_ERR_ARGS >> >> This should be $OCF_ERR_INSTALLED. >> >>> fi >>> >>> return $OCF_SUCCESS >>> } >>> >>> case $__OCF_ACTION in >>> meta-data) meta_data >>> exit $OCF_SUCCESS >>>
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Great feedback, thanks a bunch, I'll get to it making the mods recommended asap. On Tue, Sep 28, 2010 at 10:23 PM, Florian Haas wrote: > Hi Jonathan, > > as this discussion applies to a new RA, let's move it to the -dev list. > Very helpful contribution, thanks a lot. A few comments below. > >> # Fill in some defaults if no values are specified >> OCF_RESKEY_bin_default="/usr/sbin/conntrackd" >> OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf" >> OCF_RESKEY_lck_default="/var/lock/conntrack.lock" > > Please use self-explanatory parameter names wherever possible; makes > life a whole lot easier for your users. I'd suggest "binary", "config", > and "lockfile" instead of "bin", "cfg", and "lck". > >> CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}" > > This assignment should come _after_ you initialize those variables with > their defaults. > >> : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}} >> : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}} >> : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}} > > >> meta_data() { >> cat < > This is a pet peeve, but would you mind using EOF for the here document > marker to make emacs users happy? :) > >> >> >> >> 1.0 > > Really minor issue here, but the version numbers should agree. > >> >> >> This is a Conntrackd resource to manage primary and secondary state between >> two firewalls in a cluster. >> >> Manages primary/backup conntrackd state >> >> >> > > Nope. That one's definitely not unique. > >> >> Location of conntrackd binary. >> >> Conntrackd bin >> > > You have a variable defined for the default; it's wise to use it here. > >> >> >> > > That one isn't unique either, I think. > >> >> Location of conntrackd configuration file. >> >> Conntrackd config >> > > As above. > >> >> >> > > This one _might_ be unique, though I really don't think so. > >> >> Location of conntrackd lock-file. >> >> Conntrackd lock-file >> > > Again, as above. > >> # Call monitor to verify that conntrackd is running >> conntrackd_monitor >> >> if [ $? = $OCF_SUCCESS ]; then > > Use -eq instead of = for numeric comparisons. > >> >> # commit the external cache into the kernel table >> $CONNTRACKD -c >> if [ $? -eq 1 ] >> then >> return $OCF_ERR_GENERIC > > As a general rule, there is hardly ever any need to _return_ error > codes, you could just exit on them. And, whenever you error out, do so > after logging an error message with ocf_log err. In your case, it's > likely that you want to run the conntrackd command, capture its output, > and log it on error. All of which is easily achieved with ocf_run, like so: > > ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC > > You can probably apply that to most of your invocations of $CONNTRACKD. > >> conntrackd_monitor() { >> >> # Define conntrackd_pid variable >> local conntrackd_pid=`pidof ${OCF_RESKEY_bin}` > > Bashism. If you decĺare the resource agent as bourne shell compatible > (#!/bin/sh), this should be: > > local conntrackd_pid > conntrackd_pid=`pidof ${OCF_RESKEY_bin}` > > ... or you force bash with #!/bin/bash. > >> # Check for conntrackd lock-file >> if [ -f $OCF_RESKEY_lck ]; then >> >> # Check for conntrackd pid >> if [ $conntrackd_pid ]; then > > I think this is another bashism, Bourne shell would expect [ -n > "$conntrackd_pid" ]. > >> # Successfull if lock and pid exists >> return $OCF_SUCCESS >> else >> # Error if pid exists but pid isn't running >> return $OCF_ERR_GENERIC >> fi >> else >> # False if lock and pid missing >> $OCF_NOT_RUNNING >> >> # Start conntrackd daemon >> $CONNTRACKD -d > > No. It's not the monitor action's job to restart anything. If monitor > exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that > resource. And it happens to be good at that job. :) > >> conntrackd_validate() { >> >> # Check if conntrackd binary exists >> check_binary ${OCF_RESKEY_bin} >> >> if [ $? != 0 ]; then >> return $OCF_ERR_ARGS >> fi > > Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which > is the correct error code here) if the binary isn't found. > >> >> # Check if conntrackd config exists >> if [ ! -f ${OCF_RESKEY_cfg} ]; then >> return $OCF_ERR_ARGS > > This should be $OCF_ERR_INSTALLED. > >> fi >> >> return $OCF_SUCCESS >> } >> >> case $__OCF_ACTION in >> meta-data) meta_data >> exit $OCF_SUCCESS >> ;; >> start) conntrackd_start;; >> stop) conntrackd_stop;; >> monitor) conntrackd_monitor;; >> migrate_to) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to >> ${OCF_RESKEY_CRM_meta_migrate_to}." >> conntrackd_stop >> ;; >> migrate_from) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to >> ${OCF_RESKEY_CRM_meta_migrated_from}." >> conntrackd_start > > Migrate actions are really
Re: [Linux-ha-dev] [Linux-HA] OCF resource for conntrack
Hi Jonathan, as this discussion applies to a new RA, let's move it to the -dev list. Very helpful contribution, thanks a lot. A few comments below. > # Fill in some defaults if no values are specified > OCF_RESKEY_bin_default="/usr/sbin/conntrackd" > OCF_RESKEY_cfg_default="/etc/conntrackd/conntrackd.conf" > OCF_RESKEY_lck_default="/var/lock/conntrack.lock" Please use self-explanatory parameter names wherever possible; makes life a whole lot easier for your users. I'd suggest "binary", "config", and "lockfile" instead of "bin", "cfg", and "lck". > CONNTRACKD="${OCF_RESKEY_bin} -C ${OCF_RESKEY_cfg}" This assignment should come _after_ you initialize those variables with their defaults. > : ${OCF_RESKEY_bin=${OCF_RESKEY_bin_default}} > : ${OCF_RESKEY_cfg=${OCF_RESKEY_cfg_default}} > : ${OCF_RESKEY_lck=${OCF_RESKEY_lck_default}} > meta_data() { > cat < > > > 1.0 Really minor issue here, but the version numbers should agree. > > > This is a Conntrackd resource to manage primary and secondary state between > two firewalls in a cluster. > > Manages primary/backup conntrackd state > > > Nope. That one's definitely not unique. > > Location of conntrackd binary. > > Conntrackd bin > You have a variable defined for the default; it's wise to use it here. > > > That one isn't unique either, I think. > > Location of conntrackd configuration file. > > Conntrackd config > As above. > > > This one _might_ be unique, though I really don't think so. > > Location of conntrackd lock-file. > > Conntrackd lock-file > Again, as above. > # Call monitor to verify that conntrackd is running > conntrackd_monitor > > if [ $? = $OCF_SUCCESS ]; then Use -eq instead of = for numeric comparisons. > > # commit the external cache into the kernel table > $CONNTRACKD -c > if [ $? -eq 1 ] > then > return $OCF_ERR_GENERIC As a general rule, there is hardly ever any need to _return_ error codes, you could just exit on them. And, whenever you error out, do so after logging an error message with ocf_log err. In your case, it's likely that you want to run the conntrackd command, capture its output, and log it on error. All of which is easily achieved with ocf_run, like so: ocf_run $CONNTRACKD -c || exit $OCF_ERR_GENERIC You can probably apply that to most of your invocations of $CONNTRACKD. > conntrackd_monitor() { > > # Define conntrackd_pid variable > local conntrackd_pid=`pidof ${OCF_RESKEY_bin}` Bashism. If you decĺare the resource agent as bourne shell compatible (#!/bin/sh), this should be: local conntrackd_pid conntrackd_pid=`pidof ${OCF_RESKEY_bin}` ... or you force bash with #!/bin/bash. > # Check for conntrackd lock-file > if [ -f $OCF_RESKEY_lck ]; then > > # Check for conntrackd pid > if [ $conntrackd_pid ]; then I think this is another bashism, Bourne shell would expect [ -n "$conntrackd_pid" ]. > # Successfull if lock and pid exists > return $OCF_SUCCESS > else > # Error if pid exists but pid isn't running > return $OCF_ERR_GENERIC > fi > else > # False if lock and pid missing > $OCF_NOT_RUNNING > > # Start conntrackd daemon > $CONNTRACKD -d No. It's not the monitor action's job to restart anything. If monitor exits with $OCF_NOT_RUNNING, it's Pacemaker's job to recover that resource. And it happens to be good at that job. :) > conntrackd_validate() { > > # Check if conntrackd binary exists > check_binary ${OCF_RESKEY_bin} > > if [ $? != 0 ]; then > return $OCF_ERR_ARGS > fi Not necessary, check_binary already exits with $OCF_ERR_INSTALLED (which is the correct error code here) if the binary isn't found. > > # Check if conntrackd config exists > if [ ! -f ${OCF_RESKEY_cfg} ]; then > return $OCF_ERR_ARGS This should be $OCF_ERR_INSTALLED. > fi > > return $OCF_SUCCESS > } > > case $__OCF_ACTION in > meta-data)meta_data > exit $OCF_SUCCESS > ;; > start)conntrackd_start;; > stop) conntrackd_stop;; > monitor) conntrackd_monitor;; > migrate_to) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to > ${OCF_RESKEY_CRM_meta_migrate_to}." > conntrackd_stop > ;; > migrate_from) ocf_log info "Migrating ${OCF_RESOURCE_INSTANCE} to > ${OCF_RESKEY_CRM_meta_migrated_from}." > conntrackd_start Migrate actions are really not needed here. > ;; > reload) ocf_log err "Reloading..." > conntrackd_start > ;; > validate-all) conntrackd_validate;; You might want to invoke conntrackd_validate before start, or even before any action except metadata, help, and usage. > usage|help) conntrackd_usage > exit $OCF_SUCCESS > ;; > *)conntrackd_usage >