Re: [Linux-HA] Are the Resource Agents POSIX compliant?
Hi Florian,
I've looked for bashisms in the Debian package (1.0.3-3.1) and this is the result:

    $ for i in * .*; do checkbashisms $i 2>&1 | grep -v "is already a bash script; skipping"; done
    possible bashism in AudibleAlarm line 84 (echo -e): echo -ne "\a" > /dev/console
    possible bashism in CTDB line 333 (should be 'b = a'): [ $OCF_RESKEY_ctdb_logfile == syslog ] && log_option=--syslog
    possible bashism in CTDB line 359 (brace expansion, should be $(seq a b)): for i in {1..30}; do
    possible bashism in Delay line 160 ([^] should be [!]): echo $i |grep -v [^0-9.] |grep -q -v [.].*[.]
    script IPv6addr does not appear to have a #! interpreter line; you may get strange results
    possible bashism in SAPDatabase line 540 (should be >word 2>&1): eval $VALUE &> /dev/null
    possible bashism in SAPInstance line 383 (should be >word 2>&1): eval $VALUE &> /dev/null
    possible bashism in anything line 129 (let ...): let i++
    possible bashism in eDir88 line 190 (let ...): let CNT=$CNT+1
    possible bashism in eDir88 line 322 (declare): declare rc=$OCF_SUCCESS
    possible bashism in oracle line 234 (should be 'b = a'): if [ x == x$ORACLE_HOME ]; then
    possible bashism in oracle line 238 (should be 'b = a'): if [ x == x$ORACLE_OWNER ]; then
    possible bashism in oracle line 386 (should be 'b = a'): if [ x$dumpdest == x -o ! -d $dumpdest ]; then
    possible bashism in oracle line 390 (local -opt): local -i fcount=`ls -rt $dumpdest | wc -l`
    possible bashism in oracle line 393 (local -opt): local -i fcount2=`ls -rt $dumpdest | wc -l`
    possible bashism in oralsnr line 161 (should be 'b = a'): if [ x == x$ORACLE_HOME ]; then
    possible bashism in oralsnr line 165 (should be 'b = a'): if [ x == x$ORACLE_OWNER ]; then
    script .ocf-binaries does not appear to have a #! interpreter line; you may get strange results
    script .ocf-directories does not appear to have a #! interpreter line; you may get strange results
    script .ocf-returncodes does not appear to have a #! interpreter line; you may get strange results
    script .ocf-shellfuncs does not appear to have a #! interpreter line; you may get strange results
    possible bashism in .ocf-shellfuncs line 68 ($RANDOM): local rnd=$RANDOM

It seems that a bunch of agents (AudibleAlarm, CTDB, Delay, IPv6addr, SAPDatabase, SAPInstance, anything, eDir88, oracle, oralsnr) contain bashisms even though they carry the #!/bin/sh interpreter line. What concerns me most is that .ocf-shellfuncs, which is sourced by almost all agents (POSIX-compliant or not), contains a well-known bashism ($RANDOM).

On Tue, 18/01/2011 at 11:43 +0100, Florian Haas wrote:
> All the agents that declare their interpreter to be /bin/sh have had any
> bashisms eradicated for the Debian squeeze release. So yes, these will work
> on a system where /bin/sh links to dash.
> Florian

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
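For reference, each construct checkbashisms flags above has a simple POSIX replacement. The following is only an illustrative sketch of the substitutions (not the actual upstream fixes), written for a plain /bin/sh such as dash:

    #!/bin/sh
    # '[ a == b ]' is a bashism; POSIX test uses a single '='
    [ "$OCF_RESKEY_ctdb_logfile" = "syslog" ] && log_option="--syslog"

    # brace expansion {1..30} is a bashism; use seq (or a counting while loop)
    for i in $(seq 1 30); do
        sleep 1
    done

    # 'let i++', 'declare' and 'local -i' are bashisms; arithmetic expansion is POSIX
    i=$((i + 1))

    # $RANDOM is a bashism; one portable substitute reads two bytes from /dev/urandom
    rnd=$(od -An -N2 -tu2 /dev/urandom | tr -d ' ')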
[Linux-HA] Are the Resource Agents POSIX compliant?
Hello,
I'm in the process of upgrading from Debian lenny to squeeze (so from heartbeat 2.1.3 to pacemaker 1.0.9), but starting with this release the default shell for scripts (/bin/sh) changes from bash to dash. The difference between bash and dash is that the latter is strictly POSIX compliant and does not support bashisms: https://wiki.ubuntu.com/DashAsBinSh
So my question is: are the resource agents (now cluster agents) POSIX compliant?

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
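To see the difference concretely, here is a minimal illustration of one common bashism and its POSIX form; the '==' comparison is accepted by bash's [ builtin but rejected by dash's, while the single '=' works in both (exact error text varies by dash version):

    # bashism: works under bash, fails under dash
    dash -c '[ "$USER" == "root" ] && echo root || echo not root'

    # POSIX form: works under both bash and dash
    dash -c '[ "$USER" = "root" ] && echo root || echo not root'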
Re: [Linux-HA] Resource colocation with a clone.
No, the from/to attributes were instantiated correctly. The error was in the id used to constrain the resource to the clone: you must reference the id of the clone itself, not the id of the primitive inside the clone.

Example: if the definition of the clone is:

    <clone id="clone_service">
      ...
      <primitive id="service" ...>
        ...
      </primitive>
    </clone>

the constraint must be:

    <rsc_colocation id="ip_runs_with_service" from="ip" to="clone_service" score="INFINITY"/>

and not:

    <rsc_colocation id="ip_runs_with_service" from="ip" to="service" score="INFINITY"/>

PS: I run heartbeat 2.1.3 from Debian Lenny.

On 17 Mar 2010, at 22:08, Andrew Beekhof wrote:
> Probably swap the values of from and to.

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
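A related sanity check: the crm_verify tool shipped with heartbeat 2.x can be run against the live CIB after editing constraints; it should flag obvious mistakes such as a constraint referencing a resource id that does not exist (how thorough the checks are depends on the version):

    # validate the running CIB and print warnings/errors verbosely
    crm_verify -L -V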
Re: [Linux-HA] New HOWTO about High Available Firewalls
On Wed, 04/02/2009 at 16:16 +0100, Michael Schwartzkopff wrote:
> On Wednesday, 4 February 2009 12:27:44, Igor Neves wrote:
>> Hi, I have done some work with conntrackd and heartbeat a while ago. Attached is one
>> conntrackd OCF script I made, but when I finished I realized that it was not working
>> and would never work. As you say in your HOWTO, conntrackd works with 2 caches.
> I start conntrackd outside of heartbeat, from init, so the sync setup is already working
> before the cluster starts. Inside heartbeat I only dump the connection table from the
> cache into the kernel (firewall starts) or clear the cache (firewall stops).

I also have a 2-node active-standby firewall setup in production. The problem with conntrackd is that it has only one sync connection with the other node. To remove this SPOF I wrote two RAs:
- the first one starts conntrackd and checks (in the monitor action) whether the other node is alive; otherwise it restarts conntrackd with another configuration that uses a different communication medium.
- the second simply commits the conntrack tables from the other node when it starts.

Obviously you must colocate the second resource with an IP resource (or, in my case, another custom RA that bridges some interfaces). The two RAs are still in a works-for-me state, but they have proved stable for a while. Maybe in the next few days I'll post them here to gather some comments.

> If you want to write an OCF resource for that task to be done inside heartbeat you need
> a stateful agent. Your agent below is not stateful, i.e. it does not understand promote
> and demote. Re-thinking: perhaps you could also use a conntrackd clone...

In my implementation a clone (one instance for every node) of the table-merging RA is enough.

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
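For readers wondering what the second ("table-committing") RA amounts to, here is a minimal sketch of its start action, assuming conntrackd's stock command-line options (-n requests a resync from the peer, -c commits the external cache into the kernel table). The agent skeleton, the .ocf-shellfuncs path and the settle delay are illustrative assumptions, not the author's actual code:

    #!/bin/sh
    # Sketch of a "commit conntrack tables" OCF start action.
    : ${OCF_ROOT:=/usr/lib/ocf}
    . ${OCF_ROOT}/resource.d/heartbeat/.ocf-shellfuncs

    conntrack_commit_start() {
        conntrackd -n        # ask the peer node to resync its state to us
        sleep 2              # illustrative settle time for the resync to arrive
        conntrackd -c        # commit the external cache into the local kernel table
        return $OCF_SUCCESS
    }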
Re: [Linux-HA] Meta data syntax
Heartbeat 2.1.3 on Debian etch, taken from Debian backports.

On Thu, 15/01/2009 at 10:00 +0100, Dominik Klein wrote:
> Which version are you using? That's a known and fixed bug from a rather old version.
> Unfortunately, the bugzilla is not available at the moment. But searching for bugs with
> keyword "meta" once it is back should get you to the changeset.
>
> Regards
> Dominik
>
> Michele Codutti wrote:
>> [original question quoted in full -- snipped; see the post below]

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
[Linux-HA] Meta data syntax
Hello,
I'm working on an RA and I have some problems with its meta-data part. The script itself works correctly and I've tested it with ocf-tester. For a field test I've set up a two-node cluster and installed it there. The configuration is done with the graphical GUI (hbclient). The problem shows up in the GUI: when you create a new resource and select my RA, no parameters are shown. If I fill in the parameter fields manually, the resource starts and works as expected (very well :) ). The error that I see in the log is:

    mgmtd: [6873]: ERROR: lrm_get_rsc_type_metadata(572): got a return code HA_FAIL from a reply message of rmetadata with function get_ret_from_msg.

I've checked the XML almost 20 times and it looks syntactically correct to me. I'm posting it here hoping that someone can give me a hint about how to fix this:

    meta_data() {
    cat <<EOF
    <?xml version="1.0"?>
    <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
    <resource-agent name="LSBwrapper" version="0.1">
      <version>1.0</version>
      <longdesc lang="en">
      This is a wrapper Resource Agent built around a 3.1 LSB init script.
      You can also override the defined functions with an override script.
      </longdesc>
      <shortdesc lang="en">LSB wrapper resource agent</shortdesc>
      <parameters>
        <parameter name="InitScript" unique="1">
          <longdesc lang="en">Location of the init script.</longdesc>
          <shortdesc lang="en">Init script location</shortdesc>
          <content type="string" />
        </parameter>
        <parameter name="OverrideScript" unique="1">
          <longdesc lang="en">Location of the override script.</longdesc>
          <shortdesc lang="en">Override script location</shortdesc>
          <content type="string" />
        </parameter>
      </parameters>
      <actions>
        <action name="start"      timeout="90" />
        <action name="stop"       timeout="100" />
        <action name="monitor"    timeout="20" interval="10" depth="0" start-delay="0" />
        <action name="reload"     timeout="90" />
        <action name="meta-data"  timeout="5" />
        <action name="verify-all" timeout="30" />
      </actions>
    </resource-agent>
    EOF
    }

The RA is a generic wrapper around a 3.1 LSB init script. The main function calls meta_data (when requested) and then exits with $OCF_SUCCESS. Any suggestion is really appreciated.

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
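One way to rule out malformed XML (as opposed to a GUI/mgmtd problem) is to validate the emitted meta-data outside the cluster stack. A quick check, assuming xmllint from libxml2-utils is installed; the agent path below is only an example:

    # dump the agent's meta-data and check that it is well-formed XML
    /usr/lib/ocf/resource.d/myprovider/LSBwrapper meta-data | xmllint --noout -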
Re: [Linux-HA] Two Apaches with two IP in a active active configuration
Hello, maybe I was not clear about why I wrote here: I don't want to report a bug, I only want some suggestions to solve my configuration problem. The heartbeat version is only a reference so that readers know which features are available. Maybe someone uses the same version of heartbeat and has already solved the same problem; if that person is kind enough to share their thoughts on my question I will be grateful.

On Tue, 02/12/2008 at 16:50 +0100, Andrew Beekhof wrote:
> On Tue, Dec 2, 2008 at 15:24, Michele Codutti wrote:
>> Hello, i want to setup a webserver cluster with two nodes in an active-active
>> configuration. [...]
>> I MUST use a heartbeat version 2.0.7 (Debian 4.0 Etch).
>
> Then you're in the wrong place... you need a Debian support list. 2.0.7 was released over
> two years ago and our desire to re-visit bugs we've already fixed is minimal. It's not
> even clear to me how, after re-finding the problem, we can provide you with a fix if you
> can't/won't upgrade. If you insist on using only what Debian provides, then we've no way
> to help you.
>
>> [rest of the original question quoted in full -- snipped; see the original post below]

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
[Linux-HA] Two Apaches with two IP in a active active configuration
Hello,
I want to set up a webserver cluster with two nodes in an active-active configuration. I have a DNS name for the cluster, www.example.com, which is resolved round-robin to two IPs, 10.0.0.1 and 10.0.0.2. I MUST use heartbeat version 2.0.7 (Debian 4.0 Etch).

I want to configure HB to achieve this:
1) In the normal situation (2 nodes running) each node must have one IP and one apache running.
2) If apache fails on one node, the IP on that node must migrate to the remaining node.
3) When a node that had failures is repaired, its IP and Apache must return to run on that node.

My first setup was:
* Resources
  - IP1: IPaddr2 (OCF)
  - IP2: IPaddr2 (OCF)
  - WebServer (clone, max: 2, node_max: 1): apache (OCF)
* Constraints:
  - IP1_where_WebServer
  - IP2_where_WebServer

Initially the resources are equally balanced on the two nodes like this:
* node1
  - IP1
  - WebServer_instance:0
* node2
  - IP2
  - WebServer_instance:1

When one webserver instance fails, the IP that runs on the same node does not migrate to the other node. This is not the behaviour I want, so I decided to try another setup:

* Resources
  - Group1 (ordered, collocated)
    IP1: IPaddr2 (OCF)
    WebServer1: apache (OCF)
  - Group2 (ordered, collocated)
    IP2: IPaddr2 (OCF)
    WebServer2: apache (OCF)

Initially the resources are equally balanced on the two nodes like this:
* node1
  - Group1: IP1, WebServer1
* node2
  - Group2: IP2, WebServer2

When one webserver instance fails, the IP that runs on the same node migrates to the other node together with the apache resource. This is a good approximation of what I want (the illusion of two running webservers isn't pretty, but it works). Now, to restore the migrated IP and WebServer I reset the fail-counts of every resource, but they don't come back to their original node. This is not what I want. Only if I restart the service on the node where the resource failed does the entire group migrate back to the original node.

Is there anyone who could suggest a better way to obtain what I need?

Thanks in advance
--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
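One common way to get the desired fail-back with the group-based setup is to give each group a location preference for its home node that outweighs its stickiness, so that once the fail-count is cleared the group moves back on its own. A sketch in the heartbeat 2.x CIB syntax; the node names, ids, scores and the use of cibadmin are illustrative assumptions, not a tested 2.0.7 configuration:

    # prefer Group1 on node1; repeat analogously for Group2 on node2,
    # and keep default_resource_stickiness below the score used here
    cat > prefer_group1.xml <<'EOF'
    <rsc_location id="prefer_group1_on_node1" rsc="Group1">
      <rule id="prefer_group1_on_node1_rule" score="100">
        <expression id="prefer_group1_on_node1_expr"
                    attribute="#uname" operation="eq" value="node1"/>
      </rule>
    </rsc_location>
    EOF
    cibadmin -C -o constraints -x prefer_group1.xml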
Re: [Linux-HA] Why the fail count isn't increased to a value > 1?
Is there any patch to apply to 2.1.3 that fixes this problem?

On Thu, 27/11/2008 at 23:00 +0100, Andrew Beekhof wrote:
> On Thu, Nov 27, 2008 at 17:47, Michele Codutti wrote:
>> On Thu, 27/11/2008 at 17:19 +0100, Francisco José Méndez Cirera wrote:
>>> It is a bug, you must install the latest version.
>> Do you mean 2.1.4? Please don't tell me so! I don't want to install software from
>> outside the distribution!
> Then I suggest you contact your distribution for support.

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
[Linux-HA] Why the fail count isn't increased to a value > 1?
Hello,
I'm testing heartbeat 2.1.3 (packaged by Debian) and I see strange behaviour of the failcount. According to http://www.linux-ha.org/ScoreCalculation the failcount is increased by 1 every time a resource fails. In my experience with heartbeat, the fail count is increased by 1 only if the previous value was 0.

My test was conducted this way: I configured a resource that is an instance of the IPaddr2 RA, and I set resource_stickiness=3 and resource_failure_stickiness=-1. The first time I started the resource, the CRM chose a node (let's say node1) on which to put the IP configured by IPaddr2. To test the failure behaviour I deleted (by hand) the IP from the interface configured by my resource; the monitor operation detected the failure and restored the resource. I checked that the score of IPaddr2 on the running node was 2 and the failcount was 1. Then, to test a second failure on the same node, I deleted the IP again. This time too the resource was restored, and I expected the score to be 1, but the score was still 2 and the failcount had not been incremented (failcount=1)!

Is this the normal behaviour of the failcount? Is there any parameter in the configuration file or in cib.xml that I must set to change this binary behaviour of the failcount?

Thanks in advance
--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it
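For reference, the expected arithmetic from the ScoreCalculation page is score = resource_stickiness + failcount * resource_failure_stickiness, so after two failures the poster expects 3 + 2 * (-1) = 1 rather than the observed 2. The per-node fail-count can be inspected and reset with crm_failcount; a quick example, assuming the option syntax of the heartbeat 2.1.x tool (resource and node names are illustrative):

    # read the current fail-count of resource "my_ip" on node1
    crm_failcount -G -r my_ip -U node1

    # reset it once the underlying problem is fixed
    crm_failcount -D -r my_ip -U node1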
Re: [Linux-HA] Why the fail count isn't increased to a value > 1?
On Thu, 27/11/2008 at 17:19 +0100, Francisco José Méndez Cirera wrote:
> It is a bug, you must install the latest version.

Do you mean 2.1.4? Please don't tell me so! I don't want to install software from outside the distribution!

--
Michele Codutti, Centro Servizi Informatici e Telematici (CSIT), Universita' degli Studi di Udine
via Delle Scienze, 208 - 33100 UDINE - tel +39 0432 558928 - fax +39 0432 558911 - michele.codutti at uniud.it