Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:
> - Forwarded message from Simon Horman [EMAIL PROTECTED] -
>
> Date: Mon, 23 Apr 2007 11:25:36 +0900
> From: Simon Horman [EMAIL PROTECTED]
> To: Erich Schubert [EMAIL PROTECTED], [EMAIL PROTECTED]
> Subject: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
> Message-ID: [EMAIL PROTECTED]
> References: [EMAIL PROTECTED]
> MIME-Version: 1.0
> Content-Type: text/plain; charset=us-ascii
> Content-Disposition: inline
> In-Reply-To: [EMAIL PROTECTED]
> User-Agent: mutt-ng/devel-r804 (Debian)
> Status: RO
> Content-Length: 567
> Lines: 20
>
> On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
> > Package: heartbeat-2
> > Version: 2.0.7-2
> > Severity: normal
> >
> > The IPAddr2 script contains bashisms.
> >
> > /usr/lib/ocf/resource.d/heartbeat/IPaddr2:
> > IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}
> >
> > Hotfix: replace /bin/sh in the first line by /bin/bash
> >
> > Other scripts might be affected as well.
>
> Thanks, I'll get this fixed.
> Please let me know if you find any more.

i'll push up a fix momentarily

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
> On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:
> > - Forwarded message from Simon Horman [EMAIL PROTECTED] -
> >
> > Date: Mon, 23 Apr 2007 11:25:36 +0900
> > From: Simon Horman [EMAIL PROTECTED]
> > To: Erich Schubert [EMAIL PROTECTED], [EMAIL PROTECTED]
> > Subject: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
> > Message-ID: [EMAIL PROTECTED]
> > References: [EMAIL PROTECTED]
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=us-ascii
> > Content-Disposition: inline
> > In-Reply-To: [EMAIL PROTECTED]
> > User-Agent: mutt-ng/devel-r804 (Debian)
> > Status: RO
> > Content-Length: 567
> > Lines: 20
> >
> > On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
> > > Package: heartbeat-2
> > > Version: 2.0.7-2
> > > Severity: normal
> > >
> > > The IPAddr2 script contains bashisms.
> > >
> > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2:
> > > IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}
> > >
> > > Hotfix: replace /bin/sh in the first line by /bin/bash
> > >
> > > Other scripts might be affected as well.
> >
> > Thanks, I'll get this fixed.
> > Please let me know if you find any more.
>
> i'll push up a fix momentarily

Since IPaddr2 is Linux specific, I guess it's OK to have it run by bash.

--
Dejan
Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
On 4/26/07, Dejan Muhamedagic [EMAIL PROTECTED] wrote:
> On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
> > On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:
> > > - Forwarded message from Simon Horman [EMAIL PROTECTED] -
> > >
> > > Date: Mon, 23 Apr 2007 11:25:36 +0900
> > > From: Simon Horman [EMAIL PROTECTED]
> > > To: Erich Schubert [EMAIL PROTECTED], [EMAIL PROTECTED]
> > > Subject: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
> > > Message-ID: [EMAIL PROTECTED]
> > > References: [EMAIL PROTECTED]
> > > MIME-Version: 1.0
> > > Content-Type: text/plain; charset=us-ascii
> > > Content-Disposition: inline
> > > In-Reply-To: [EMAIL PROTECTED]
> > > User-Agent: mutt-ng/devel-r804 (Debian)
> > > Status: RO
> > > Content-Length: 567
> > > Lines: 20
> > >
> > > On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
> > > > Package: heartbeat-2
> > > > Version: 2.0.7-2
> > > > Severity: normal
> > > >
> > > > The IPAddr2 script contains bashisms.
> > > >
> > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2:
> > > > IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}
> > > >
> > > > Hotfix: replace /bin/sh in the first line by /bin/bash
> > > >
> > > > Other scripts might be affected as well.
> > >
> > > Thanks, I'll get this fixed.
> > > Please let me know if you find any more.
> >
> > i'll push up a fix momentarily
>
> Since IPaddr2 is Linux specific, I guess it's OK to have it run by bash.

i believe bash is available for most platforms
it's even the default on OSX (a BSD variant)
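
[Editor's sketch] The construct reported in the bug, ${IF_MAC:0:2}, is substring expansion, which bash and ksh provide but POSIX sh does not. The reported line takes a 12-digit hex string and joins its six two-character pieces with colons. If the script had to stay on /bin/sh, one possible portable equivalent is shown below. The helper name format_mac is hypothetical and not part of IPaddr2, and the sample value is the hardware address that appears in the ifconfig output elsewhere in this archive. This is not the fix that was committed.

```sh
#!/bin/sh
# format_mac: hypothetical helper, not taken from IPaddr2.
# Inserts a colon after every pair of characters, then drops the
# trailing colon. Uses only sed, so no bash substring expansion is needed.
format_mac() {
    printf '%s\n' "$1" | sed 's/../&:/g; s/:$//'
}

IF_MAC=00137258745F
IF_MAC=$(format_mac "$IF_MAC")
echo "$IF_MAC"    # prints 00:13:72:58:74:5F
```

The trade-off is one extra sed process per call, which hardly matters for a resource agent's start action.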
Re: [Linux-ha-dev] Re: Bug#420637: heartbeat-2: File descriptor leak?
Hi,

On Thu, Apr 26, 2007 at 11:14:46AM +0900, Simon Horman wrote:
> On Tue, Apr 24, 2007 at 09:51:45AM +0900, Simon Horman wrote:
> > forwarded 420637 [EMAIL PROTECTED]
> > thanks
> >
> > On Mon, Apr 23, 2007 at 07:28:53PM +0200, Erich Schubert wrote:
> > > Package: heartbeat-2
> > > Version: 2.0.7-2
> > > Severity: normal
> > >
> > > It seems that heartbeat-2 leaks a file descriptor to its child processes. From the SELinux audit log:
> > >
> > > avc: denied { read } for pid=2403 comm=ip name=heartbeat.pid dev=ida/c0d0p5 ino=86181 scontext=root:system_r:ifconfig_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file
> > > avc: denied { read } for pid=3210 comm=rndc name=heartbeat.pid dev=ida/c0d0p5 ino=86181 scontext=root:system_r:ndc_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file
> > > avc: denied { read } for pid=3303 comm=openvpn name=heartbeat.pid dev=ida/c0d0p5 ino=86181 scontext=root:system_r:openvpn_t:s0 tcontext=system_u:object_r:initrc_var_run_t:s0 tclass=file

I don't speak SELinux: does comm= denote a program? I suppose that ip is from IPaddr2 then. Do you have openvpn and bind in your heartbeat config? Perhaps you could also post your heartbeat configuration (ha.cf and haresources/cib.xml). Thanks.

> > > The best explanation for these errors I have is that a file descriptor (such as STDIN) of these processes points to the heartbeat.pid file. I haven't verified it in the heartbeat-2 code yet.
> > >
> > > It's not very likely that this is exploitable; the heartbeat scripts are started with root privileges anyway. But in theory it could be possible to trick one of these scripts into writing a different PID into the pidfile maybe?
> >
> > Hi Eric,
> >
> > that does indeed look like a bit of a problem. Thanks for reporting it. Hopefully it isn't too hard to track down and fix. I'm CCing the linux-ha-dev list so their eyes pass over this problem.
>
> Re CCing, as I used the wrong address the first time around.
> --
> Horms
> H: http://www.vergenet.net/~horms/
> W: http://www.valinux.co.jp/en/

--
Dejan
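
[Editor's sketch] The working hypothesis in the report is that the helper programs (ip, rndc, openvpn) inherit an open descriptor on heartbeat.pid from their parent. The shell fragment below only illustrates that mechanism; it is not taken from the heartbeat code. The temporary file stands in for the pid file, descriptor number 3 is an arbitrary choice, and the behaviour shown assumes sh is dash or bash. In C the usual remedy is to mark the descriptor FD_CLOEXEC or to close it before exec.

```sh
#!/bin/sh
# Demonstration only: a temporary file stands in for heartbeat.pid.
pidfile=$(mktemp)
echo 1234 > "$pidfile"

# The parent opens the file on descriptor 3 and leaves it open.
exec 3< "$pidfile"

# A child started now inherits descriptor 3 and can read the file,
# even though it never opened the file itself.
leaked=$(sh -c 'cat <&3' 2>/dev/null)

# Reopen so the read offset is back at the start of the file.
exec 3< "$pidfile"

# Closing descriptor 3 for this one child (3<&-) stops the inheritance:
# the redirection inside the child fails and nothing is read.
closed=$(sh -c 'cat <&3' 3<&- 2>/dev/null)

echo "child with inherited fd read: '$leaked'"
echo "child with fd closed read:    '$closed'"

exec 3<&-
rm -f "$pidfile"
```

A policy such as SELinux sees the first case as the child touching heartbeat.pid, which would be consistent with the denials quoted above.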
Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
On Thu, 26 Apr 2007, Dejan Muhamedagic wrote:
> On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
> > On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:
> > > - Forwarded message from Simon Horman [EMAIL PROTECTED] -
> > > [...]
> > > On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
> > > > Package: heartbeat-2
> > > > Version: 2.0.7-2
> > > > Severity: normal
> > > >
> > > > The IPAddr2 script contains bashisms.
> > > >
> > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2:
> > > > IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}
> > > >
> > > > Hotfix: replace /bin/sh in the first line by /bin/bash
> > > >
> > > > Other scripts might be affected as well.
> > >
> > > Thanks, I'll get this fixed.
> > > Please let me know if you find any more.
> >
> > i'll push up a fix momentarily
>
> Since IPaddr2 is Linux specific, I guess it's OK to have it run by bash.

Executive summary of what follows: a cautious agreement with that:

    Since IPaddr2 is Linux specific, I guess it's OK to have ... bash.

Now the waffle: feel free to hit delete:

A personal view (coming from a portability angle and a Solaris angle):

In general, I would usually argue for Bourne-only, avoiding bash where reasonably possible, in line with GNU portability recommendations. But if IPaddr2 really is Linux specific then I'd be OK with bash in this defined instance, if it makes the insides of the script significantly cleaner and clearer (more understandable and more maintainable).

The GNU portability purists would argue for Bourne, and discourage bash, based on the fact that every UN*X-like OS has Bourne, but only some have bash. Personally, I try to follow that where reasonably possible, to keep things portable, including in heartbeat.

As a counter-example: They would also argue against using shell functions because (apparently) some Bournes lack them. Personally, I don't bother following that one, including in heartbeat, because most real world Bournes these days seem to have shell functions. Indeed, I've added some myself to heartbeat down the years (and in an email earlier this week, I suggested adding another).

Solaris? When I started with heartbeat, Solaris versions of the time lacked native bash. These days, Solaris distributions include bash (although not necessarily within the default installation set, but at least these days it's easily installable).

So in the heartbeat context nowadays, I would usually continue to advise against bash-isms where reasonably possible (for OSes that may still lack native (or natively available) bash) but in favour of shell functions (because they tend to add significant clarity, with no apparent loss to likely OSes).

In the case of a known Linux-only script, then bash is probably OK if its use adds value (clarity, maintainability, etc.).

(Hope you don't mind that rambling piece of background waffle!)

--
:  David Lee                        I.T. Service        :
:  Senior Systems Programmer        Computer Centre     :
:  UNIX Team Leader                 Durham University   :
:                                   South Road          :
:  http://www.dur.ac.uk/t.d.lee/    Durham DH1 3LE      :
:  Phone: +44 191 334 2752          U.K.                :
Re: [Linux-ha-dev] Fwd: Re: Bug#420206: heartbeat-2: Bashism in IPAddr2
On Thu, Apr 26, 2007 at 04:42:13PM +0100, David Lee wrote:
> On Thu, 26 Apr 2007, Dejan Muhamedagic wrote:
> > On Thu, Apr 26, 2007 at 10:00:10AM +0200, Andrew Beekhof wrote:
> > > On 4/26/07, Simon Horman [EMAIL PROTECTED] wrote:
> > > > - Forwarded message from Simon Horman [EMAIL PROTECTED] -
> > > > [...]
> > > > On Fri, Apr 20, 2007 at 08:38:59PM +0200, Erich Schubert wrote:
> > > > > Package: heartbeat-2
> > > > > Version: 2.0.7-2
> > > > > Severity: normal
> > > > >
> > > > > The IPAddr2 script contains bashisms.
> > > > >
> > > > > /usr/lib/ocf/resource.d/heartbeat/IPaddr2:
> > > > > IF_MAC=${IF_MAC:0:2}:${IF_MAC:2:2}:${IF_MAC:4:2}:${IF_MAC:6:2}:${IF_MAC:8:2}:${IF_MAC:10:2}
> > > > >
> > > > > Hotfix: replace /bin/sh in the first line by /bin/bash
> > > > >
> > > > > Other scripts might be affected as well.
> > > >
> > > > Thanks, I'll get this fixed.
> > > > Please let me know if you find any more.
> > >
> > > i'll push up a fix momentarily
> >
> > Since IPaddr2 is Linux specific, I guess it's OK to have it run by bash.
>
> Executive summary of what follows: a cautious agreement with that:
>
>     Since IPaddr2 is Linux specific, I guess it's OK to have ... bash.
>
> Now the waffle: feel free to hit delete:
>
> A personal view (coming from a portability angle and a Solaris angle):
>
> In general, I would usually argue for Bourne-only, avoiding bash where reasonably possible, in line with GNU portability recommendations. But if IPaddr2 really is Linux specific then I'd be OK with bash in this defined instance, if it makes the insides of the script significantly cleaner and clearer (more understandable and more maintainable).
>
> The GNU portability purists would argue for Bourne, and discourage bash, based on the fact that every UN*X-like OS has Bourne, but only some have bash. Personally, I try to follow that where reasonably possible, to keep things portable, including in heartbeat.
>
> As a counter-example: They would also argue against using shell functions because (apparently) some Bournes lack them. Personally, I don't bother following that one, including in heartbeat, because most real world Bournes these days seem to have shell functions. Indeed, I've added some myself to heartbeat down the years (and in an email earlier this week, I suggested adding another).
>
> Solaris? When I started with heartbeat, Solaris versions of the time lacked native bash. These days, Solaris distributions include bash (although not necessarily within the default installation set, but at least these days it's easily installable).
>
> So in the heartbeat context nowadays, I would usually continue to advise against bash-isms where reasonably possible (for OSes that may still lack native (or natively available) bash) but in favour of shell functions (because they tend to add significant clarity, with no apparent loss to likely OSes).
>
> In the case of a known Linux-only script, then bash is probably OK if its use adds value (clarity, maintainability, etc.).
>
> (Hope you don't mind that rambling piece of background waffle!)

No, not at all. I think that this is important and I'd second your opinion.

This is not meant as an argument, but I don't even understand the line above; I just guess that it's something about splitting something into something. However, that script is, I believe, usable just on Linux.

One thing which I'm really missing is variables local to a function (typeset or local). On the one hand, it is easy to mix up variable names and on the other it is very tedious to keep track of all the variables in a long script.

It is not clear which standard should be followed. I think that there's also something like POSIX shell, but I have no idea how widespread it is.

> --
> :  David Lee                        I.T. Service        :
> :  Senior Systems Programmer        Computer Centre     :
> :  UNIX Team Leader                 Durham University   :
> :                                   South Road          :
> :  http://www.dur.ac.uk/t.d.lee/    Durham DH1 3LE      :
> :  Phone: +44 191 334 2752          U.K.                :

--
Dejan
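
[Editor's sketch] On the point about variables local to a function: POSIX sh defines neither local nor typeset, although many /bin/sh implementations (dash and bash among them) accept local as an extension. One technique that stays within POSIX is to write the function body in parentheses rather than braces, so the function runs in a subshell and its assignments cannot reach the caller. The names below are hypothetical and do not come from any heartbeat script.

```sh
#!/bin/sh
# Hypothetical example, not from a heartbeat script.
name=global

# Body in ( ) instead of { }: the function executes in a subshell,
# so every assignment made inside it is discarded on return.
scoped() (
    name=inside
    echo "in function: $name"
)

scoped
echo "after call: $name"
# prints:
#   in function: inside
#   after call: global
```

The cost is a fork per call, and such a function can only hand results back through its output or exit status, never by setting a variable for the caller.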
Re: [Linux-HA] How to add op to existing master/slave tag from command line
On 4/25/07, Doug Knight [EMAIL PROTECTED] wrote:
> Can someone provide an example of xml to be used with cibadmin to add an op tag to an existing master/slave resource? Here's my master/slave definition:
>
> <master_slave notify="true" id="ms_drbd_7788">
>   <instance_attributes id="ms_drbd_7788_instance_attrs">
>     <attributes>
>       <nvpair id="ms_drbd_7788_clone_max" name="clone_max" value="2"/>
>       <nvpair id="ms_drbd_7788_clone_node_max" name="clone_node_max" value="1"/>
>       <nvpair id="ms_drbd_7788_master_max" name="master_max" value="1"/>
>       <nvpair id="ms_drbd_7788_master_node_max" name="master_node_max" value="1"/>
>       <nvpair name="target_role" id="ms_drbd_7788_target_role" value="stopped"/>
>     </attributes>
>   </instance_attributes>
>   <primitive class="ocf" type="drbd" provider="heartbeat" id="rsc_drbd_7788">
>     <instance_attributes id="rsc_drbd_7788_instance_attrs">
>       <attributes>
>         <nvpair id="fdb586b1-d439-4dfb-867c-3eefbe5d585f" name="drbd_resource" value="pgsql"/>
>         <nvpair name="target_role" id="rsc_drbd_7788:0_target_role" value="stopped"/>
>       </attributes>
>     </instance_attributes>
>   </primitive>
> </master_slave>
>
> And for example, I'd like to add:
>
> <op id="drbd_mon_sl" name="monitor" timeout="60" role="Slave" interval="30"/>
>
> So that I can do:
>
> cibadmin -U -x add_mon_ssl.xml
>
> I've been trying to add it from the command line, and some of my attempts are core dumping with the following:
>
> crm_abort: crm_str_eq: Triggered fatal assert at utils.c:686 : a != b

it's rather hard to comment without
* the contents of add_mon_ssl.xml
* the stacktrace
* the version

> Thanks,
> Doug

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Location constraints
Dejan Muhamedagic a écrit : On Wed, Apr 25, 2007 at 05:59:12PM +0200, Benjamin Watine wrote: Dejan Muhamedagic a écrit : On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote: You were true, it wasn't a score problem, but my IPv6 resource that causes an error, and let the resource group unstarted. Without IPv6, all is OK, behaviour of Heartbeat fit my needs (start on prefered node (castor), and failover after 3 fails). So, my problem is IPv6 now. The script seems to have a problem : # /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start *** glibc detected *** free(): invalid next size (fast): 0x0050d340 *** /etc/ha.d/resource.d//hto-mapfuncs: line 51: 4764 Aborted $__SCRIPT_NAME start 2007/04/25_11:43:29 ERROR: Unknown error: 134 ERROR: Unknown error: 134 but now, ifconfig show that IPv6 is well configured, but script exit with error code. IPv6addr aborts, hence the exit code 134 (128+signo). Somebody recently posted a set of patches for IPv6addr... Right, I'm cc-ing this to Horms. Thank you so much, I'm waiting for Horms so. I'll take a look to list archive also. BTW, wasn't there also a core dump for this case too? Could you do a ls -R /var/lib/heartbeat/cores and check. I don't know how to find core dump :/ In this case, should it be core.22560 ? 
# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start *** glibc detected *** free(): invalid next size (fast): 0x0050d340 *** /etc/ha.d/resource.d//hto-mapfuncs: line 51: 22560 Aborted $__SCRIPT_NAME start 2007/04/26_10:46:38 ERROR: Unknown error: 134 ERROR: Unknown error: 134 [EMAIL PROTECTED] ls -R /var/lib/heartbeat/cores /var/lib/heartbeat/cores: hacluster nobody root /var/lib/heartbeat/cores/hacluster: core.3620 core.4116 core.4119 core.4123 core.5262 core.5265 core.5269 core.5272 core.3626 core.4117 core.4121 core.4124 core.5263 core.5266 core.5270 core.3829 core.4118 core.4122 core.5256 core.5264 core.5268 core.5271 /var/lib/heartbeat/cores/nobody: /var/lib/heartbeat/cores/root: core.10766 core.21816 core.29951 core.3642 core.3650 core.3658 core.3667 core.4471 core.11379 core.23505 core.30813 core.3643 core.3651 core.3661 core.3668 core.4550 core.11592 core.24403 core.31033 core.3645 core.3652 core.3663 core.4234 core.5104 core.12928 core.24863 core.3489 core.3647 core.3653 core.3664 core.4371 core.5761 core.15849 core.25786 core.3591 core.3648 core.3654 core.3665 core.4394 core.6130 core.21501 core.28286 core.3610 core.3649 core.3657 core.3666 core.4470 [EMAIL PROTECTED] # ifconfig eth0 Lien encap:Ethernet HWaddr 00:13:72:58:74:5F inet adr:193.48.169.46 Bcast:193.48.169.63 Masque:255.255.255.224 adr inet6: 2001:660:6301:301:213:72ff:fe58:745f/64 Scope:Global adr inet6: fe80::213:72ff:fe58:745f/64 Scope:Lien adr inet6: 2001:660:6301:301::47:1/64 Scope:Global UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3788 errors:0 dropped:0 overruns:0 frame:0 TX packets:3992 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 lg file transmission:1000 RX bytes:450820 (440.2 KiB) TX bytes:844188 (824.4 KiB) Adresse de base:0xecc0 Mémoire:fe6e-fe70 And if I launch the script again, no error is returned : # /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start 2007/04/25_11:45:23 INFO: Success INFO: Success So, you're saying that once the 
resource is running, starting it again doesn't produce an error? Did you also try to stop it and start it from the stopped state? Yes, but probably because the script just check that IPv6 is set, and so don't try to set it again. If I stop and start again, the error occurs. For others errors, I disable stonith for the moment, and DRBD is built in kernel, so the drbd module is not needed. I've seen this message, but it's not a problem. There's a small problem with the stonith suicide agent, which renders it unusable, but it is soon to be fixed. OK, that's what I had read on this list, but I wasn't sure. Is there is any patch now ? I joined log and config, and core file about stonith (/var/lib/heartbeat/cores/root/core.3668). Is it what you asked for (backtrace from stonith core dump) ? You shouldn't be sending core dumps to a public list: it may contain sensitive information. What I asked for, a backtrace, you get like this: $ gdb /usr/lib64/heartbeat/stonithd core.3668 (gdb) bt ... here comes the backtrace (gdb) quit Ooops ! Here it is : #0 0x0039b9d03507 in stonith_free_hostlist () from /usr/lib64/libstonith.so.1 #1 0x00408a95 in ?? () #2 0x00407fee in ?? () #3 0x004073c3 in ?? () #4 0x0040539d in ?? () #5 0x00405015 in ?? () #6 0x0039b950abd4 in G_CH_dispatch_int () from /usr/lib64/libplumb.so.1 #7 0x003a12a266bd in g_main_context_dispatch () from
Re: [Linux-HA] Location constraints
Simon Horman wrote:
> On Wed, Apr 25, 2007 at 04:25:48PM +0200, Dejan Muhamedagic wrote:
> > On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote:
> > > You were true, it wasn't a score problem, but my IPv6 resource that causes an error, and let the resource group unstarted. Without IPv6, all is OK, behaviour of Heartbeat fit my needs (start on prefered node (castor), and failover after 3 fails). So, my problem is IPv6 now.
> > >
> > > The script seems to have a problem :
> > >
> > > # /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
> > > *** glibc detected *** free(): invalid next size (fast): 0x0050d340 ***
> > > /etc/ha.d/resource.d//hto-mapfuncs: line 51: 4764 Aborted $__SCRIPT_NAME start
> > > 2007/04/25_11:43:29 ERROR: Unknown error: 134
> > > ERROR: Unknown error: 134
> > >
> > > but now, ifconfig show that IPv6 is well configured, but script exit with error code.
> >
> > IPv6addr aborts, hence the exit code 134 (128+signo). Somebody recently posted a set of patches for IPv6addr... Right, I'm cc-ing this to Horms.
>
> Hi,
>
> thanks for CCing me on this, I don't peruse the linux-ha list very often and I certainly would have missed it otherwise.
>
> Looking over the patches that I applied to IPv6addr recently, the following two fix potential crash bugs, though I don't think either of them relate to free() calls, so I doubt that they will resolve your problem.
>
> http://hg.linux-ha.org/dev/rev/37271ae7f117
> http://hg.linux-ha.org/dev/rev/b4bc188b4ebe
>
> I did however find a crash bug relating to free in the version of libnet that I was using. You can find a fairly lengthy discussion and a proposed fix at:
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418975
>
> In summary: on Debian Etch, the problem resulted in a crash on amd64. It did not manifest in a crash on i386. I will raise this issue with the upstream libnet maintainer, as I think that the problem is present in the latest versions of his code.
>
> Assuming that this does not solve your problem, what would help me immensely is the following information.

I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my problem.

> 1) What version of linux-ha and libnet you are using and where you got them from.

Heartbeat v2.0.8 x86_64 from the CentOS package (http://mirror.centos.org/centos/4/extras/x86_64/RPMS/) before, but now Heartbeat v2.0.8 from sources (http://linux-ha.org/download/heartbeat-2.0.8.tar.gz).
Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/

> 2) What architecture you are using.

I'm running on RedHat ES4 x86_64.

> 3) If you could provide a backtrace of the crash, preferably using versions of linux-ha and libnet that have been recompiled with debugging symbols. (In the general case this means adding -g to CFLAGS, then rebuilding from scratch, including rerunning ./configure).

I've rebuilt Heartbeat from sources with debugging enabled (the -g option was already in CFLAGS, if I'm not mistaken), but I don't know how to do a backtrace :/

I've tried:

gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
run 2001:660:6301:301::47:1 start
Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 2001:660:6301:301::47:1 start
[Thread debugging using libthread_db enabled]
[New Thread 47165808758720 (LWP 4360)]
usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr {start|stop|status|monitor|validate-all|meta-data}
Program exited with code 02.

What is the usage of the IPv6addr executable? The call works for its resource agent wrapper (/etc/ha.d/resource.d/IPv6addr (IPv6) start), but not for the executable. How can I get the backtrace of IPv6addr?

> 4) Please Cc me on mail regarding this :)

done :)

Thanks !
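
[Editor's sketch] The "134 (128+signo)" remark quoted in this message can be checked from any shell: when a child is killed by a signal, the shell reports 128 plus the signal number as its exit status, and abort() raises SIGABRT, which is signal 6 on Linux. The fragment below reproduces that arithmetic with a child that aborts itself. It does not involve IPv6addr at all, and the ulimit call is only there to keep the demonstration from writing a core file.

```sh
#!/bin/sh
# Start a child shell that sends itself SIGABRT, which is the signal
# that abort() raises after glibc detects heap corruption.
sh -c 'ulimit -c 0; kill -ABRT $$'
status=$?

# A status above 128 means "killed by signal (status - 128)".
if [ "$status" -gt 128 ]; then
    signo=$((status - 128))
    echo "exit status $status: killed by signal $signo"
else
    echo "exit status $status: normal exit"
fi
```

On Linux this reports signal 6. Depending on the shell, an "Aborted" notice similar to the hto-mapfuncs line in the log above may also appear on stderr.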
[Linux-HA] Heartbeat and EVMS cluster
Hello,

I'm trying to set up a cluster system with two machines and shared storage (all SLES10, Heartbeat 2.0.7). For the shared storage there is an iSCSI target available to both machines, and the management of this common device is done through EVMS.

So far I managed to set up a private segment container using the cluster segment manager (CSM) in EVMS; for this container to be available, heartbeat has to be started before the EVMS volumes can be activated, and the following two lines must be present in the ha.cf config file:

respawn root /sbin/evmsd
apiauth evms uid=hacluster,root

Now the private segment container is available and accessible to one node only, which is how it is supposed to be.

So far so good, but now my question is: how do I configure heartbeat to move this private segment container from one node to the other? Is there an RA for this, or am I going ahead of the project? I don't think that the filesystem type should be a problem since I'm using a private container, right? I know STONITH is important here; I'll take care of this later.

A bit lost here, any helping hand?

Kind regards
Jose Jerez.
Re: [Linux-HA] mysql drbd and SAN all together
Andrew Beekhof wrote:
> On 4/25/07, Jan Kalcic [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > After some tests in my lab I now have a two-node cluster working perfectly, where I create a virtual IP resource using hb_gui. I also created a drbd partition which is working correctly but is not yet included as a heartbeat resource. This is the next step I'm going to do.
> >
> > Now, following the documentation and other sources, I found a simple way to create a drbd resource for heartbeat which consists of a single line in the haresources file. But as far as I understood this is no longer used in heartbeat 2, where resources are configured in the cib.xml, right? I also noticed it's possible to configure it with hb_gui, but I don't know the right steps to take, and I actually get confused seeing different native resources for drbd.
>
> use haresources2cib.py to convert from the old version to the new

If it worked it would be really great. The cib.xml file created during conversion has something wrong, as I can no longer connect to the server from hb_gui.

The line in haresources looks like:

node1 IPaddr::192.168.1.93 drbddisk httpd

The output in cib.xml is:

<?xml version="1.0" ?>
<cib>
  <configuration>
    <crm_config>
      <nvpair id="transition_idle_timeout" name="transition_idle_timeout" value="120s"/>
      <nvpair id="symmetric_cluster" name="symmetric_cluster" value="true"/>
      <nvpair id="no_quorum_policy" name="no_quorum_policy" value="stop"/>
    </crm_config>
    <nodes/>
    <resources>
      <group id="group_1">
        <primitive class="ocf" id="IPaddr_1" provider="heartbeat" type="IPaddr">
          <operations>
            <op id="IPaddr_1_mon" interval="5s" name="monitor" timeout="5s"/>
          </operations>
          <instance_attributes>
            <attributes>
              <nvpair id="IPaddr_1_attr_0" name="ip" value="192.168.1.93"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive class="heartbeat" id="drbddisk_2" provider="heartbeat" type="drbddisk">
          <operations>
            <op id="drbddisk_2_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
        </primitive>
        <primitive class="heartbeat" id="httpd_3" provider="heartbeat" type="httpd">
          <operations>
            <op id="httpd_3_mon" interval="120s" name="monitor" timeout="60s"/>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="rsc_location_group_1" rsc="group_1">
        <rule id="prefered_location_group_1" score="100">
          <expression attribute="#uname" id="prefered_location_group_1_expr" operation="eq" value="node1"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
  <status/>
</cib>

I've also tried with the line below in haresources but it doesn't work either:

nodo1 drbddisk::r0 Filesystem::/dev/drbd0::/drbdmount::ext3 192.168.0.1 httpd

Does anybody have a similar, properly working configuration file that they can share?

Jan
Re: [Linux-HA] fast hb_standby hb_takeover - lock
Hi,

On Thu, Apr 26, 2007 at 09:32:42AM +0200, Hannes Dorbath wrote:
> Hello,
>
> I'm running Heartbeat 2.0.8 with a V1 style config. It works fine, besides a single thing I'd like to get some clarification about:
>
> When I do hb_standby on machine A, and then hb_takeover again before the resource takeover has completed, I get the cluster into a situation where any standby or takeover requests are ignored on both sides.
>
> I get a message on machine B that it ignores the takeover request from machine A as resources are in flux. That makes sense, but after the resource takeover has completed they still ignore requests. Both sides display a timer of 3600 seconds or something before they will do anything again.
>
> What is the correct way to recover from that? Both sides refuse to accept standby or takeover requests, and both sides refuse to stop Heartbeat. I need to kill Heartbeat and restart it, so that they are happy again.

Any logs out there? Configuration perhaps?

Thanks.

> Thanks.
>
> --
> Regards,
> Hannes Dorbath

--
Dejan
Re: [Linux-HA] fast hb_standby hb_takeover - lock
On 26.04.2007 15:54, Dejan Muhamedagic wrote:
> Any logs out there? Configuration perhaps?

I'll post both in 1-2 hours when I'm at that location again. I just thought this might be something known / expected.

Thanks.

--
Regards,
Hannes Dorbath
[Linux-HA] IPv6addr fail
OK. I've lauched the IPv6addr again (standalone, and managed by HB), it crashes, but no core dump seems to be generated today. No file from today in core dir. I don't know why. The file command show me some old IPv6addr core dumps, you can find backtraces of it in the tar.gz generated by your script. I join the file root/* output for you can find files easily. All these core dumps are generated only by stonithd, IPv6addr, and pidof. Regards Ben Dejan Muhamedagic a écrit : On Thu, Apr 26, 2007 at 10:54:36AM +0200, Benjamin Watine wrote: Dejan Muhamedagic a écrit : On Wed, Apr 25, 2007 at 05:59:12PM +0200, Benjamin Watine wrote: Dejan Muhamedagic a écrit : On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote: You were true, it wasn't a score problem, but my IPv6 resource that causes an error, and let the resource group unstarted. Without IPv6, all is OK, behaviour of Heartbeat fit my needs (start on prefered node (castor), and failover after 3 fails). So, my problem is IPv6 now. The script seems to have a problem : # /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start *** glibc detected *** free(): invalid next size (fast): 0x0050d340 *** /etc/ha.d/resource.d//hto-mapfuncs: line 51: 4764 Aborted $__SCRIPT_NAME start 2007/04/25_11:43:29 ERROR: Unknown error: 134 ERROR: Unknown error: 134 but now, ifconfig show that IPv6 is well configured, but script exit with error code. IPv6addr aborts, hence the exit code 134 (128+signo). Somebody recently posted a set of patches for IPv6addr... Right, I'm cc-ing this to Horms. Thank you so much, I'm waiting for Horms so. I'll take a look to list archive also. BTW, wasn't there also a core dump for this case too? Could you do a ls -R /var/lib/heartbeat/cores and check. I don't know how to find core dump :/ In this case, should it be core.22560 ? 
Some newer releases of file(1) show the program name which dumped the core: $ file core.6468 core.6468: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'gaim' Also, you can match the timestamps of core files and from the logs. # /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start *** glibc detected *** free(): invalid next size (fast): 0x0050d340 *** /etc/ha.d/resource.d//hto-mapfuncs: line 51: 22560 Aborted $__SCRIPT_NAME start 2007/04/26_10:46:38 ERROR: Unknown error: 134 ERROR: Unknown error: 134 [EMAIL PROTECTED] ls -R /var/lib/heartbeat/cores /var/lib/heartbeat/cores: hacluster nobody root /var/lib/heartbeat/cores/hacluster: core.3620 core.4116 core.4119 core.4123 core.5262 core.5265 core.5269 core.5272 core.3626 core.4117 core.4121 core.4124 core.5263 core.5266 core.5270 core.3829 core.4118 core.4122 core.5256 core.5264 core.5268 core.5271 /var/lib/heartbeat/cores/nobody: /var/lib/heartbeat/cores/root: core.10766 core.21816 core.29951 core.3642 core.3650 core.3658 core.3667 core.4471 core.11379 core.23505 core.30813 core.3643 core.3651 core.3661 core.3668 core.4550 core.11592 core.24403 core.31033 core.3645 core.3652 core.3663 core.4234 core.5104 core.12928 core.24863 core.3489 core.3647 core.3653 core.3664 core.4371 core.5761 core.15849 core.25786 core.3591 core.3648 core.3654 core.3665 core.4394 core.6130 core.21501 core.28286 core.3610 core.3649 core.3657 core.3666 core.4470 [EMAIL PROTECTED] Well, you have quite a few. Let's hope that they stem from only those two errors. I'll attach a script which should generate all backtraces from your core files. It's been lightly tested but should work. 
# ifconfig
eth0      Lien encap:Ethernet  HWaddr 00:13:72:58:74:5F
          inet adr:193.48.169.46  Bcast:193.48.169.63  Masque:255.255.255.224
          adr inet6: 2001:660:6301:301:213:72ff:fe58:745f/64 Scope:Global
          adr inet6: fe80::213:72ff:fe58:745f/64 Scope:Lien
          adr inet6: 2001:660:6301:301::47:1/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3788 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3992 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 lg file transmission:1000
          RX bytes:450820 (440.2 KiB)  TX bytes:844188 (824.4 KiB)
          Adresse de base:0xecc0 Mémoire:fe6e-fe70

And if I launch the script again, no error is returned:

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
2007/04/25_11:45:23 INFO: Success
INFO: Success

So, you're saying that once the resource is running, starting it again doesn't produce an error? Did you also try to stop it and start it from the stopped state?

Yes, but probably because the script just checks that the IPv6 address is set, and so doesn't try to set it again. If I stop and start again, the error occurs.

As for the other errors, I have disabled stonith for the moment, and DRBD is built into the kernel, so the drbd module is not needed. I've seen this message, but it's not a problem.

There's a small problem
[Linux-HA] ERROR: parse_xml: Expected: action - HB 2.0.8
Error in /var/log/messages:

Apr 27 11:07:26 deneb crmd: [3038]: info: process_lrm_event: LRM operation resource_itsapaims_skel1_start_0 (call=134, rc=0) complete
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Expected: action
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: Mismatching close tag
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before: /actions /resourc
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: error parsing child
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before: / action name=mon
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error parsing token: error parsing child
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: parse_xml: Error at or before: actions action
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: crm_abort: find_xml_node: Triggered non-fatal assert at xml.c:77 : root != NULL
Apr 27 11:07:26 deneb crmd: [3038]: ERROR: cl_get_value: wrong arugment (__name__)
Apr 27 11:07:26 deneb crmd: [3038]: WARN: find_xml_node: Could not find actions in (null).
Definition of the primitive:

<primitive class="ocf" type="itsapaims_ISskel" provider="heartbeat" restart_type="ignore" id="resource_itsapaims_skel1">
  <instance_attributes id="resource_itsapaims_skel1_instance_attrs">
    <attributes>
      <nvpair id="resource_itsapaims_skel1_user" name="isskel_user" value="sadmin"/>
      <nvpair id="resource_itsapaims_skel1_id" name="isskel_id" value="Skel_Server1"/>
      <nvpair id="resource_itsapaims_skel1_log" name="isskel_log" value="isskel1"/>
      <nvpair id="resource_itsapaims_skel1_sys" name="isskel_sys" value="FIDS"/>
      <nvpair id="resource_itsapaims_skel1_port" name="isskel_port" value="11431"/>
      <nvpair id="resource_itsapaims_skel1_config" name="isskel_config" value="/u/fids/data/isskel1.cfg"/>
    </attributes>
  </instance_attributes>
  <operations>
    <op id="skel1_itsapaims_skel1_mon" interval="60s" name="monitor" timeout="60s" on_fail="restart"/>
  </operations>
  <instance_attributes id="resource_itsapaims_skel1">
    <attributes>
      <nvpair name="is_managed" id="resource_itsapaims_skel1-is_managed" value="true"/>
    </attributes>
  </instance_attributes>
</primitive>

When I look at the output of 'cibadmin -Q' I don't see any actions tags.

[EMAIL PROTECTED] hb]# cibadmin -Q | grep -i actions
<nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
[EMAIL PROTECTED] hb]# cibadmin -Q | grep -i action
<nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="240"/>
<nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
<nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
<rsc_order id="order_itsapaims_itsapaims1" from="resource_itsapaims1_aims" to="group_itsapaims" action="start" type="before" symmetrical="true"/>
<rsc_order id="order_itsapaims_itsapaims2" from="resource_itsapaims2_aims" to="group_itsapaims" action="start" type="before" symmetrical="true"/>

Any ideas? Are the names too long?
Alex

saved-cib.xml.gz
Description: GNU Zip compressed data
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Location constraints
On Thu, Apr 26, 2007 at 12:08:29PM +0200, Benjamin Watine wrote:
Simon Horman wrote:

[snip]

I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my problem.

1) What version of linux-ha and libnet you are using and where you got them from.

Heartbeat v2.0.8 x86_64 from the CentOS package (http://mirror.centos.org/centos/4/extras/x86_64/RPMS/) before, but now Heartbeat v2.0.8 from sources (http://linux-ha.org/download/heartbeat-2.0.8.tar.gz). Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/

2) What architecture you are using.

I'm running on RedHat ES4 x86_64.

3) If you could provide a backtrace of the crash, preferably using versions of linux-ha and libnet that have been recompiled with debugging symbols. (In the general case this means adding -g to CFLAGS, then rebuilding from scratch, including rerunning ./configure).

I've rebuilt Heartbeat from sources with debugging enabled (the -g option was already in CFLAGS, if I'm not mistaken), but I don't know how to do a backtrace :/ I tried:

gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
run 2001:660:6301:301::47:1 start
Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 2001:660:6301:301::47:1 start
[Thread debugging using libthread_db enabled]
[New Thread 47165808758720 (LWP 4360)]
usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr {start|stop|status|monitor|validate-all|meta-data}
Program exited with code 02.

What is the usage of the IPv6addr executable? The resource agent form (/etc/ha.d/resource.d/IPv6addr (IPv6) start) works, but not the executable. How can I get a backtrace of IPv6addr?

Hi,

thanks for taking some more time to look into this. The address is passed using the environment variable OCF_RESKEY_ipv6addr, so you want to run something like:

OCF_RESKEY_ipv6addr=2001:660:6301:301::47:1 gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
(gdb) run start

If this doesn't provide any interesting information, valgrind often does.
OCF_RESKEY_ipv6addr=2001:660:6301:301::47:1 valgrind /usr/lib/ocf/resource.d/heartbeat/IPv6addr start

Though I did put some effort into getting rid of the valgrind errors that I saw, and those problems should be resolved in the unstable tree.

-- 
Horms
H: http://www.vergenet.net/~horms/
W: http://www.valinux.co.jp/en/
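The invocations suggested in this message can be collected in one place. The following is only a sketch: agent_cmd is a hypothetical helper, the agent path and address are the ones quoted in this thread, and the gdb batch form (run, then print a backtrace) is an alternative to the interactive "(gdb) run start" session described above, not something the thread itself proposes. The script only prints the command lines and does not execute them.

```shell
#!/bin/sh
# Sketch only: agent_cmd is a hypothetical helper; the agent path and the
# address are the ones quoted in this thread.

AGENT=/usr/lib/ocf/resource.d/heartbeat/IPv6addr
ADDR=2001:660:6301:301::47:1

# Print the command line for running the agent directly, under gdb in batch
# mode (run, then print a backtrace), or under valgrind.
agent_cmd() {
    case "$1" in
        gdb)      echo "gdb --batch -ex run -ex bt --args $AGENT $2" ;;
        valgrind) echo "valgrind $AGENT $2" ;;
        *)        echo "$AGENT $2" ;;
    esac
}

# The address goes in through the environment, not through argv.
for tool in plain gdb valgrind; do
    echo "OCF_RESKEY_ipv6addr=$ADDR $(agent_cmd "$tool" start)"
done
```

Each printed line can be pasted into a root shell. The point of the design is that only the action (start, stop, monitor, ...) is a command-line argument, which is why passing the address as an argument produced the usage message.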