Simon Horman wrote:
On Wed, Apr 25, 2007 at 04:25:48PM +0200, Dejan Muhamedagic wrote:
On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote:
You were right: it wasn't a score problem. My IPv6 resource causes an error, which leaves the resource group unstarted.

Without IPv6 everything is OK, and Heartbeat's behaviour fits my needs (start on the preferred node (castor), and fail over after 3 failures). So my problem is now IPv6.

The script seems to have a problem:

# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast): 0x000000000050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51: 4764 Aborted $__SCRIPT_NAME start
2007/04/25_11:43:29 ERROR:  Unknown error: 134
ERROR:  Unknown error: 134

ifconfig now shows that the IPv6 address is configured correctly, but the script still exits with an error code.

IPv6addr aborts, hence the exit code 134 (128+signo). Somebody
recently posted a set of patches for IPv6addr... Right, I'm cc-ing
this to Horms.
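
For reference, the arithmetic behind that exit code can be sketched in shell. A status above 128 conventionally means the process was killed by signal (status - 128), and 134 - 128 = 6, which is SIGABRT on Linux. The function name decode_status is made up for this illustration.

```sh
#!/bin/sh
# Decode a shell exit status. By convention, a value above 128 means
# "terminated by signal (value - 128)". decode_status is a hypothetical helper.
decode_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        signo=$((code - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "killed by signal $signo (SIG$(kill -l "$signo"))"
    else
        echo "exited with status $code"
    fi
}

decode_status 134
```

Called with 134, this reports signal 6, which matches the "Aborted" message printed by the shell.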

Hi,

thanks for CCing me on this, I don't peruse the linux-ha list very often
and I certainly would have missed it otherwise.

Looking over the patches that I applied to IPv6addr recently,
the following two fix potential crash bugs, though I don't think
either of them relate to free() calls, so I doubt that they will resolve
your problem.

http://hg.linux-ha.org/dev/rev/37271ae7f117
http://hg.linux-ha.org/dev/rev/b4bc188b4ebe

I did however find a crash bug relating to free in the version of
libnet that I was using. You can find a fairly lengthy discussion and
a proposed fix at:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418975

In summary: on Debian Etch, the problem resulted in a crash on amd64.
It did not manifest in a crash on i386.  I will raise this issue with
the upstream libnet maintainer, as I think that the problem is present
in the latest versions of his code.

Assuming that this does not solve your problem, what would help me
immensely is the following information.


I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my problem.

1) What version of linux-ha and libnet you are using
   and where you got them from.

Previously Heartbeat v2.0.8 x86_64 from the CentOS package (http://mirror.centos.org/centos/4/extras/x86_64/RPMS/); now Heartbeat v2.0.8 built from source (http://linux-ha.org/download/heartbeat-2.0.8.tar.gz)

Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/

2) What architecture you are using.

I'm running on RedHat ES4 x86_64

3) If you could provide a backtrace of the crash, preferably using
   versions of linux-ha and libnet that have been recompiled with
   debugging symbols.  (In the general case this means adding -g to
   CFLAGS, then rebuilding from scratch, including rerunning ./configure).

I've rebuilt Heartbeat from source with debugging enabled (the -g option was already in CFLAGS, if I'm not mistaken), but I don't know how to get a backtrace :/

I tried:

gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
run 2001:660:6301:301::47:1 start
Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr 2001:660:6301:301::47:1 start
[Thread debugging using libthread_db enabled]
[New Thread 47165808758720 (LWP 4360)]
usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr {start|stop|status|monitor|validate-all|meta-data}

Program exited with code 02.

How is the IPv6addr executable meant to be invoked? The syntax above works for the resource agent script (/etc/ha.d/resource.d/IPv6addr <IPv6 address> start), but not for the executable. How can I get a backtrace of IPv6addr?
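
One possible way to get the backtrace, as a sketch only. It assumes the compiled agent follows the OCF convention of reading its parameter from the environment (here OCF_RESKEY_ipv6addr) and takes only the action as its argument, which would explain the usage message above. That variable name and the scratch file path are assumptions, not something confirmed in this thread.

```sh
#!/bin/sh
# Sketch: run the compiled IPv6addr agent under gdb and print a backtrace.
# ASSUMPTION: the binary reads the address from OCF_RESKEY_ipv6addr and
# accepts only the action (start, stop, ...) on the command line.
AGENT=/usr/lib/ocf/resource.d/heartbeat/IPv6addr
CMDS=${TMPDIR:-/tmp}/ipv6addr-gdb.cmds   # hypothetical scratch file

# gdb commands: start the program, then dump the stack once it has stopped.
printf '%s\n' 'run' 'bt full' > "$CMDS"

if command -v gdb >/dev/null 2>&1 && [ -x "$AGENT" ]; then
    # The debugged program inherits gdb's environment.
    OCF_RESKEY_ipv6addr=2001:660:6301:301::47:1 \
        gdb -batch -x "$CMDS" --args "$AGENT" start
else
    echo "gdb or $AGENT not found; nothing was run" >&2
fi
```

If the program aborts, gdb stops at the signal and the "bt full" output is the backtrace to post. If it exits normally, gdb only reports that there is no stack.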

4) Please Cc me on mail regarding this :)


done :)

Thanks!
