Simon Horman wrote:
On Wed, Apr 25, 2007 at 04:25:48PM +0200, Dejan Muhamedagic wrote:
On Wed, Apr 25, 2007 at 11:59:02AM +0200, Benjamin Watine wrote:
You were right: it wasn't a score problem. My IPv6 resource causes an
error, which leaves the resource group unstarted.
Without IPv6, everything is OK and Heartbeat's behaviour fits my needs
(start on the preferred node (castor), and fail over after 3 failures).
So my problem is now IPv6.
The script seems to have a problem:
# /etc/ha.d/resource.d/IPv6addr 2001:660:6301:301::47:1 start
*** glibc detected *** free(): invalid next size (fast):
0x000000000050d340 ***
/etc/ha.d/resource.d//hto-mapfuncs: line 51: 4764 Aborted
$__SCRIPT_NAME start
2007/04/25_11:43:29 ERROR: Unknown error: 134
ERROR: Unknown error: 134
Despite this, ifconfig shows that the IPv6 address is correctly
configured, even though the script exits with an error code.
IPv6addr aborts, hence the exit code 134 (128+signo). Somebody
recently posted a set of patches for IPv6addr... Right, I'm cc-ing
this to Horms.
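
To make the 128+signo arithmetic concrete, here is a minimal shell sketch
that decodes such an exit status. The value 134 is taken from the log
above; the variable names are only illustrative.

```shell
status=134                      # exit status reported in the log above

if [ "$status" -gt 128 ]; then
    # The shell reports "killed by signal N" as 128 + N.
    signo=$((status - 128))     # 134 - 128 = 6, which is SIGABRT on Linux
    echo "terminated by signal $signo (SIG$(kill -l "$signo"))"
else
    echo "exited normally with status $status"
fi
```

Signal 6 is what abort() raises, which matches glibc aborting the process
after detecting the corrupted heap in free().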
Hi,
thanks for CCing me on this, I don't peruse the linux-ha list very often
and I certainly would have missed it otherwise.
Looking over the patches that I applied to IPv6addr recently,
the following two fix potential crash bugs, though I don't think
either of them relates to free() calls, so I doubt that they will resolve
your problem.
http://hg.linux-ha.org/dev/rev/37271ae7f117
http://hg.linux-ha.org/dev/rev/b4bc188b4ebe
I did however find a crash bug relating to free in the version of
libnet that I was using. You can find a fairly lengthy discussion and
a proposed fix at:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=418975
In summary: on Debian Etch, the problem resulted in a crash on amd64.
It did not manifest as a crash on i386. I will raise this issue with
the upstream libnet maintainer, as I think that the problem is present
in the latest versions of his code.
Assuming that this does not solve your problem, what would help me
immensely is the following information.
I use libnet v1.1.2.1 and I've applied your patch, but it doesn't solve my
problem.
1) What version of linux-ha and libnet you are using
and where you got them from.
Heartbeat v2.0.8 x86_64 from CentOS package
(http://mirror.centos.org/centos/4/extras/x86_64/RPMS/) before, but now
Heartbeat v2.0.8 from sources
(http://linux-ha.org/download/heartbeat-2.0.8.tar.gz)
Libnet v1.1.2.1 (latest stable) from http://www.packetfactory.net/libnet/
2) What architecture you are using.
I'm running on RedHat ES4 x86_64
3) If you could provide a backtrace of the crash, preferably using
versions of linux-ha and libnet that have been recompiled with
debugging symbols. (In the general case this means adding -g to
CFLAGS, then rebuilding from scratch, including rerunning ./configure).
I've rebuilt Heartbeat from source with debugging enabled (the -g option
was already in CFLAGS, if I'm not mistaken), but I don't know how to get
a backtrace :/
I tried:
gdb /usr/lib/ocf/resource.d/heartbeat/IPv6addr
run 2001:660:6301:301::47:1 start
Starting program: /usr/lib/ocf/resource.d/heartbeat/IPv6addr
2001:660:6301:301::47:1 start
[Thread debugging using libthread_db enabled]
[New Thread 47165808758720 (LWP 4360)]
usage: /usr/lib/ocf/resource.d/heartbeat/IPv6addr
{start|stop|status|monitor|validate-all|meta-data}
Program exited with code 02.
What is the usage of the IPv6addr executable? That invocation works for
its resource agent (/etc/ha.d/resource.d/IPv6addr (IPv6) start), but not
for the executable. How can I get a backtrace of IPv6addr?
4) Please Cc me on mail regarding this :)
done :)
Thanks!
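
One possible way to get the backtrace asked about in point 3, as a sketch
under assumptions rather than a confirmed recipe: OCF resource agents
normally take their parameters from OCF_RESKEY_* environment variables and
only the action on the command line, which would explain the usage message
above. The variable name OCF_RESKEY_ipv6addr is an assumption here and
should be checked against the agent's meta-data output.

```shell
# Path and address are the ones used earlier in this thread.
AGENT=/usr/lib/ocf/resource.d/heartbeat/IPv6addr

# Assumption: the agent reads its address from this environment variable.
export OCF_RESKEY_ipv6addr="2001:660:6301:301::47:1"

if command -v gdb >/dev/null 2>&1 && [ -x "$AGENT" ]; then
    # -batch exits when the commands are done; each -ex runs in order:
    # "run" starts the agent with the "start" action, and "bt" prints the
    # backtrace once the program has stopped on the abort.
    gdb -batch -ex run -ex bt --args "$AGENT" start
else
    echo "gdb or $AGENT not available; skipping" >&2
fi
```

This probably needs to be run as root, since the agent configures an
address on an interface.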