On Tue, Apr 27, 2010 at 12:40:41PM -0700, [email protected] wrote:
> On Tue, Apr 27, 2010 at 09:06:46PM +0200, Jens Elkner wrote:
> > t...@150 (l...@150) terminated by signal SEGV (no mapping at the fault address)
> > 0xff2570a8: t_splay+0x0010: ld [%o2 + 8], %o1
> > Current function is dec_argv
> > 1764 s = (char **)malloc((nelem + 1) * (sizeof *s));
> > current thread: t...@150
> > [1] t_splay(0x85a04, 0x0, 0x1fffff, 0x85808, 0x0, 0xff337480), at 0xff2570a8
> > [2] t_delete(0x85a04, 0x1fc, 0x1fffff, 0xff256f30, 0xff3303a8, 0x0), at 0xff256f30
> > [3] realfree(0x85800, 0x1ff, 0xd98dc, 0x8b7a0, 0x0, 0x8a768), at 0xff256b44
> > [4] cleanfree(0x0, 0xe, 0xd902c, 0x0, 0xff3303a8, 0xff3392a4), at 0xff2573cc
> > [5] _malloc_unlocked(0x28, 0x0, 0x0, 0x0, 0xfffffffc, 0x0), at 0xff256524
> > [6] malloc(0x24, 0x1, 0xd9fd8, 0x0, 0xff3303a8, 0xff33a518), at 0xff256414
...
> > Is anybody able to spot what's going wrong here?
>
> This looks like a classic case of heap corruption. You've died in
> t_splay while trying to coalesce free blocks before performing an
> allocation. Sometimes this happens when an object is double-freed,
> when you free an object that wasn't allocated by the allocator, or after
> other similar mistakes.
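One of the causes listed above, the double free, is often mitigated with a free-and-reset idiom. The sketch below is a hypothetical helper (`FREE_AND_NULL` and `release_twice` are names invented here, not taken from the post or from any library): once the variable is reset to NULL, a second release through it turns into `free(NULL)`, which the C standard defines as a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper macro: release the block and clear the variable, so a
 * repeated release through the same variable becomes free(NULL), a no-op. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)

/* Hypothetical demo: the same buffer is released on two code paths
 * (e.g. an early error path and a final cleanup path). Without the reset,
 * the second free() would be a double free that corrupts the heap. */
int release_twice(void)
{
    char *buf = malloc(32);
    if (buf == NULL)
        return -1;          /* allocation failed */

    FREE_AND_NULL(buf);     /* first release */
    FREE_AND_NULL(buf);     /* second release: free(NULL), harmless */

    assert(buf == NULL);
    return 0;
}
```

The reset only protects the variable that was cleared; other copies of the same pointer still dangle, so this is a mitigation rather than a cure. It also does nothing for a pointer member that was never initialized in the first place.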
Direct hit! I found it in the milter code (i.e. not in libmilter, as I had
assumed): a struct was malloc'ed, but one char* member was overlooked when
initializing it (it was not set to NULL). So the chaos started in the
milter's cleanup function: if (cf->helo) free(cf->helo);
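The mistake described above (a malloc'ed struct whose pointer member is left uninitialized and later handed to free()) can be sketched as follows. Only the `helo` member comes from the post; `struct conn_ctx`, `sender`, and the helper functions are hypothetical names for illustration. The fix shown zero-initializes the whole struct with a compound literal, which portably sets every pointer member to a null pointer; calloc() is a common alternative on platforms where all-bits-zero is a null pointer (true of mainstream platforms, though not guaranteed by the C standard).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-connection context; only `helo` is taken from the post. */
struct conn_ctx {
    char *helo;     /* HELO/EHLO argument; may never be set */
    char *sender;   /* hypothetical second member */
};

/* BUGGY (do not use): malloc() returns uninitialized memory, so `helo`
 * holds leftover bytes. A later `if (cf->helo) free(cf->helo);` then passes
 * a garbage pointer to free() and corrupts the heap. */
struct conn_ctx *ctx_new_buggy(void)
{
    struct conn_ctx *cf = malloc(sizeof *cf);
    if (cf != NULL)
        cf->sender = NULL;          /* `helo` is overlooked */
    return cf;
}

/* FIXED: every member starts out as a null pointer. */
struct conn_ctx *ctx_new(void)
{
    struct conn_ctx *cf = malloc(sizeof *cf);
    if (cf != NULL)
        *cf = (struct conn_ctx){ 0 };
    return cf;
}

/* Store a private copy of the HELO name.
 * Returns 0 on success, -1 if the allocation fails. */
int ctx_set_helo(struct conn_ctx *cf, const char *name)
{
    assert(cf != NULL && name != NULL);
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    free(cf->helo);                 /* NULL or a previous copy: both safe */
    cf->helo = copy;
    return 0;
}

/* Cleanup; free(NULL) is a no-op, so an `if (cf->helo)` guard is optional. */
void ctx_free(struct conn_ctx *cf)
{
    if (cf == NULL)
        return;
    free(cf->helo);
    free(cf->sender);
    free(cf);
}
```

With the fixed constructor, a cleanup path like the one in the post is safe whether or not a HELO name was ever stored.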
> Can you run this application under libumem? It has a bunch of debugging
> features that might help you out here.
>
> This is what I typically use when trying to debug heap corruption. What
> follows is for a 32-bit application. Omit the _32 if you're 64-bit.
>
> LD_PRELOAD_32=libumem.so
> UMEM_DEBUG='audit=50,guards,contents'
> UMEM_LOGGING='transaction,fail,contents'
>
> export LD_PRELOAD_32 UMEM_DEBUG UMEM_LOGGING
>
> HTH,
Yes, really, really cool stuff! dbx pointed immediately to the line shown
above. Everything else was a matter of 5 minutes. It has now been running on
two machines for about 2.5 hours without any problem!
Thanks a lot (also to everyone else for the hints)!
BTW: bcheck seems to work on that machine (I'm not sure what output one
should expect, but to me it looks OK); however, as far as I can see, it
didn't discover anything suspicious:
Actual leaks report    (actual leaks:     0  total size:      0 bytes)
Possible leaks report  (possible leaks:   0  total size:      0 bytes)
Blocks in use report   (blocks in use:  127  total size:  14430 bytes)

Total       % of  Num of  Avg     Allocation call stack
Size        All   Blocks  Size
==========  ====  ======  ======  =======================================
     10428   72%       1   10428  get_zone < getsystemTZ < _localtime_r < ctime < main
      3136   21%      56      56  optadd < optparse
       672    4%      53      12  _strdup < optadd
        81   <1%       5      16  _strdup < optaddarg
        54   <1%       7       7  _strdup < optadd
        31   <1%       4       7  _strdup < optadd
        28   <1%       1      28  _strdup < optaddarg < optparse < main
Regards,
jel.
--
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code