> Okay, cvs HEAD, fresh build starting with extraclean, just plain
> APR_POOL_DEBUG:
>
> #0 0xff34c70c in pool_is_child_of (pool=0x1b3fb0, parent=0x6174652c,
> mutex=0x0) at apr_pools.c:900
^^^^^^^^^
So, the parent of 'parent' is a pool which doesn't need a lock. It was
created by an apr_pool_create_ex call with APR_POOL_NEWALLOCATOR passed in.
It's a direct descendant of the global_pool, which means NULL is passed
in as the parent argument.
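In other words, something along these lines. A minimal sketch; the
flags-style apr_pool_create_ex signature and the flag name are taken
from the code under discussion and may not match your tree exactly:

#include "apr_pools.h"

/* Sketch only: create a top-level pool (parent == NULL, so it becomes
 * a direct descendant of the global_pool) which gets its own
 * allocator. */
static apr_status_t make_toplevel_pool(apr_pool_t **newpool)
{
    return apr_pool_create_ex(newpool,
                              NULL,  /* parent: hang off global_pool */
                              NULL,  /* abort_fn */
                              APR_POOL_NEWALLOCATOR);
}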
I'm wondering how the pool hierarchy got corrupted.
> 900 if (parent->mutex && parent->mutex != mutex)
> (gdb) where
> #0 0xff34c70c in pool_is_child_of (pool=0x1b3fb0, parent=0x6174652c,
> mutex=0x0) at apr_pools.c:900
> #1 0xff34c794 in pool_is_child_of (pool=0x1b3fb0, parent=0x1d2ca0,
> mutex=0x0) at apr_pools.c:906
> #2 0xff34c794 in pool_is_child_of (pool=0x1b3fb0, parent=0x14a528,
> mutex=0x117aa8) at apr_pools.c:906
> #3 0xff34c794 in pool_is_child_of (pool=0x1b3fb0, parent=0x117a60,
> mutex=0x0) at apr_pools.c:906
> #4 0xff34c8dc in check_integrity (pool=0x1b3fb0) at apr_pools.c:950
> #5 0xff34c910 in apr_palloc (pool=0x1b3fb0, size=19) at
> apr_pools.c:977
> #6 0xff32bb10 in apr_pstrndup (a=0x1b3fb0, s=0xfb1077f5
> "index.html.tw.Big5\n", n=18)
> at apr_strings.c:96
[...]
> (gdb) p *parent
> Cannot access memory at address 0x6174652c
> (gdb)
Which proves the corruption. The pool hierarchy is traversed, top down,
locking where needed (as requested by the app), checking whether the
pool passed into the function is in the hierarchy. If it isn't, we
abort. If the traversal segfaults, the hierarchy must be corrupted.
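In code, the check is roughly the following. This is a reconstruction
from the backtrace (apr_pools.c:900 is the mutex test, :906 the
recursion), with the private debug apr_pool_t paraphrased as a stand-in
struct and the unlock paths elided; it is not verbatim apr_pools.c:

#include "apr_thread_mutex.h"

typedef struct pool_t pool_t;  /* stand-in for the debug apr_pool_t */
struct pool_t {
    pool_t             *parent, *child, *sibling;
    apr_thread_mutex_t *mutex;  /* NULL if this subtree needs no lock */
};

static int pool_is_child_of(pool_t *pool, pool_t *parent,
                            apr_thread_mutex_t *mutex)
{
    pool_t *child;

    /* apr_pools.c:900: take the subtree's lock unless we hold it
     * already.  This is where frame #0 dies: 'parent' is the garbage
     * pointer 0x6174652c, which looks like ASCII text, so something
     * seems to have written string data over the pool links. */
    if (parent->mutex && parent->mutex != mutex)
        apr_thread_mutex_lock(parent->mutex);

    for (child = parent->child; child; child = child->sibling) {
        if (child == pool)
            return 1;

        /* apr_pools.c:906: recurse into the child's subtree. */
        if (pool_is_child_of(pool, child, parent->mutex))
            return 1;
    }

    return 0;  /* check_integrity() aborts when this comes back */
}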
Am I missing something here?
[...]
> Here is the coredump I get (almost immediately after starting
> pounding):
>
> #0 0xff34d994 in pool_num_bytes (pool=0x14a3c8) at apr_pools.c:1381
> 1381 for (index = 0; index < node->index; index++) {
> (gdb) where
> #0 0xff34d994 in pool_num_bytes (pool=0x14a3c8) at apr_pools.c:1381
> #1 0xff34da44 in apr_pool_num_bytes (pool=0x14a3c8, recurse=1) at
> apr_pools.c:1395
> #2 0xff34da8c in apr_pool_num_bytes (pool=0x14a3c8, recurse=1) at
> apr_pools.c:1401
> #3 0xff34da8c in apr_pool_num_bytes (pool=0x190968, recurse=1) at
> apr_pools.c:1401
This could be the same locking problem as with the integrity check,
which I will fix. What I am wondering is whether pool=0x190968
is the global pool or the pool being destroyed. Do you have that
information?
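For reference, this is the shape of the code that is crashing; again a
reconstruction from the backtrace, with the private debug structures
paraphrased (the field names and the 64-slot node size are
assumptions):

#include <stddef.h>

typedef struct debug_node_t debug_node_t;
struct debug_node_t {
    debug_node_t *next;
    unsigned int  index;      /* slots used in the arrays below */
    void         *beginp[64]; /* start of each allocation */
    void         *endp[64];   /* one past the end of each allocation */
};

typedef struct pool_t pool_t; /* stand-in for the debug apr_pool_t */
struct pool_t {
    pool_t       *parent, *child, *sibling;
    debug_node_t *nodes;
};

static size_t pool_num_bytes(pool_t *pool)
{
    size_t size = 0;
    debug_node_t *node;
    unsigned int index;

    /* apr_pools.c:1381 is this inner loop; 'node' came out as 0x208
     * in your core, i.e. the nodes chain itself points into garbage. */
    for (node = pool->nodes; node; node = node->next)
        for (index = 0; index < node->index; index++)
            size += (char *)node->endp[index] - (char *)node->beginp[index];

    return size;
}

static size_t pool_num_bytes_recurse(pool_t *pool, int recurse)
{
    size_t size = pool_num_bytes(pool);           /* apr_pools.c:1395 */
    pool_t *child;

    if (recurse)
        for (child = pool->child; child; child = child->sibling)
            size += pool_num_bytes_recurse(child, 1); /* apr_pools.c:1401 */

    return size;
}

Note that the recursion walks the child/sibling links without taking
the pools' locks, which would explain a race with a concurrent destroy.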
> #4 0xff34cf7c in apr_pool_destroy_debug (pool=0x160d08,
> file_line=0xff34ed98 "thread.c:182")
> at apr_pools.c:1091
> #5 0xff340f2c in apr_thread_exit (thd=0x163f28, retval=0) at
> thread.c:182
> #6 0xa15bc in worker_thread (thd=0x163f28, dummy=0x1bb120) at
> worker.c:788
> #7 0xff340cc0 in dummy_worker (opaque=0x163f28) at thread.c:122
> (gdb) p *node
> Cannot access memory at address 0x208
> (gdb) p *pool
> $1 = {parent = 0x0, child = 0x0, sibling = 0x2077a8, ref = 0x19096c,
> cleanups = 0x1cb810,
> subprocesses = 0x0, abort_fn = 0, user_data = 0x0, tag = 0xf7240
> "protocol.c:575", nodes = 0x0,
> file_line = 0xf7240 "protocol.c:575", stat_alloc = 0,
> stat_total_alloc = 373, stat_clear = 1,
> mutex = 0x0}
> (gdb)
>
> The only POOL DEBUG messages I get in the error_log are CLEAR (just
> two, at initialization), CREATE, and DESTROY. I'd expect some other
> flavor if the pool code noticed a problem, right?
Could you send me the first bit of the log (up to ~1MB) in private?
> Thanks,
>
> Jeff
Sander