On 02.10.2010 22:29, Jeff Trawick wrote:
On Sat, Oct 2, 2010 at 7:23 AM, Rainer Jung <rainer.j...@kippdata.de> wrote:

All builds succeeded, all make check runs passed, except for two cases on
Solaris 10 (Niagara). I reran the tests there and couldn't reproduce the
problem. Tests now running in a loop, so far not reproducible.
...
Details on Solaris 10 test failures

- both in testreslist
- retried both tests more than 100 times, could not reproduce
- build against apr 1.4.2

I built apr-util-1.3.9 and 1.3.10 against apr 1.3.x HEAD on S10/x86
with SunStudio and haven't reproduced any glitches

potentially Niagara presents some scheduling sequences that most of us
aren't going to see

reslist is unchanged in 1.3.10 but thread_pool has several
modifications; to my eye those changes look exceedingly safe;

also: added asserts to all the unchecked mutex calls in threadpool and
reran testreslist but no hits

ran testall under valgrind on Linux and got a hit at exit():

==7911== Invalid free() / delete / delete[]
==7911==    at 0x4024B3A: free (vg_replace_malloc.c:366)
==7911==    by 0x4272653: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==7911==    by 0x4272119: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==7911==    by 0x401F4F3: _vgnU_freeres (vg_preloaded.c:62)
==7911==    by 0x41ED033: _Exit (_exit.S:30)
==7911==    by 0x418422E: exit (exit.c:100)
==7911==    by 0x416BBDD: (below main) (libc-start.c:258)
==7911==  Address 0x449d698 is not stack'd, malloc'd or (recently) free'd

but same error running apr-util 1.3.9 against same apr

So your failures are scary, but &^%$ happens :(

Thanks for investigating further. I also have the impression that the failures are not a regression, but I have never tested this extensively before. I'm still +1 on the release.

I ran testreslist 1000 times for one build and about 500 times for another build (different versions of expat and Berkeley DB), both on Niagara. I couldn't reproduce the Bus error, but I did see:

- once, a process that hung while polling, and which crashed after I detached the debugger

- and now one looping again in apr_pool_cleanup_kill() with c == c->next.
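
For the looping case, a debugging check along the following lines might help catch the corruption earlier. This is only a sketch on a simplified stand-in list: the node type and the names list_has_cycle and check_list_sane are hypothetical, not APR's actual cleanup structures or API. It uses Floyd's two-pointer cycle detection, which also covers the degenerate c == c->next case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified list node; NOT APR's real cleanup structure. */
typedef struct node {
    struct node *next;
    const void *data;
} node;

/* Return 1 if the list starting at head contains a cycle, else 0.
 * Floyd's algorithm: the fast pointer advances two steps per iteration,
 * so it meets the slow pointer if and only if the list loops back. */
static int list_has_cycle(const node *head)
{
    const node *slow = head;
    const node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}

/* Debug helper: abort (when asserts are enabled) instead of spinning
 * forever on a corrupted list. */
static void check_list_sane(const node *head)
{
    assert(!list_has_cycle(head));
}
```

Calling something like check_list_sane() on the list before walking it would turn the silent spin into an abort at the point where the cycle is first noticed, assuming asserts are enabled in the build.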

So if we have a good idea how to investigate further, it seems I can reproduce the looping with a little patience.

Regards,

Rainer

