On 02.10.2010 22:29, Jeff Trawick wrote:
On Sat, Oct 2, 2010 at 7:23 AM, Rainer Jung <rainer.j...@kippdata.de> wrote:
All builds succeeded and all make check runs passed, except for two cases on
Solaris 10 (Niagara). I reran the tests there and couldn't reproduce the
problem. The tests are now running in a loop; so far not reproducible.
...
Details on Solaris 10 test failures
- both in testreslist
- retried both tests more than 100 times, could not reproduce
- build against apr 1.4.2
I built apr-util-1.3.9 and 1.3.10 against apr 1.3.x HEAD on S10/x86
with SunStudio and haven't reproduced any glitches
potentially Niagara presents some scheduling sequences that most of us
aren't going to see
reslist is unchanged in 1.3.10 but thread_pool has several
modifications; to my eye those changes look exceedingly safe;
also: added asserts to all the unchecked mutex calls in threadpool and
reran testreslist but no hits
ran testall under valgrind on Linux and got a hit at exit():
==7911== Invalid free() / delete / delete[]
==7911== at 0x4024B3A: free (vg_replace_malloc.c:366)
==7911== by 0x4272653: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==7911== by 0x4272119: ??? (in /lib/tls/i686/cmov/libc-2.11.1.so)
==7911== by 0x401F4F3: _vgnU_freeres (vg_preloaded.c:62)
==7911== by 0x41ED033: _Exit (_exit.S:30)
==7911== by 0x418422E: exit (exit.c:100)
==7911== by 0x416BBDD: (below main) (libc-start.c:258)
==7911== Address 0x449d698 is not stack'd, malloc'd or (recently) free'd
but same error running apr-util 1.3.9 against same apr
So your failures are scary, but &^%$ happens :(
Thanks for investigating further. I also have the impression that the
failures are not a regression, but I have never tested this
extensively before. I'm still +1 on the release.
I ran testreslist 1000 times for one build and about 500 times for
another build (different versions of expat and Berkeley DB), both on
Niagara. I couldn't reproduce the Bus error, but I did see
- once a process hanging while polling, which crashed after I detached the
debugger
- and now one looping again in apr_pool_cleanup_kill() with c == c->next.
So if we have a good idea how to investigate further, it seems I can
reproduce the looping with a little patience.
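
One possible way to catch the c == c->next state earlier is a debug-only
sanity check on the cleanup list, in the spirit of the asserts added to
threadpool. The following is only a sketch under assumptions: cleanup_node,
cleanup_list_has_cycle() and assert_cleanup_list_sane() are hypothetical
names, and the struct is a stand-in, not APR's real cleanup structure. It
uses the usual two-pointer (tortoise and hare) walk, which also detects the
self-loop case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pool cleanup list node; NOT APR's real struct. */
typedef struct cleanup_node {
    struct cleanup_node *next;
    const void *data;
} cleanup_node;

/*
 * Return 1 if following ->next from head never reaches NULL, i.e. the list
 * contains a cycle (including the self-loop c == c->next); return 0 for a
 * properly NULL-terminated list. Two-pointer walk: the fast pointer moves
 * two nodes per step, the slow pointer one, so they meet only inside a cycle.
 */
static int cleanup_list_has_cycle(const cleanup_node *head)
{
    const cleanup_node *slow = head;
    const cleanup_node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}

/* Debug helper: abort as soon as the list is found to be corrupted. */
static void assert_cleanup_list_sane(const cleanup_node *head)
{
    assert(!cleanup_list_has_cycle(head));
}
```

Calling something like assert_cleanup_list_sane() before and after each list
modification in a debug build would turn the endless loop into an abort close
to the point where the list gets corrupted, which might leave a more useful
core file than attaching to a process that is already spinning.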
Regards,
Rainer