Aurelien and Brian,

Thanks for the suggestions. I reran the runs with --without-memory-manager and got (on 2 of 5000 runs):

  *** glibc detected *** corrupted double-linked list: 0xf704dff8 ***

on one and

  *** glibc detected *** malloc(): memory corruption: 0xeda00c70 ***

on the other.
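
For anyone following along: both messages come from glibc's heap consistency checks, which generally fire after something has damaged the allocator's bookkeeping. As a rough illustration of that failure class, here is a minimal sketch of catching a write past the end of an allocation with guard bytes. This is not Open MPI or glibc code; the helper names (guarded_alloc, guard_intact) and the guard size are made up for the example, and glibc itself checks its own chunk metadata rather than working like this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative values only; not taken from glibc or Open MPI. */
#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/*
 * Hypothetical helper: allocate `size` usable bytes followed by
 * GUARD_LEN guard bytes filled with a known pattern.
 */
static void *guarded_alloc(size_t size)
{
    unsigned char *p;

    if (size > SIZE_MAX - GUARD_LEN)   /* avoid overflow in the addition */
        return NULL;

    p = malloc(size + GUARD_LEN);
    if (p == NULL)
        return NULL;

    memset(p + size, GUARD_BYTE, GUARD_LEN);
    return p;
}

/*
 * Hypothetical helper: return 1 if the guard bytes after the usable
 * region are untouched, 0 if something wrote past the end.
 */
static int guard_intact(const void *ptr, size_t size)
{
    const unsigned char *guard;
    size_t i;

    assert(ptr != NULL);
    guard = (const unsigned char *)ptr + size;

    for (i = 0; i < GUARD_LEN; i++) {
        if (guard[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

A caller would allocate with guarded_alloc(n), use only the first n bytes, and call guard_intact(ptr, n) before free(ptr). A zero return means some code wrote past the end of the buffer.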
So it looks like somewhere we are over-running our allocated space. So now I am attempting to redo the run with valgrind.

Tim

On Thursday 20 September 2007 09:59:14 pm Brian Barrett wrote:
> On Sep 20, 2007, at 7:02 AM, Tim Prins wrote:
> > In our nightly runs with the trunk I have started seeing cases where we
> > appear to be segfaulting within/below malloc. Below is a typical output.
> >
> > Note that this appears to only happen on the trunk, when we use openib,
> > and are in 32 bit mode. It seems to happen randomly at a very low
> > frequency (59 out of about 60,000 32 bit openib runs).
> >
> > This could be a problem with our machine, and has showed up since I
> > started testing 32bit ofed 10 days ago.
> >
> > Anyways, just curious if anyone had any ideas.
>
> As someone else said, this usually points to a duplicate free or the
> like in malloc. You might want to try compiling with
> --without-memory-manager, as the ptmalloc2 in glibc frequently is more
> verbose about where errors occurred than is the one in Open MPI.
>
> Brian
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel