Ralph Castain wrote:
It looks like the SM revisions we inserted into 1.3.2 are a great
detector for shared memory init failures - it segfaulted 143 times
last night on IU's sif computer, 34 times on Sun/Linux, and 3 times
on Sun/SunOS...almost every single time due to "Address not mapped"
errors in the sm btl during init.
Might be worth someone looking at the MTT output stack traces - this
is something that now appears to be reproducible, and should be
addressed before we release 1.3.2 as it seems far more likely to
happen than in the past.
Okay. I looked at http://www.open-mpi.org/mtt/index.php?do_redir=973
If we start with the 3 Sun/SunOS failures (row #7), these seem to be
labeled 1.3.1 ("MPI Version"). So, not 1.3.2. And I don't know how to
make sense of the stack trace: there is an "mca_common_sm_mmap_init"
ftruncate problem, and then other stuff apparently from much later on.
How can this be?
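For reference while reading those traces, here is a minimal sketch of
the generic POSIX pattern I have in mind when I see ftruncate in a
shared-memory init path: create a backing file, extend it with
ftruncate, mmap it, then touch it. This is not the Open MPI source; the
names map_backing_file and unmap_backing_file are hypothetical, and I
am only assuming the sm code follows roughly this sequence. The point
is that if the file is never extended to the mapped size, the first
touch of the mapping shows up as a signal rather than as an error
return.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not Open MPI code): create a backing file,
 * size it, map it shared, and touch every byte of the mapping.
 * Returns the mapped address, or NULL if any step fails. */
void *map_backing_file(const char *path, size_t size)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        return NULL;
    }

    /* Extend the file to the full segment size. If this step is
     * skipped or fails unnoticed, the file stays shorter than the
     * mapping and touching the segment faults. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        unlink(path);
        return NULL;
    }

    void *seg = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (seg == MAP_FAILED) {
        unlink(path);
        return NULL;
    }

    memset(seg, 0, size); /* first touch of every page in the segment */
    return seg;
}

/* Hypothetical cleanup helper: unmap the segment and remove the file.
 * Returns the result of munmap (0 on success). */
int unmap_backing_file(void *seg, size_t size, const char *path)
{
    int rc = munmap(seg, size);
    unlink(path);
    return rc;
}
```

Calling map_backing_file with a path on the filesystem used for the
session directory and a segment size similar to the real one would
check each step in isolation, outside of MPI_Init.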
The Sun/Linux problems must be row #6. Yes? Again, the "MPI Version"
is labeled 1.3.1. Is that informative or misleading? Lots of the stacks
look like this is happening during MPI_Init. I tried running a code
that just does MPI_Init on similar configs and have been unable to
trigger this problem.
How do I figure out the compiler used?
I need help reproducing this problem.