Ralph Castain wrote:

It looks like the SM revisions we inserted into 1.3.2 are a great detector for shared memory init failures - it segfaulted 143 times last night on IU's sif computer, 34 times on Sun/Linux, and 3 times on Sun/SunOS...almost every single time due to "Address not mapped" errors in the sm btl during init.

Might be worth someone looking at the MTT output stack traces - this is something that now appears to be reproducible, and should be addressed before we release 1.3.2 as it seems far more likely to happen than in the past.

Okay. I looked at http://www.open-mpi.org/mtt/index.php?do_redir=973

If we start with the 3 Sun/SunOS failures (row #7), these seem to be labeled 1.3.1 ("MPI Version"). So, not 1.3.2. And I don't know how to make sense of the stack trace... there is an "mca_common_sm_mmap_init" ftruncate problem, and then other failures that apparently happen much later on. How can this be?

The Sun/Linux problems must be row #6. Yes? Again, the "MPI Version" is labeled 1.3.1. Is that informative or misleading? Lots of the stacks look like this is happening during MPI_Init. I tried running a code that just does MPI_Init on similar configs and seem unable to trigger the problem.
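
For reference, a minimal sketch of the kind of MPI_Init-only test meant here. It assumes a standard MPI C installation with the usual mpicc wrapper; the file name and the process count in the usage note are illustrative, not the exact code or command line used for these runs.

```c
/* init_only.c (hypothetical name): initialize MPI, then shut down.
 * No communication is performed; the only goal is to exercise
 * whatever setup happens inside MPI_Init. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* With the default error handler most failures abort before
     * returning, but check the return code anyway. */
    int rc = MPI_Init(&argc, &argv);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed with code %d\n", rc);
        return 1;
    }

    MPI_Finalize();
    return 0;
}
```

Something like "mpicc init_only.c -o init_only" followed by "mpirun -np 4 ./init_only" on a single node should, as far as I understand, go through the shared memory setup during init, since on-node processes are the ones that would use the sm btl.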

How do I figure out the compiler used?

I need help reproducing this problem.
