On Dec 1, 2005, at 10:58 AM, Greg Watson wrote:
> @#$%^& it! I can't get the problem to manifest for either branch now.
Well, that's good for me. :-)
FWIW, the problem existed on systems that could/would return different
addresses in different processes from mmap() for memory that was common
to all of them. E.g., if processes A and B share common memory Z, A
would get virtual address M for Z, and B would get virtual address N
(as opposed to both A and B getting virtual address M).
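To make that concrete, here is a minimal sketch (not the actual sm btl
code) of why raw pointers stored in shared memory break in this
situation, and why storing offsets from the segment base does not. It
maps the same file twice in a single process as a stand-in for
processes A and B, so the two views are guaranteed to sit at different
virtual addresses. All names here (shm_view_t, shm_ptr_to_offset,
shm_offset_to_ptr, shm_offset_demo) are hypothetical and exist only for
illustration; the /tmp path and the 4096-byte size are assumptions too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: one process's view of a shared segment. */
typedef struct {
    unsigned char *base;  /* where this view mapped the segment */
    size_t size;          /* segment length in bytes */
} shm_view_t;

/* Convert a view-local pointer into a position-independent offset. */
static size_t shm_ptr_to_offset(const shm_view_t *v, const void *p)
{
    return (size_t)((const unsigned char *)p - v->base);
}

/* Convert an offset back into a pointer that is valid in this view. */
static void *shm_offset_to_ptr(const shm_view_t *v, size_t off)
{
    return v->base + off;
}

/* Returns the value read through view "B", or -1 on setup failure. */
static int shm_offset_demo(void)
{
    char path[] = "/tmp/shm_demo_XXXXXX";  /* assumed writable */
    size_t size = 4096;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file lives on until the fd and mappings go away */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    /* Two MAP_SHARED mappings of the same file: same memory Z,
       two different virtual addresses (M and N). */
    void *pa = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *pb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (pa == MAP_FAILED || pb == MAP_FAILED) {
        if (pa != MAP_FAILED) munmap(pa, size);
        if (pb != MAP_FAILED) munmap(pb, size);
        return -1;
    }

    shm_view_t a = { pa, size };
    shm_view_t b = { pb, size };
    assert(a.base != b.base);  /* both mappings are live, so they differ */

    /* "A" writes a value at byte 128 and publishes its OFFSET in the
       first word of the segment. Publishing the raw pointer instead
       would give "B" an address that only makes sense in "A". */
    int *val_a = (int *)(a.base + 128);
    *val_a = 42;
    *(size_t *)a.base = shm_ptr_to_offset(&a, val_a);

    /* "B" reads the offset and rebases it onto its own mapping. */
    size_t off = *(size_t *)b.base;
    int *val_b = shm_offset_to_ptr(&b, off);
    int result = *val_b;

    munmap(pa, size);
    munmap(pb, size);
    return result;
}
```

Calling shm_offset_demo() should return 42 on a POSIX system, because
the offset published by "A" resolves to the same underlying memory when
"B" adds it to its own base address.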
Here's the history of what happened...
We had code paths for that situation in the sm btl (i.e., when A and B
get different addresses for the same shared memory), but unbeknownst to
us, mmap() on most systems seems to return the same value in A and B
(this could be a side-effect of typical MPI testing memory usage
patterns... I don't know).
But Fedora Core 3 and 4 (FC3 and FC4) consistently did not seem to follow this pattern --
they would return different values from mmap() in different processes.
Unfortunately, we did not do any testing on FC3 or FC4 systems until a
few weeks before SC, and discovered that some of our
previously-unknowingly-untested sm bootstrap code paths had some bugs.
I fixed all of those and brought [almost all of] them over to the 1.0
release branch. I missed one patch in v1.0, but it will be included in
v1.0.1, to be released shortly.
So I'd be surprised if you were still seeing this bug in either branch
-- as far as I know, I fixed all the issues. More specifically, if you
see this behavior, it will probably be in *both* branches.
Let me know if you run across it again. Thanks!
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/