On Dec 1, 2005, at 10:58 AM, Greg Watson wrote:

> @#$%^& it! I can't get the problem to manifest for either branch now.

Well, that's good for me.  :-)

FWIW, the problem existed on systems that could/would return different addresses in different processes from mmap() for memory that was common to all of them. E.g., if processes A and B share common memory Z, A would get virtual address M for Z, and B would get virtual address N (as opposed to both A and B getting virtual address M).
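To make the A/B situation concrete, here is a minimal sketch (not the actual sm btl code) that mimics it inside one process. It maps the same file twice with MAP_SHARED, so the two views play the roles of A and B and are guaranteed to sit at different virtual addresses. The names region_header, ptr_to_off, off_to_ptr and shared_offset_demo are made up for illustration, and the sketch assumes a POSIX system with a writable /tmp. The point it tries to show: anything stored inside the shared region should be an offset from the region base rather than a raw pointer, because a raw pointer written through view M means nothing when read through view N.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Hypothetical layout of the shared region: the link to the payload is
 * stored as a byte offset from the region base, never as a raw pointer. */
struct region_header {
    size_t payload_off;
};

/* Convert between a local pointer and a base-relative offset. */
static size_t ptr_to_off(void *base, void *p)
{
    return (size_t)((char *)p - (char *)base);
}

static void *off_to_ptr(void *base, size_t off)
{
    return (char *)base + off;
}

/* Writes `value` through one mapping and reads it back through a second
 * mapping of the same file. Returns the value read, or -1 on setup failure
 * (so do not pass -1 as the value). *addresses_differ is set to 1 if the
 * two mappings landed at different virtual addresses. */
int shared_offset_demo(int value, int *addresses_differ)
{
    char path[] = "/tmp/sm_offset_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file lives only as long as fd/maps */
    if (ftruncate(fd, REGION_SIZE) != 0) {
        close(fd);
        return -1;
    }

    /* Two views of the same memory: stand-ins for processes A and B. */
    void *view_a = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    void *view_b = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);
    if (view_a == MAP_FAILED || view_b == MAP_FAILED) {
        if (view_a != MAP_FAILED) munmap(view_a, REGION_SIZE);
        if (view_b != MAP_FAILED) munmap(view_b, REGION_SIZE);
        return -1;
    }

    /* "A" side: place the payload after the header, publish its offset. */
    struct region_header *hdr_a = view_a;
    int *payload_a = (int *)((char *)view_a + sizeof *hdr_a);
    *payload_a = value;
    hdr_a->payload_off = ptr_to_off(view_a, payload_a);

    /* "B" side: rebuild a usable pointer from B's own base address. */
    struct region_header *hdr_b = view_b;
    int *payload_b = off_to_ptr(view_b, hdr_b->payload_off);
    int result = *payload_b;

    if (addresses_differ != NULL)
        *addresses_differ = (view_a != view_b);

    munmap(view_a, REGION_SIZE);
    munmap(view_b, REGION_SIZE);
    return result;
}
```

Calling shared_offset_demo(42, &differ) should hand back 42 with differ set to 1. Had the header stored payload_a itself instead of an offset, the "B" side would only have worked on systems where both mappings happen to land at the same address, which is the kind of accident described above.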

Here's the history of what happened...

We had code paths for that situation in the sm btl (i.e., when A and B get different addresses for the same shared memory), but unbeknownst to us, mmap() on most systems seems to return the same value in A and B (this could be a side-effect of typical MPI testing memory usage patterns... I don't know).

But FC3 and FC4 consistently did not seem to follow this pattern -- they would return different values from mmap() in different processes. Unfortunately, we did not do any testing on FC3 or FC4 systems until a few weeks before SC, and only then discovered that some of our sm bootstrap code paths, which we had not realized were untested, had bugs. I fixed all of those and brought [almost all of] the fixes over to the 1.0 release branch. I missed one patch in v1.0, but it will be included in v1.0.1, to be released shortly.

So I'd be surprised if you were still seeing this bug in either branch -- as far as I know, I fixed all the issues. More specifically, if you see this behavior, it will probably be in *both* branches.

Let me know if you run across it again.  Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
