Don, Galen, and I talked about this in depth on the phone today and
think that it is a symptom of the same issue discussed in this thread:
http://www.open-mpi.org/community/lists/devel/2007/10/2382.php
Note my message in that thread from just a few minutes ago:
http://www.open-mpi.org/community/lists/devel/2007/11/2561.php
We think that the proposed solution in that thread will also fix the
mpi_preconnect_all issues (i.e., the ping-pong that Don proposes in
his mail should not be necessary).
On Oct 17, 2007, at 10:54 AM, Don Kerr wrote:
All,
I have noticed an issue in the 1.2 branch when mpi_preconnect_all=1.
The one-way communication pattern (each pair of ranks either sends or
receives, but not both) may not fully establish connections with peers.
For example, if I have a 3-process MPI job and rank 0 does no MPI
communication after MPI_Init(), the other ranks' connection attempts
will not be progressed (I have seen this with both tcp and udapl). A
rough sketch of the pattern follows.
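To make that concrete, here is a minimal sketch of the sort of one-way
pattern I mean. This is illustrative only, not the actual Open MPI
preconnect code, and preconnect_one_way is just a made-up name:

/* Hypothetical sketch (not the actual Open MPI code): for each pair
 * of ranks, exactly one side sends and the other receives, chosen by
 * rank order.  A zero-byte message is enough to trigger connection
 * setup, but if one rank never re-enters the MPI progress engine
 * after MPI_Init(), its peers' connection attempts can stall. */
#include <mpi.h>

static void preconnect_one_way(MPI_Comm comm)
{
    int rank, size, peer;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    for (peer = 0; peer < size; ++peer) {
        if (peer == rank)
            continue;
        if (rank < peer)
            MPI_Send(NULL, 0, MPI_BYTE, peer, 0, comm);
        else
            MPI_Recv(NULL, 0, MPI_BYTE, peer, 0, comm,
                     MPI_STATUS_IGNORE);
    }
}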
The preconnect pattern has changed slightly in the trunk, but it is
still essentially one-way communication, either a send or a receive
with each rank. So although the issue I see in the 1.2 branch does not
appear in the trunk, I wonder whether it will show up again.
An alternative to the preconnect pattern that comes to mind would be
to perform both a send and a receive between all ranks, to ensure that
connections have been fully established (a sketch follows below).
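Something like the following is what I have in mind; this is a rough
sketch (preconnect_send_recv_all is just a made-up name), using
zero-byte nonblocking messages so that every rank both sends to and
receives from every peer, and MPI_Waitall keeps every rank in the
progress engine until all of its connections complete:

#include <mpi.h>
#include <stdlib.h>

static void preconnect_send_recv_all(MPI_Comm comm)
{
    int rank, size, peer, n = 0;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    /* Two requests (one send, one receive) per peer. */
    MPI_Request *reqs = malloc(2 * (size - 1) * sizeof(MPI_Request));

    for (peer = 0; peer < size; ++peer) {
        if (peer == rank)
            continue;
        /* Zero-byte messages in both directions are enough to force
         * full connection establishment with each peer. */
        MPI_Irecv(NULL, 0, MPI_BYTE, peer, 0, comm, &reqs[n++]);
        MPI_Isend(NULL, 0, MPI_BYTE, peer, 0, comm, &reqs[n++]);
    }

    /* Every rank blocks here until all of its sends *and* receives
     * complete, so no rank can leave before its peers connect to it. */
    MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    free(reqs);
}

Using nonblocking operations and a single MPI_Waitall also sidesteps
any ordering or deadlock concerns that a blocking all-pairs exchange
might raise.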
Does anyone have thoughts or comments on this, or reasons not to have
all ranks send to and receive from all others?
-DON
--
Jeff Squyres
Cisco Systems