Re: [OMPI devel] Modex and others

Jeff Squyres Fri, 7 Nov 2008 16:16:23 -0500

On Nov 7, 2008, at 10:18 AM, Leonardo Fialho wrote:

I understand that a process needs to have the contact information to send MPI messages to other processes, and the modex permits it. My question is, why not perform the contact exchange only when it is necessary?
For example: in an M/W application, the workers do not need more information than the master's contact info.
I think that it reduces the startup time, but increases the cost of the *first* communication between two peers.

FWIW, this is actually a pretty complex topic. There are many, many tradeoffs in terms of what performance you want vs. what functionality you want. This subject has been discussed for many, many hours by the OMPI developers. :-)

The modex is performed during MPI_INIT; the v1.3 series' modex is quite a bit more efficient than the v1.2 series' modex. The modex information comprises several things, some of which are either the contact info or "reachability" info of BTL modules. For the openib BTL, for example, port subnet IDs and MTUs are passed in the modex, but LIDs don't need to be passed (in some cases) until two processes actually try to reach each other. We use the reachability information to determine whether a given BTL module *could* be used to connect to a remote peer. For example, if we get to the end of MPI_INIT and find a peer that cannot be reached, we abort. After hours of debate, we decided it was better to abort right away when a peer could not be reached rather than abort only on attempted first contact: it could be a simple network/configuration error that should be detected immediately, rather than erroring out [potentially] long into a multi-hour run.
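
To make the "abort at the end of MPI_INIT" idea concrete, here is a rough toy model in Python. It is only a sketch, not how Open MPI actually implements it: the modex layout (a dict of rank -> BTL name -> set of subnet IDs) and the names usable_btls, unreachable_pairs, and init_check are all made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical modex payload: for each rank, the subnet IDs that each of
# its BTL modules is attached to. This is NOT Open MPI's real data format.
Modex = Dict[int, Dict[str, Set[str]]]


def usable_btls(modex: Modex, a: int, b: int) -> List[str]:
    """Return the BTL names through which rank a *could* reach rank b.

    Assumption for this sketch: a BTL is usable if both ranks have it
    and they share at least one subnet ID on it.
    """
    mine, theirs = modex[a], modex[b]
    return sorted(
        name
        for name in mine.keys() & theirs.keys()
        if mine[name] & theirs[name]
    )


def unreachable_pairs(modex: Modex) -> List[Tuple[int, int]]:
    """List every pair of ranks that has no usable BTL in common."""
    ranks = sorted(modex)
    return [
        (a, b)
        for i, a in enumerate(ranks)
        for b in ranks[i + 1:]
        if not usable_btls(modex, a, b)
    ]


def init_check(modex: Modex) -> None:
    """Fail at startup instead of on first contact deep into a run."""
    bad = unreachable_pairs(modex)
    if bad:
        raise RuntimeError(f"unreachable peers detected at init: {bad}")


# Example data (invented): rank 2 sits on a different TCP subnet and has
# no openib BTL, so it cannot be reached by ranks 0 or 1.
example_modex: Modex = {
    0: {"openib": {"subnetA"}, "tcp": {"10.0.0.0/24"}},
    1: {"openib": {"subnetA"}},
    2: {"tcp": {"10.0.1.0/24"}},
}
```

With the example data, init_check(example_modex) raises immediately because rank 2 is unreachable. A lazy scheme would not notice until rank 0 or 1 first tried to send to rank 2.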

We have been discussing a "modex-less" startup for quite a while; this is actually one of the topics on the agenda for an engineering meeting that we're having in December. Modex-less startup is quite important for scalability to many thousands of processes, but other tradeoffs may be necessary to make this work (read: we've talked about modex-less forever; we're finally likely to do it in the near future because of some upcoming very, very large scale machines at US DOE labs).


Does that make sense?

--
Jeff Squyres
Cisco Systems
