There has been a lot of discussion about IPv6 in Open MPI and OpenRTE recently. My comments here relate solely to OpenRTE and are intended to help provide some clarity to the discussion.

OpenRTE communications are done via the Runtime Messaging Library (RML) API. The RML is really a strategy layer - it determines which transport will be used for the given message, and handles routing where required (e.g., between cells). In our component architecture, the RML is implemented as a framework - only one RML component can be selected and active in a process.

Sitting under the RML are one or more transport systems - these are known as Out-Of-Band (OOB) components and reside in the oob framework. Because the OpenRTE messaging system must work in a heterogeneous environment, multiple OOB components can be selected and active at one time. The RML is responsible for picking the correct OOB to use to communicate with a specific process in the most efficient manner possible.
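
As a rough illustration of that division of labor (not the actual OpenRTE code, which is C and uses the component architecture), the sketch below models an RML that holds several active OOB components and hands each message to the first one able to reach the destination. The class names, the `can_reach` method, and the idea that "most efficient" is just list order are all assumptions made for the example.

```python
class OOB:
    """Hypothetical transport component; records what it was asked to send."""

    def __init__(self, label, reachable_names):
        self.label = label
        self.reachable = set(reachable_names)
        self.sent = []

    def can_reach(self, name):
        # Assumption: each transport knows which process names it can contact.
        return name in self.reachable

    def send(self, name, payload):
        self.sent.append((name, payload))


class RML:
    """Hypothetical strategy layer: picks a transport, does no I/O itself."""

    def __init__(self, oobs):
        # Assumption: components are listed in order of preference.
        self.oobs = list(oobs)

    def send(self, name, payload):
        for oob in self.oobs:
            if oob.can_reach(name):
                oob.send(name, payload)
                return oob.label
        raise LookupError(f"no active OOB can reach {name}")
```

The caller only ever supplies a process name and a payload; which transport carries the message is decided entirely inside the RML.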

Message destinations are specified in terms of OpenRTE process names - *not* IP addresses. Thus, a message is sent to a particular OpenRTE process name - it is the shared responsibility of the RML and its underlying OOB components to translate that into a network address. The exact role of the RML versus the OOB in that translation process has not yet been determined.

Communication contact information for each process is provided to a process during startup in the form of URIs that contain the OpenRTE process name, IP address, and socket. A process is first given the URI for the head node process (HNP) of that cell. This is done so that the process can obtain subsequent information from the registry such as contact info for all other processes in the job, MPI-layer contact information, etc. The URI for each process clearly indicates whether IPv6 or IPv4 is to be used for contacting that process name. The system allows for multiple URIs to be provided for the same process name - selection of which one to use for a given message is done by the RML based on (a) interface availability (e.g., if only IPv4 is available, then that is the one used) and (b) network congestion. Hence, there is no ambiguity over which transport to use.
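
To make the selection step concrete, here is a minimal sketch of choosing among several contact URIs for one process name according to which IP versions are usable locally. The URI layout `<name>;tcp://<host>:<port>` (with IPv6 hosts in brackets) is a hypothetical format chosen for the example, not necessarily what OpenRTE emits, and the congestion criterion is not modeled at all.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str     # OpenRTE process name, treated as an opaque string here
    host: str
    port: int
    version: int  # 4 or 6, derived from the address itself


def parse_contact(uri):
    # Hypothetical format: "<name>;tcp://<host>:<port>"
    name, _, endpoint = uri.partition(";")
    _, _, hostport = endpoint.partition("://")
    host, _, port = hostport.rpartition(":")
    host = host.strip("[]")  # drop the brackets around an IPv6 literal
    version = ipaddress.ip_address(host).version
    return Contact(name, host, int(port), version)


def pick_contact(uris, local_versions):
    """Return the first listed contact whose IP version is usable locally."""
    for contact in map(parse_contact, uris):
        if contact.version in local_versions:
            return contact
    raise LookupError("no contact URI matches a locally available interface")
```

With both an IPv6 and an IPv4 URI on file for a process, a host that only has IPv4 available ends up with the IPv4 contact, which is case (a) above.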

In the case of IPv6 versus IPv4, the expectation was that there would be two OOB components, one for each protocol. OOB components would be selected based on local support - i.e., if the local system supports IPv6, then the IPv6 component would be selected and available. Likewise, if the local system supports IPv4, the IPv4 component would be selected too.
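
One way to picture "selected based on local support" is a probe that tries to open a socket of each address family and keeps only the components whose family works. This is only a sketch: the component names `oob_tcp4` and `oob_tcp6` are made up for the example, and a real check would probably also look at whether any interface actually has an address of that family.

```python
import socket


def family_supported(family):
    """Return True if the local stack can create a TCP socket of this family."""
    try:
        sock = socket.socket(family, socket.SOCK_STREAM)
    except OSError:
        return False
    sock.close()
    return True


def select_oob_components(probe=family_supported):
    # Hypothetical component names, one per protocol.
    candidates = {
        "oob_tcp4": socket.AF_INET,
        "oob_tcp6": socket.AF_INET6,
    }
    return [name for name, family in candidates.items() if probe(family)]
```

On a dual-stack machine both components would come back; on an IPv4-only machine only `oob_tcp4` would. The `probe` parameter exists just so the selection logic can be exercised without depending on the host's network stack.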

I hope that helps clarify OpenRTE's operation. I truly believe that including IPv6 and IPv4 components in the OOB will be fairly simple to accomplish. Yes, there may be some duplicate code - if there is enough duplication, we can move the duplicate code into the OOB's base and let the two components share it. Otherwise, a little duplication isn't that big a deal.

I'd be happy to answer further questions. I believe you will find that the Open MPI transport layer operates in a very similar manner, though I leave that to Tim and Galen to clarify.

Ralph
