Re: [OMPI devel] some question about OMPI communication infrastructure

2009-06-19 Thread Leo P.
Hi Jeff,

All the information provided here helps me a lot.

Thank you, I really, really appreciate it.  :)

Regards,
Leo P.


From: Jeff Squyres 
To: Open MPI Developers 
Sent: Friday, 19 June, 2009 5:05:59 AM
Subject: Re: [OMPI devel] some question about OMPI communication infrastructure

A few addendums in no particular order...

1. The ompi/ tree is the MPI layer.  It's the top layer in the stack.  It uses 
ORTE and OPAL for various things.

2. The PML (point-to-point messaging layer) is the stuff right behind 
MPI_SEND, MPI_RECV, and friends.  We have two main PMLs: OB1 and CM (and some 
other similar ones, but not important here).  OB1 is probably the only one you 
care about.
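
To picture where the PML sits, here is a small self-contained C sketch of an 
MPI_Send-style wrapper delegating to whichever PML module was selected at 
startup.  All of the names (toy_pml_module_t, toy_MPI_Send, ...) are invented 
for the illustration; they are not Open MPI's real symbols.

    #include <stdio.h>

    /* A "PML module" in miniature: a name plus a send entry point. */
    typedef struct {
        const char *name;
        int (*send)(const void *buf, int count, int dest, int tag);
    } toy_pml_module_t;

    static int ob1_send(const void *buf, int count, int dest, int tag)
    {
        (void)buf;
        printf("ob1: send %d elements to rank %d, tag %d\n", count, dest, tag);
        return 0;
    }

    /* Chosen once during init; every send afterwards just dispatches. */
    static toy_pml_module_t selected_pml = { "ob1", ob1_send };

    static int toy_MPI_Send(const void *buf, int count, int dest, int tag)
    {
        return selected_pml.send(buf, count, dest, tag);
    }

    int main(void)
    {
        int payload[4] = { 1, 2, 3, 4 };
        return toy_MPI_Send(payload, 4, 1, 99);
    }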

3. OB1 implements the majority of the MPI rules and behavior.  It makes 
MPI_Requests, processes them, potentially segments and re-assembles individual 
messages, etc.
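
The segmentation part of that job can be sketched in a few lines of 
stand-alone C.  The fragment size and names are made up for illustration; 
this is not OB1's actual code or data layout.

    #include <stdio.h>
    #include <string.h>

    #define TOY_FRAG_SIZE 8   /* assumed per-fragment payload size */

    /* Chop one logical send into fragments; the receiver reassembles them
     * in order.  This is roughly the bookkeeping OB1 does on top of the
     * BTLs for large messages. */
    static void toy_send_fragmented(const char *msg, size_t len)
    {
        size_t offset = 0;
        int frag = 0;
        while (offset < len) {
            size_t chunk = (len - offset < TOY_FRAG_SIZE) ? len - offset
                                                          : TOY_FRAG_SIZE;
            printf("fragment %d: offset=%zu len=%zu\n", frag++, offset, chunk);
            offset += chunk;
        }
    }

    int main(void)
    {
        const char msg[] = "a message longer than one fragment";
        toy_send_fragmented(msg, strlen(msg));
        return 0;
    }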

4. OB1 uses BTLs (Byte Transfer Layers) to actually move bytes between 
processes.  Each BTL is for a different kind of transport; OB1 uses the BML 
(BTL multiplexing layer; "layer" is a generous term here; think of it as 
trivial BTL pointer array management functionality) to manage all the BTLs that 
it is currently using.
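
That "pointer array management" can be pictured with a self-contained toy; 
the structures below (toy_btl_t, toy_bml_endpoint_t) are hypothetical, not 
the real BML/BTL interfaces.

    #include <stdio.h>

    /* One pluggable transport ("tcp", "sm", ...). */
    typedef struct {
        const char *name;
        int (*put_bytes)(const void *buf, size_t len);
    } toy_btl_t;

    /* Per-peer bookkeeping: which BTLs can reach this peer.  The BML is
     * not much more than this kind of array management. */
    typedef struct {
        toy_btl_t *btls[4];
        int        num_btls;
    } toy_bml_endpoint_t;

    static int tcp_put(const void *buf, size_t len)
    {
        (void)buf;
        printf("tcp: %zu bytes\n", len);
        return 0;
    }

    static int sm_put(const void *buf, size_t len)
    {
        (void)buf;
        printf("sm: %zu bytes\n", len);
        return 0;
    }

    int main(void)
    {
        toy_btl_t tcp = { "tcp", tcp_put }, sm = { "sm", sm_put };
        toy_bml_endpoint_t peer = { { &sm, &tcp }, 2 };

        /* OB1-style use: pick a usable BTL for this peer, hand it the bytes. */
        char data[16] = { 0 };
        return peer.btls[0]->put_bytes(data, sizeof data);
    }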

5. OB1 and some of the BTLs use the ORTE layer for "out of band" 
communications, usually for initialization and finalization.  The "OOB" ORTE 
framework is more-or-less equivalent to the BTL framework, but it's *only* used 
for ORTE-level communications (not MPI communications).  The RML (routing 
message layer) ORTE framework is a layer on top of the OOB that has the 
potential to route messages as necessary.  To be clear, the OMPI layer always 
uses the RML, not the OOB directly (the RML uses the OOB underneath).
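
Conceptually the layering looks like the toy sketch below: the RML decides 
the next hop, the OOB owns the transport, and MPI traffic never touches 
either.  All names are hypothetical; the real RML/OOB interfaces differ.

    #include <stdio.h>

    /* The OOB owns the actual transport (think: one socket per daemon). */
    static void toy_oob_send(int hop, const char *msg)
    {
        printf("oob: deliver \"%s\" to %d\n", msg, hop);
    }

    /* Trivial "routing": in this toy everything relays through daemon 0. */
    static int toy_rml_next_hop(int dest)
    {
        (void)dest;
        return 0;
    }

    /* The RML sits on top: pick the next hop, then hand off to the OOB. */
    static void toy_rml_send(int dest, const char *msg)
    {
        toy_oob_send(toy_rml_next_hop(dest), msg);
    }

    int main(void)
    {
        toy_rml_send(5, "node status");   /* ORTE-level traffic only */
        return 0;
    }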

6. A bunch of OOB connections are made during the startup of the MPI job.  BTL 
connections are generally made on an "as needed" basis (e.g., during the first 
MPI_SEND to a given peer).  Ralph will have to fill you in on the details of 
how/when/where OOB connections are made.
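
The "as needed" behavior is just connect-on-first-send; a minimal stand-alone 
sketch (hypothetical names, not the TCP BTL's actual code) looks like this:

    #include <stdio.h>
    #include <stdbool.h>

    typedef struct {
        int  rank;
        bool connected;
    } toy_peer_t;

    static void toy_btl_connect(toy_peer_t *peer)
    {
        /* In the real TCP BTL this is where a socket would be opened, using
         * address information exchanged earlier over the OOB/RML. */
        printf("opening connection to rank %d\n", peer->rank);
        peer->connected = true;
    }

    static void toy_send(toy_peer_t *peer, const char *msg)
    {
        if (!peer->connected)          /* first send to this peer */
            toy_btl_connect(peer);
        printf("send to rank %d: %s\n", peer->rank, msg);
    }

    int main(void)
    {
        toy_peer_t peer = { 1, false };
        toy_send(&peer, "hello");      /* triggers the connect */
        toy_send(&peer, "again");      /* reuses the connection */
        return 0;
    }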

7. There is unfortunately little documentation on the OMPI source code except 
comments in the code.  :-\  However, there was a nice writeup recently that may 
be helpful to you:

http://www.open-mpi.org/papers/trinity-btl-2009/

8. Once TCP BTL connections are made, IP addressing is no longer necessary in 
the OMPI-level messages that are sent because the sockets are connected 
point-to-point -- i.e., the peer process is already known because we have a 
socket to them.  The MPI-level messages instead contain things like the 
communicator ID, tag, etc.
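
In other words, once the socket identifies the peer, a message only needs 
MPI-level matching fields.  A sketch of what such a header could carry (the 
field names are my own illustration, not OB1's actual wire format):

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative match header: enough to pair a message with a posted
     * receive, with no transport addressing at all. */
    typedef struct {
        uint32_t communicator_id;  /* which communicator */
        int32_t  src_rank;         /* sender's rank in that communicator */
        int32_t  tag;              /* MPI tag */
        uint64_t sequence;         /* per-peer ordering number */
    } toy_match_header_t;

    int main(void)
    {
        toy_match_header_t hdr = { 0, 3, 42, 17 };
        printf("ctx=%u src=%d tag=%d seq=%llu\n",
               (unsigned)hdr.communicator_id, hdr.src_rank, hdr.tag,
               (unsigned long long)hdr.sequence);
        return 0;
    }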

Hope that helps!


On Jun 18, 2009, at 10:26 AM, Ralph Castain wrote:

> Hi Leo
> 
> The MPI communications are contained in the ompi/mca/btl code area. The BTLs 
> (Byte Transfer Layer) actually move the message data. Each BTL is 
> responsible for opening its own connections - ORTE has nothing to do with it, 
> except to transport out-of-band (OOB) messages to support creating the 
> connection if that specific BTL requires it.
> 
> If you are interested in TCP communications, you will find all of that code 
> in ompi/mca/btl/tcp. It can be confusing down there, so expect to spend a 
> little time trying to understand it. I believe Jeff has some documentation on 
> the OMPI web site about it (perhaps a video?).
> 
> The source/destination is embedded in the message, again done by each BTL 
> since the receiver must be a BTL of the same type. Again, this has nothing to 
> do with ORTE - it is purely up to the BTL. MPI communications are also 
> coordinated by the PML, which is responsible for matching messages with 
> posted receives. You might need to look at the ompi/mca/pml/ob1 code to 
> understand how that works.
> 
> Hope that gives you a starting point
> Ralph
> 
> On Jun 18, 2009, at 7:57 AM, Leo P. wrote:
> 
>> Hi Everyone,
>> 
>> 
>> 
>> I wanted to ask some questions about things I am having trouble 
>> understanding.
>> 
>> • As far as I understand the MPI_INIT function, I assumed that MPI_INIT 
>> typically procures the resources required, including the sockets. But now I 
>> understand from the documentation that Open MPI only allocates a socket when 
>> a process has to send a message to a peer. If someone could tell me where 
>> exactly in the code this happens, I would appreciate it a lot. I guess this 
>> is happening in the ORTE layer, so I am spending time looking at it, but if 
>> someone could tell me in which function this happens, it would help me a lot.
>> 
>> • Also, I think most MPI implementations embed the source and destination 
>> address with the communication protocol. Am I right to assume Open MPI does 
>> the same thing? Is this also happening in the ORTE layer?
>> 
>> Is there documentation about this on the Open MPI site? If there is, could 
>> someone please let me know where to find it.
>> 
>> Sincerely,
>> Leo.P
