hey Jeff/Galen,

Thanks to both of you for helping answer our questions, both on and off the list. We're currently doing a lot of writing focused on MPI implementation design strategies, so this has certainly helped us, and hopefully others too.

On our end, we've generally been trying to push as much functionality
down to the transport (we have some info on our webpage:
http://www.cs.ubc.ca/labs/dsg/mpi-sctp/ or you can hear me talk at SC|05),
whereas your approach is to bring functionality up and manage it within
the middleware (obviously you do a lot of other neat things, like thread
safety and countless others, that are really impressive).  With
respect to managing interfaces in the middleware, I understand it buys you
some generality, since channel bonding (for TCP) and concurrent
multipath transfer (for SCTP) aren't available for mVAPI, Open IB, GM, MX,
etc.

Already, I think it's cool to read about Open MPI's design; in the future,
it will be even cooler to hear whether pulling so much functionality up to
the middleware has any performance drawbacks from having to do so much
management (comparing, for example, a setup with two NICs using Open MPI
striping to a thinner middleware that has the same setup but uses
channel bonding).  From the looks of it, your Euro PVM/MPI paper is going
to describe the low cost of software components; I'm just curious about the
cost of having this management functionality in the middleware in
the first place. Time will tell!

Thanks again for all your answers,

brad


On Wed, 31 Aug 2005, Galen M. Shipman wrote:


On Aug 31, 2005, at 1:06 PM, Jeff Squyres wrote:

On Aug 29, 2005, at 9:17 PM, Brad Penoff wrote:


PML: Pretty much the same as it was described in the paper.  Its
interface is basically MPI semantics (i.e., it sits right under
MPI_SEND and the rest).

BTL: Byte Transfer Layer; it's the next generation of PTL.  The BTL is
much simpler than the PTL, and removes all vestiges of any MPI
semantics that still lived in the PTL.  It's a very simple byte mover
layer, intended to make it quite easy to implement new network
interfaces.


I was curious about what you meant by the removal of MPI semantics.  Do
you mean it simply has no notion of tags, ranks, etc.?  In other words,
does it simply put the data into some sort of format that the PML can
operate on with its own state machine?


I don't recall the details (it's been quite a while since I looked at
the PTL), but there was some semblance of MPI semantics that crept
down into the PTL interface itself.  The BTL has none of that -- it's
purely a byte mover.


The old PTLs controlled the short vs. long rendezvous protocol, the
eager transmission of data, as well as pipelining of RDMA operations
(where appropriate). In the PML OB1 and the BTLs this has all been
moved to the OB1 level. Note that this is simply a logical separation of
control and comes at virtually no cost (well, there is the very small
cost of using a function pointer).
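
To make the split described above concrete, here is a minimal sketch in C. It is not Open MPI code: every name in it (toy_btl_t, toy_pml_send, eager_limit, and so on) is hypothetical, and the rendezvous path is deliberately simplified to "send only the first eager_limit bytes now", with the later phases left unmodeled. The point is only that the protocol decision lives above the byte mover, and the byte mover is reached through one function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-mover interface: it knows nothing about tags,
 * ranks, or protocols. It only moves bytes when asked. */
typedef struct toy_btl {
    size_t eager_limit;   /* largest payload sent in one eager message */
    int  (*send)(struct toy_btl *btl, const void *buf, size_t len);
    size_t bytes_moved;   /* bookkeeping for this example only */
} toy_btl_t;

typedef enum { TOY_EAGER, TOY_RENDEZVOUS } toy_protocol_t;

/* A stand-in transport that just counts the bytes it was handed. */
static int toy_loopback_send(toy_btl_t *btl, const void *buf, size_t len)
{
    (void)buf;
    btl->bytes_moved += len;
    return 0;
}

/* Upper-layer (PML-like) decision: short messages go eagerly,
 * long ones start a rendezvous. */
static toy_protocol_t toy_pml_choose(const toy_btl_t *btl, size_t len)
{
    return len <= btl->eager_limit ? TOY_EAGER : TOY_RENDEZVOUS;
}

/* Upper-layer send. Eager: hand the whole message to the byte mover.
 * Rendezvous (simplified assumption): hand over only the first
 * eager_limit bytes; the remainder would follow after a handshake
 * that this sketch does not model. */
static toy_protocol_t toy_pml_send(toy_btl_t *btl, const void *buf, size_t len)
{
    assert(btl != NULL && btl->send != NULL);
    toy_protocol_t proto = toy_pml_choose(btl, len);
    size_t first = (proto == TOY_EAGER) ? len : btl->eager_limit;
    btl->send(btl, buf, first);   /* the single indirect call */
    return proto;
}
```

With an eager_limit of 64, a 16-byte message in this sketch takes the eager path and a 256-byte message takes the rendezvous path; in both cases the transport sees nothing but a buffer and a length.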



Also, say you had some underlying protocol that allowed unordered delivery
of data (so not fully ordered like TCP); which "layer" would the notion of
"order" be handled in?  I'm guessing PML would need some sort of sequence
number attached to it; is that right?


Correct.  That was in the PML in the 2nd gen stuff and is still at
the PML in the 3rd gen stuff.
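
As an illustration of what handling order with sequence numbers above an unordered transport can look like, here is a small sketch in C. It is an assumption-laden toy, not the PML's actual matching logic: the names (toy_reorder_t, toy_on_arrival), the fixed window of 8, the 16-bit sequence numbers, and the integer payloads are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_WINDOW 8   /* hypothetical: how far ahead a fragment may arrive */

typedef struct {
    uint16_t next_seq;             /* next sequence number to deliver */
    bool     held[TOY_WINDOW];     /* is an early fragment parked here? */
    int      payload[TOY_WINDOW];  /* stand-in for fragment contents */
    int      delivered[64];        /* log of payloads, in delivery order */
    int      ndelivered;
} toy_reorder_t;

static void toy_deliver(toy_reorder_t *r, int payload)
{
    assert(r->ndelivered < 64);
    r->delivered[r->ndelivered++] = payload;
}

/* Called once per arriving fragment, in whatever order the transport
 * produced it. Returns false if the fragment is old or too far ahead. */
static bool toy_on_arrival(toy_reorder_t *r, uint16_t seq, int payload)
{
    /* Unsigned subtraction gives a wrap-safe distance from next_seq. */
    uint16_t offset = (uint16_t)(seq - r->next_seq);
    if (offset >= TOY_WINDOW)
        return false;

    r->held[seq % TOY_WINDOW] = true;
    r->payload[seq % TOY_WINDOW] = payload;

    /* Deliver every fragment that is now contiguous with next_seq. */
    while (r->held[r->next_seq % TOY_WINDOW]) {
        uint16_t slot = r->next_seq % TOY_WINDOW;
        r->held[slot] = false;
        toy_deliver(r, r->payload[slot]);
        r->next_seq++;
    }
    return true;
}
```

If fragments 2, 0, 1 arrive in that order, this sketch parks fragment 2, delivers 0 immediately, and then delivers 1 and 2 together once the gap closes.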


BML: BTL Management Layer; this used to be part of the PML but we
recently split it off into its own framework.  It's mainly the utility
gorp of managing multiple BTL modules in a single process.  This was
done because when working with the next generation of collectives,
MPI-2 IO, and MPI-2 one-sided operations, we want to have the ability
to use the PML (which the collectives do today, for example) or to be
able to dive right down and directly use the BTLs (i.e., cut out a
little latency).


In the cases where the BML is required, does it cost extra memcpy's?


Not to my knowledge.  Galen -- can you fill in the details of this
question and the rest of Brad's questions?

The BML layer is simply a management layer for discovering peer
resources. It does mask the BTL send, put, prepare_src, and prepare_dst
operations, but this code is all declared inline and very short, so gcc
should inline it appropriately. In fact, this inlined code used to be in
the PML OB1 before we added the BML, so it is a no-cost "logical"
abstraction.  We don't add any extra memory copies in this abstraction.
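
A rough sketch of why such a wrapper adds no copies, written in C with invented names (toy_btl_module_t, toy_bml_endpoint_t, toy_bml_send; none of these are the real Open MPI symbols). The per-peer record and the round-robin cursor are assumptions made for illustration only, not a description of how the real BML selects a BTL. The wrapper forwards the caller's pointer and length straight to the module's function pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical BTL module: a send function pointer plus a byte counter
 * used only so the example can be checked. */
typedef struct toy_btl_module {
    int  (*btl_send)(struct toy_btl_module *btl, const void *buf, size_t len);
    size_t sent;
} toy_btl_module_t;

/* Hypothetical per-peer record kept by the management layer: the BTLs
 * that can reach this peer, and a cursor for picking the next one. */
typedef struct {
    toy_btl_module_t *btls[4];
    size_t count;
    size_t next;
} toy_bml_endpoint_t;

/* Pick the next BTL for this peer (round-robin, purely illustrative). */
static inline toy_btl_module_t *toy_bml_next_btl(toy_bml_endpoint_t *ep)
{
    assert(ep->count > 0);
    toy_btl_module_t *btl = ep->btls[ep->next];
    ep->next = (ep->next + 1) % ep->count;
    return btl;
}

/* The "masking" wrapper: it passes buf and len through untouched,
 * so no extra memory copy is introduced at this level. */
static inline int toy_bml_send(toy_btl_module_t *btl, const void *buf, size_t len)
{
    return btl->btl_send(btl, buf, len);
}

/* A stand-in transport that only counts bytes. */
static int toy_counting_send(toy_btl_module_t *btl, const void *buf, size_t len)
{
    (void)buf;
    btl->sent += len;
    return 0;
}
```

Because both wrappers are short static inline functions, a compiler is free to fold them into the caller, leaving only the indirect call through btl_send.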

Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

