On Jul 5, 2007, at 4:16 PM, Glendenning, Lisa wrote:
> Ron Brightwell at SNL has asked me to look into optimizing Open MPI's
> one-sided operations over Portals. Does anyone have any guidance or
> thoughts for this?
Hi Lisa -
There are currently two implementations of the one-sided interface
for Open MPI: pt2pt and rdma.
The pt2pt component is implemented entirely over the interfaces used
to implement the MPI-1 point-to-point interface. So it ends up doing
lots of copies and is entirely two-sided. It could support async
progress with threads, but that doesn't help the XT platform all that
much. It was the first one-sided component implemented, mostly
because we needed to support protocols like MX and PSM that don't
really expose one-sided semantics, and I only wanted to support one
new component per release.
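
To make the "lots of copies and entirely two-sided" point concrete, here is a rough sketch of how a put can be emulated over send/receive. It is not code from the pt2pt component; put_msg, pack_put, handle_put and MSG_PAYLOAD_MAX are made-up names, and the actual send/receive step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message layout for an emulated put; not an Open MPI type. */
enum { MSG_PAYLOAD_MAX = 64 };

typedef struct put_msg {
    size_t        target_disp;              /* byte offset into the target window */
    size_t        len;                      /* payload length in bytes */
    unsigned char payload[MSG_PAYLOAD_MAX]; /* staged copy of the origin data */
} put_msg;

/* Origin side: stage the data into a message (copy #1). */
static int pack_put(put_msg *msg, const void *origin, size_t len, size_t target_disp)
{
    assert(msg != NULL && origin != NULL);
    if (len > MSG_PAYLOAD_MAX) {
        return -1;
    }
    msg->target_disp = target_disp;
    msg->len = len;
    memcpy(msg->payload, origin, len);
    return 0;
}

/* Target side: the target has to run this itself once the message has been
 * received, which is what makes the scheme two-sided (copy #2). */
static int handle_put(const put_msg *msg, void *win_base, size_t win_size)
{
    assert(msg != NULL && win_base != NULL);
    if (msg->target_disp > win_size || msg->len > win_size - msg->target_disp) {
        return -1; /* would write outside the window */
    }
    memcpy((unsigned char *)win_base + msg->target_disp, msg->payload, msg->len);
    return 0;
}
```

The data is copied once on each side, and nothing lands in the window until the target gets around to calling its handler, which is why async progress matters for this design.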
The rdma component is implemented over our BTL (Byte Transfer Layer
-- the device driver our communication is written over), and can
use either callback-based send/receive or true RDMA. True RDMA
is only used for put/get with contiguous datatypes. The performance on
OpenIB is OK, but not great (I'll send you some more details off
list). I'd assume that the performance on Portals would be similar.
However, the btl_put and btl_get implementation for the Portals BTL
was implemented assuming it would only be used the way the PML (the
MPI-1 point-to-point implementation) used it. It won't work with the
rdma one-sided component at this time. I can go into more details if
you decide that fixing the Portals BTL to support the rdma component
is a path you want to look at.
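
The choice between the two transfer modes described above can be sketched roughly like this. The enum and function names are invented for illustration and are not Open MPI identifiers; the btl_has_rdma flag is my assumption about how "the transport offers working put/get" would be represented.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not Open MPI identifiers. */
typedef enum { OP_PUT, OP_GET, OP_ACCUMULATE } onesided_op;
typedef enum { PATH_TRUE_RDMA, PATH_SEND_RECV } transfer_path;

/* Pick the transfer path for one operation.
 * True RDMA is chosen only for put/get on contiguous datatypes when the
 * transport offers RDMA (assumed flag); everything else falls back to
 * callback-based send/receive. */
static transfer_path choose_path(onesided_op op, bool contiguous, bool btl_has_rdma)
{
    assert(op == OP_PUT || op == OP_GET || op == OP_ACCUMULATE);
    if (btl_has_rdma && contiguous && (op == OP_PUT || op == OP_GET)) {
        return PATH_TRUE_RDMA;
    }
    return PATH_SEND_RECV;
}
```

With the Portals BTL as it stands today, the equivalent of btl_has_rdma would effectively be false for the rdma component, so everything would take the send/receive path until btl_put and btl_get are fixed.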
Then, of course, there's the option of writing a Portals-specific
one-sided component. The component interface is pretty straightforward
-- it's the MPI-2 one-sided chapter interface functions, plus an
initialization function. This is the path towards best performance,
but also means the most code to write. The existing code in Open MPI
handles the attribute management, but that's about it if you go this
route. Of course, you can always copy freely from the rdma and pt2pt
components. There used to be a document somewhere describing how to
add a new component, but I think it is horribly out of date. I'll
see if I can find it and send it your way.
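
To give a feel for the shape of such a component, here is a heavily simplified sketch: a table of function pointers with an initialization entry plus a few of the one-sided operations. All of the names (sketch_window, sketch_osc_module, sketch_loopback) are hypothetical and the argument lists are cut down; the real interface uses MPI datatypes, ranks and window handles and covers the rest of the MPI-2 one-sided chapter (accumulate, the other synchronization calls, and so on). The "loopback" implementation just copies within local memory so the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified window state; not an Open MPI type. */
typedef struct sketch_window {
    void  *base; /* start of the exposed memory */
    size_t size; /* number of bytes exposed */
} sketch_window;

/* Hypothetical function table: one init entry plus one-sided operations. */
typedef struct sketch_osc_module {
    int (*init)(sketch_window *win, void *base, size_t size);
    int (*put)(sketch_window *win, const void *origin, size_t len, size_t disp);
    int (*get)(sketch_window *win, void *origin, size_t len, size_t disp);
    int (*fence)(sketch_window *win);
} sketch_osc_module;

/* Returns nonzero when [disp, disp + len) lies inside the window. */
static int in_bounds(const sketch_window *win, size_t len, size_t disp)
{
    return disp <= win->size && len <= win->size - disp;
}

static int loop_init(sketch_window *win, void *base, size_t size)
{
    assert(win != NULL);
    if (base == NULL && size > 0) return -1;
    win->base = base;
    win->size = size;
    return 0;
}

static int loop_put(sketch_window *win, const void *origin, size_t len, size_t disp)
{
    if (!in_bounds(win, len, disp)) return -1;
    memcpy((unsigned char *)win->base + disp, origin, len);
    return 0;
}

static int loop_get(sketch_window *win, void *origin, size_t len, size_t disp)
{
    if (!in_bounds(win, len, disp)) return -1;
    memcpy(origin, (const unsigned char *)win->base + disp, len);
    return 0;
}

static int loop_fence(sketch_window *win)
{
    (void)win; /* local copies complete immediately; nothing to wait for */
    return 0;
}

/* A transport-specific component would fill this table with its own
 * functions (for Portals, presumably built on the Portals put/get calls). */
static const sketch_osc_module sketch_loopback = {
    loop_init, loop_put, loop_get, loop_fence
};
```

A Portals-specific component would supply its own versions of these entries, while the shared Open MPI code keeps handling the attribute management.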
Of course, the first step is to get a checkout of the code
and get it built. Instructions for getting an SVN checkout
of Open MPI (and building it from there) are available on the web
page:
http://www.open-mpi.org/svn/
Building on the XT platform (if you're going that route) is slightly
more complicated, and you probably want to take a look at the
horribly out of date wiki page on the subject here:
https://svn.open-mpi.org/trac/ompi/wiki/CrayXT3
Hopefully, that's enough to get you started. If you have any
questions, ask away.
Brian
--
Brian W. Barrett
Networking Team, CCS-1
Los Alamos National Laboratory