Re: [OMPI devel] using opal_event's versus btl_progress?

Jeff Squyres Mon, 22 Oct 2007 20:19:42 -0400

On Oct 22, 2007, at 3:06 PM, Brad Penoff wrote:

We had some questions about the best way to make use of Open MPI's
features for a new BTL...  the general theme is making use of the
opal_event's versus a btl_progress function.  When is it best to do
one versus the other?

In our Paris engineering meeting, we had a lengthy discussion about a
related topic.  That conversation will result in a few things:

- We'll be updating libevent in the not-distant future (see previous
  mail today about that)
- After updating libevent, we'll be updating to use the more modern
  epoll (and friends) interfaces.  They're manually disabled [with good
  reason] in our libevent for reasons that are too boring to describe
  (but I can if you care).
- BTLs with a device under them are free to use libevent for fd-based
  progress and/or a progress function.  Software layers without
  underlying devices should not use progress functions.
- We'll eventually be adding a blocking interface to the BTLs.  More
  info TBD on that.

We are working on several designs for an SCTP BTL for Open MPI.  The
familiar one is to use "TCP-style" one-to-one sockets, which have a
socket per endpoint pair, just like the TCP BTL does now.  However, a
more unfamiliar one is to use a single "UDP-style" one-to-many socket
per BTL.  To illustrate, pretend you have 3 processes... each process
only has one socket upon which connections are established, messages
are sent, and messages are received to/from the other two processes.
It is this design that currently we have some questions about....

So far, we have not been implementing our own btl_progress function.
This means that within opal_progress(), poll() is called based on the
opal events registered within the BTL.  Like TCP, for example, when an
MPI_Send happens, the endpoint_send_event is added and POLLOUT is
added for this socket for a given endpoint.  Since MPI_Send is
blocking, it doesn't really matter that this socket is used for other
btl_endpoints because it is the only endpoint with an opal event for
sending added.  However, this is not the case with non-blocking...

When we have multiple outstanding non-blocking requests to different
endpoints, we have to queue them since the endpoints share the same
one-to-many socket and events are associated with a single
btl_endpoint.

From proc C, say we have this pseudo code running:

iSend(proc A)
iSend(proc B)
Waitall()

Within Waitall, our current design using opal events has the iSend to
proc A eventually complete but prior to this, the iSend to proc B
can't start until proc A's is done.  We currently queue the endpoints
waiting for the poll() POLLOUT event and dequeue from this queue when
the event from proc A's endpoint is deleted (and add proc B's endpoint
to the POLLOUT event).
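
A minimal sketch of the queueing described above, in plain C with no
Open MPI types.  The names pollout_queue, pq_request_send and
pq_send_done are hypothetical, and the single "armed" slot is an
assumed stand-in for the one POLLOUT event that the shared one-to-many
socket carries at a time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical model: one shared socket, one POLLOUT event slot. */
struct pollout_queue {
    int armed;                 /* endpoint owning POLLOUT, -1 if none */
    int waiting[MAX_WAITERS];  /* FIFO of endpoints waiting for POLLOUT */
    size_t head, count;
};

void pq_init(struct pollout_queue *q)
{
    q->armed = -1;
    q->head = 0;
    q->count = 0;
}

/* An endpoint wants to send: arm POLLOUT if it is free, otherwise
 * wait in line.  Returns 0 on success, -1 if the queue is full. */
int pq_request_send(struct pollout_queue *q, int endpoint)
{
    if (q->armed < 0) {
        q->armed = endpoint;
        return 0;
    }
    if (q->count == MAX_WAITERS) {
        return -1;
    }
    q->waiting[(q->head + q->count) % MAX_WAITERS] = endpoint;
    q->count++;
    return 0;
}

/* The armed endpoint finished its send: drop its event and hand
 * POLLOUT to the next waiter.  Returns the newly armed endpoint,
 * or -1 if nobody is waiting. */
int pq_send_done(struct pollout_queue *q)
{
    assert(q->armed >= 0);
    if (q->count == 0) {
        q->armed = -1;
    } else {
        q->armed = q->waiting[q->head];
        q->head = (q->head + 1) % MAX_WAITERS;
        q->count--;
    }
    return q->armed;
}
```

With proc A as endpoint 0 and proc B as endpoint 1, two calls to
pq_request_send leave A armed and B queued; B only becomes armed once
pq_send_done is called for A, which is the serialization in question.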

Can you think of a way using the existing framework to eliminate the
restriction of the send to proc B having to complete prior to the send
to proc B starting?


I assume you meant "send to proc *A* having to complete..."

We were trying to use the existing framework but for our case, it
may make more sense to implement our own btl_progress function
since poll() doesn't really make sense for a single socket
anyway... Do you think that would be best?

I guess I don't quite understand -- are you saying that you can have
2 concurrent writes occurring on the same socket to 2 different
destinations?

If so, and if libevent doesn't match the SCTP paradigm, then I say:
sure, write your own progress function.
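
To make that concrete, here is a rough sketch of what such a progress
function might look like, assuming the one-to-many socket really can
interleave writes to different peers.  Everything here is hypothetical:
sctp_endpoint, sctp_progress and send_fn are illustration names, the
callback stands in for a non-blocking send on the shared socket, and
returning the number of completed sends is only an assumed convention.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-peer send state; not an Open MPI structure. */
struct sctp_endpoint {
    const char *buf;    /* data still to be sent to this peer */
    size_t remaining;   /* bytes left; 0 means nothing pending */
};

/* Stand-in for a non-blocking send on the shared socket.
 * Returns the bytes accepted (0 if the socket would block). */
typedef size_t (*send_fn)(int peer, const char *buf, size_t len);

/* One progress pass: every endpoint with pending data gets one
 * chance to write, so no peer waits for another peer's message
 * to finish.  Returns how many sends completed in this pass. */
int sctp_progress(struct sctp_endpoint *eps, size_t n, send_fn try_send)
{
    int completed = 0;
    for (size_t i = 0; i < n; i++) {
        if (eps[i].remaining == 0) {
            continue;
        }
        size_t sent = try_send((int)i, eps[i].buf, eps[i].remaining);
        assert(sent <= eps[i].remaining);
        eps[i].buf += sent;
        eps[i].remaining -= sent;
        if (eps[i].remaining == 0) {
            completed++;
        }
    }
    return completed;
}

/* Toy send_fn for experiments: accepts at most 4 bytes per call. */
size_t send_at_most_4(int peer, const char *buf, size_t len)
{
    (void)peer;
    (void)buf;
    return len < 4 ? len : 4;
}
```

With a 9-byte message pending for proc A and a 4-byte message for
proc B, the first pass completes B's send while A's is still in
flight, which is the behavior the event queue cannot give you.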


George: can you confirm / deny?

We noticed that mca_bml_r2_progress calls btl_progress[i]() which is
set in mca_bml_r2_add_procs if NULL !=
btl->btl_component->btl_progress.  Is there an example of a btl that
implements its own btl_progress function?  I just want to make sure
this is even a possibility before traveling down this path...  and
maybe learn from others prior.


The openib btl has its own progress function.
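
For reference, the dispatch pattern described in the question can be
sketched as below.  This is not the actual mca_bml_r2 code: component,
collect_progress and run_progress are hypothetical stand-ins that show
only the idea of skipping NULL progress pointers at registration time
and calling the rest in a loop.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*progress_fn)(void);

/* Hypothetical stand-in for a BTL component; only the progress
 * hook is modeled. */
struct component {
    const char *name;
    progress_fn progress;   /* NULL when the component is event-only */
};

/* Registration: keep only the non-NULL progress functions.
 * Returns how many were stored in out[]. */
size_t collect_progress(const struct component *c, size_t n,
                        progress_fn *out, size_t out_cap)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < out_cap; i++) {
        if (NULL != c[i].progress) {
            out[k++] = c[i].progress;
        }
    }
    return k;
}

/* Progress loop: call each registered function and sum the counts
 * they report. */
int run_progress(progress_fn *fns, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(fns[i] != NULL);
        total += fns[i]();
    }
    return total;
}

/* Toy progress function that pretends two events completed. */
int fake_device_progress(void)
{
    return 2;
}
```

A component that leaves its progress pointer NULL is never called from
the loop and has to rely on registered events alone.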

--
Jeff Squyres
Cisco Systems
