Re: [OMPI devel] sm BTL flow management

Eugene Loh Thu, 25 Jun 2009 03:06:17 -0400

Bryan Lally wrote:

Ralph Castain wrote:

Be happy to put it through the wringer... :-)


My wringer is available, too.


'kay.  Try

hg clone ssh://www.open-mpi.org/~eloh/hg/pending_sends

which is r21498 but with changes to poll one's own FIFO more regularly (e.g., even when just performing sends) and to retry pending sends more aggressively (e.g., whenever about to try a send or whenever one calls sm progress). I maintain a count of outstanding fragments (sent but not yet returned to free list) and of pending sends (total over all queues) to keep overheads down.
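For anyone who wants the idea without reading the diff, here is a minimal toy sketch of how two counters can gate the extra polling and retrying so that the common case stays cheap. It is not the code in the repository: the struct, the function names (sender_t, poll_own_fifo, retry_pending, sm_send, sm_progress, peer_returns) and the single-queue model are all hypothetical and exist only for illustration.

```c
#include <assert.h>

/* Toy model only; every name here is hypothetical, not Open MPI source. */
typedef struct {
    int free_frags;   /* fragments currently on the free list */
    int outstanding;  /* fragments sent but not yet returned to the free list */
    int pending;      /* sends queued because no fragment was available */
    int returned;     /* fragments the peer has handed back, waiting in our FIFO */
    int delivered;    /* sends that actually went out */
} sender_t;

/* Poll our own FIFO, but only if something could possibly come back. */
static void poll_own_fifo(sender_t *s)
{
    if (s->outstanding == 0) return;      /* cheap early exit */
    assert(s->returned <= s->outstanding);
    s->free_frags  += s->returned;
    s->outstanding -= s->returned;
    s->returned     = 0;
}

/* Retry queued sends, but only if something is actually queued. */
static void retry_pending(sender_t *s)
{
    if (s->pending == 0) return;          /* cheap early exit */
    while (s->pending > 0 && s->free_frags > 0) {
        s->free_frags--;
        s->outstanding++;
        s->pending--;
        s->delivered++;
    }
}

/* A send first polls and retries, then sends or queues behind older sends. */
static void sm_send(sender_t *s)
{
    poll_own_fifo(s);
    retry_pending(s);
    if (s->free_frags > 0) {
        s->free_frags--;
        s->outstanding++;
        s->delivered++;
    } else {
        s->pending++;
    }
}

/* The progress call does the same polling and retrying. */
static void sm_progress(sender_t *s)
{
    poll_own_fifo(s);
    retry_pending(s);
}

/* Simulate the peer returning up to n fragments to us. */
static void peer_returns(sender_t *s, int n)
{
    int in_flight = s->outstanding - s->returned;
    s->returned += (n < in_flight) ? n : in_flight;
}
```

With a free list of two fragments, a third sm_send gets queued as pending; once the peer returns one fragment, the next sm_progress (or the next sm_send) picks it up and pushes the queued send out. When both counters are zero, the two checks cost one comparison each.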

My various test codes (repeated Bcasts, half-duplex point-to-point sends, etc.) all pass now. There is no perceptible degradation in 0-byte pingpong latency that I can tell. George's fixed-free-list proposal may be better, but I'm making these bits available for some soak and feedback.

Life is still not perfect. If you look in mca_btl_sm_component_progress, when a process receives a message fragment and returns it to the sender, it executes code like this:


    goto recheck_peer;
    break;

Okay, the reason I show you that code is because a static code checker should easily identify the break statement as dead code. It'll never be reached. Anyhow, in English, what's happening is if you receive a message fragment, you keep polling your FIFO. So, consider the case of half-duplex point-to-point traffic: one process only sends and the other process only receives. Previously, this would eventually hang. Now, it won't. But (I haven't confirmed 100% yet), I don't think it executes very pleasantly. E.g., if you have


    for ( i = 0; i < N; i++ ) {
         if ( me == 0 ) MPI_Send(...);
         if ( me == 1 ) MPI_Recv(...);
    }

At some point, the receiver falls hopelessly behind. The sender keeps pumping messages and the receiver keeps polling its FIFO, pulling in messages and returning fragments to the sender so that the sender can keep on going. Problem is, all that is happening within one MPI_Recv call... which in a test code might be pulling in 100Ks of messages. The MPI_Recv call won't return until the sender lets up. Then, the rest of the MPI_Recv calls will execute, all pulling messages out of the local unexpected-message queue.
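To make that concrete, here is a rough toy model of the effect, not a measurement and not the sm BTL code. The function name first_recv_drains and its parameters are made up for illustration. It assumes the receiver drains one fragment per poll and the sender manages to inject a fixed number of new messages in the same interval, limited by a cap on fragments in flight.

```c
/* Toy model only; names and parameters are hypothetical.
 * Returns how many messages get pulled in during the FIRST receive call
 * when the receiver keeps re-polling its FIFO as long as it finds something
 * (the "goto recheck_peer" behavior). */
static long first_recv_drains(long n_total, int sender_per_poll, int max_in_flight)
{
    long sent = 0;     /* messages the sender has injected so far */
    long fifo = 0;     /* messages sitting in the receiver's FIFO */
    long drained = 0;  /* messages pulled in during the first receive */

    if (n_total <= 0) return 0;

    /* The first message is already there when the receive starts. */
    sent = 1;
    fifo = 1;

    while (fifo > 0) {
        /* Pull in one fragment and return it to the sender. */
        fifo--;
        drained++;

        /* Meanwhile the sender keeps pumping, since it got a fragment back. */
        for (int k = 0;
             k < sender_per_poll && sent < n_total && fifo < max_in_flight;
             k++) {
            sent++;
            fifo++;
        }
    }
    /* drained - 1 of these messages land on the unexpected-message queue. */
    return drained;
}
```

In this model, as long as the sender injects at least one message per poll, the FIFO never goes empty until the sender is done, so the first receive drains all n_total messages. If the sender injects nothing during the poll, the first receive drains exactly one.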

Not sure yet how I want to manage this. The bottom line might be that if the MPI application has no flow control, the underlying MPI implementation is going to have to do something that won't make everyone happy. Oh well. At least the program makes progress and completes in reasonable time.
