Unfortunately, we cannot access this - permission is denied. In poking around, I found that your hg directory has its permissions set to 700.

Afraid you'll have to grant us permission to access this. :-/

Ralph

On Jun 25, 2009, at 1:06 AM, Eugene Loh wrote:

Bryan Lally wrote:

Ralph Castain wrote:

Be happy to put it through the wringer... :-)

My wringer is available, too.

'kay.  Try

hg clone ssh://www.open-mpi.org/~eloh/hg/pending_sends

which is r21498, but with changes to poll one's own FIFO more regularly (e.g., even when just performing sends) and to retry pending sends more aggressively (e.g., whenever a process is about to try a send or whenever sm progress is called). To keep overheads down, I maintain a count of outstanding fragments (sent, but not yet returned to the free list) and a count of pending sends (totaled over all queues).
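
In case it helps to picture the bookkeeping, here is a rough, compilable sketch of the idea. All of the names here (outstanding_frags, pending_sends, poll_own_fifo, retry_pending_sends, sm_housekeeping) are made up for illustration and are not the actual sm BTL symbols; the real code differs in detail.

    #include <stddef.h>

    /* Hypothetical sketch, not the actual sm BTL code. */
    static size_t outstanding_frags = 0; /* sent, not yet returned to the free list */
    static size_t pending_sends     = 0; /* queued sends, totaled over all queues   */

    static void poll_own_fifo(void)       { /* placeholder: drain our own FIFO      */ }
    static void retry_pending_sends(void) { /* placeholder: re-attempt queued sends */ }

    /* Called when about to try a send and from sm progress.  The two counters
     * let both chores be skipped cheaply when there is nothing to do. */
    static void sm_housekeeping(void)
    {
        if (outstanding_frags > 0) {
            poll_own_fifo();        /* reclaim fragments that peers have returned */
        }
        if (pending_sends > 0) {
            retry_pending_sends();  /* try again now that resources may be free   */
        }
    }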

My various test codes (repeated Bcasts, half-duplex point-to-point sends, etc.) all pass now. There is no perceptible degradation in 0-byte pingpong latency that I can detect. George's fixed-free-list proposal may be better, but I'm making these bits available for some soak time and feedback.

Life is still not perfect. If you look in mca_btl_sm_component_progress, when a process receives a message fragment and returns it to the sender, it executes code like this:

   goto recheck_peer;
   break;

Okay, the reason I show you that code is that a static code checker should easily identify the break statement as dead code: it will never be reached. Anyhow, in English, what's happening is that if you receive a message fragment, you keep polling your FIFO. So, consider the case of half-duplex point-to-point traffic: one process only sends and the other process only receives. Previously, this would eventually hang. Now, it won't. But (I haven't confirmed this 100% yet) I don't think it executes very pleasantly. E.g., if you have:

   for ( i = 0; i < N; i++ ) {
        if ( me == 0 ) MPI_Send(...);
        if ( me == 1 ) MPI_Recv(...);
   }

At some point, the receiver falls hopelessly behind. The sender keeps pumping messages, and the receiver keeps polling its FIFO, pulling in messages and returning fragments to the sender so that the sender can keep going. The problem is that all of this happens within a single MPI_Recv call, which in a test code might pull in hundreds of thousands of messages. That MPI_Recv call won't return until the sender lets up. Then the rest of the MPI_Recv calls execute, all pulling messages out of the local unexpected-message queue.
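
If anyone wants to poke at this, here is a self-contained version of that test. The element type, tag, and iteration count are arbitrary choices of mine; run it with two ranks on one node so the traffic goes over the sm BTL.

    #include <mpi.h>

    /* Standalone half-duplex test: rank 0 only sends, rank 1 only receives. */
    int main(int argc, char **argv)
    {
        int i, me, buf = 0;
        const int N = 1000000;   /* plenty of iterations to let rank 0 run ahead */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);

        for (i = 0; i < N; i++) {
            if (me == 0)
                MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            if (me == 1)
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }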

Not sure yet how I want to manage this. The bottom line might be that if the MPI application has no flow control, the underlying MPI implementation is going to have to do something that won't make everyone happy. Oh well. At least the program makes progress and completes in reasonable time.
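
For what it's worth, the test code itself can impose flow control. Here is a (made-up) variant of the program above in which every 64th send is synchronous, so rank 0 can never get more than roughly 64 messages ahead of rank 1. The choice of 64 is arbitrary.

    #include <mpi.h>

    /* Same half-duplex test, but with crude application-level flow control:
     * every CHUNK-th message is an MPI_Ssend, which cannot complete until the
     * matching receive has started, so the sender stays at most roughly CHUNK
     * messages ahead of the receiver. */
    #define CHUNK 64

    int main(int argc, char **argv)
    {
        int i, me, buf = 0;
        const int N = 1000000;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);

        for (i = 0; i < N; i++) {
            if (me == 0) {
                if ((i + 1) % CHUNK == 0)
                    MPI_Ssend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
                else
                    MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            }
            if (me == 1)
                MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }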
