Unfortunately, we cannot access this - permissions are denied. In
poking around, I found that your hg directory has permission 700.
Afraid you'll have to grant us permission to access this. :-/
Ralph
On Jun 25, 2009, at 1:06 AM, Eugene Loh wrote:
Bryan Lally wrote:
Ralph Castain wrote:
Be happy to put it through the wringer... :-)
My wringer is available, too.
'kay. Try
hg clone ssh://www.open-mpi.org/~eloh/hg/pending_sends
which is r21498 but with changes to poll one's own FIFO more
regularly (e.g., even when just performing sends) and to retry
pending sends more aggressively (e.g., whenever about to try a send
or whenever one calls sm progress). I maintain a count of
outstanding fragments (sent but not yet returned to free list) and
of pending sends (total over all queues) to keep overheads down.
My various test codes (repeated Bcasts, half-duplex point-to-point
sends, etc.) all pass now. As far as I can tell, there is no
perceptible degradation in 0-byte pingpong latency. George's fixed-free-list
proposal may be better, but I'm making these bits available for some
soak and feedback.
Life is still not perfect. If you look in
mca_btl_sm_component_progress, when a process receives a message
fragment and returns it to the sender, it executes code like this:
    goto recheck_peer;
    break;
Okay, the reason I show you that code is that a static code
checker should easily identify the break statement as dead code.
It'll never be reached. Anyhow, in English, what's happening is that
if you receive a message fragment, you keep polling your FIFO. So,
consider the case of half-duplex point-to-point traffic: one
process only sends and the other process only receives. Previously,
this would eventually hang. Now, it won't. But (I haven't
confirmed 100% yet), I don't think it executes very pleasantly.
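That receive path could be sketched like so — a toy stand-in with a fake FIFO, not the actual mca_btl_sm_component_progress source — showing the goto that makes the break unreachable:

```c
#include <assert.h>

/* Toy FIFO standing in for a peer's circular buffer. */
static int fifo[16], fifo_head = 0, fifo_tail = 0;
static void fifo_push(int v) { fifo[fifo_tail++] = v; }
static int  fifo_poll(void)  { return fifo_head < fifo_tail ? fifo[fifo_head++] : -1; }

/* Drain the peer's FIFO; returns how many fragments were received. */
static int drain_fifo(void)
{
    int frag, nreceived = 0;
recheck_peer:
    frag = fifo_poll();
    switch (frag) {
    case -1:
        return nreceived;   /* FIFO empty: progress call returns      */
    default:
        nreceived++;        /* ...process frag, return it to sender...*/
        goto recheck_peer;  /* keep polling as long as traffic arrives*/
        break;              /* dead code: the goto above always runs  */
    }
    return nreceived;       /* not reachable; quiets the compiler     */
}
```

The consequence is exactly the behavior described next: as long as messages keep landing, control never leaves the polling loop.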
E.g., if you have
    for ( i = 0; i < N; i++ ) {
        if ( me == 0 ) MPI_Send(...);
        if ( me == 1 ) MPI_Recv(...);
    }
At some point, the receiver falls hopelessly behind. The sender
keeps pumping messages and the receiver keeps polling its FIFO,
pulling in messages and returning fragments to the sender so that
the sender can keep on going. Problem is, all that is happening
within one MPI_Recv call... which in a test code might be pulling in
100Ks of messages. The MPI_Recv call won't return until the sender
lets up. Then, the rest of the MPI_Recv calls will execute, all
pulling messages out of the local unexpected-message queue.
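A toy, single-process simulation of that dynamic might look like this (all names and the FIFO depth are made up; in reality the sender is a second MPI process, not a function call):

```c
#include <assert.h>

#define TOTAL_MSGS 1000
#define FIFO_DEPTH 8

static int fifo_count = 0;        /* messages currently in the shared FIFO   */
static int unexpected_count = 0;  /* queued locally, matched by later Recvs  */
static int sender_remaining = TOTAL_MSGS;

/* The sender refills the FIFO whenever returned fragments allow it. */
static void sender_pump(void)
{
    while (sender_remaining > 0 && fifo_count < FIFO_DEPTH) {
        fifo_count++;
        sender_remaining--;
    }
}

/* One MPI_Recv-like call: it keeps polling while traffic keeps arriving,
 * so it drains the sender's entire stream into the unexpected-message
 * queue before returning. Returns how many messages it pulled in. */
static int recv_call(void)
{
    int pulled = 0;
    sender_pump();
    while (fifo_count > 0) {
        fifo_count--;         /* pull a message, return its fragment... */
        unexpected_count++;
        pulled++;
        sender_pump();        /* ...which lets the sender keep going    */
    }
    return pulled;
}
```

Running this, the first receive call pulls in all TOTAL_MSGS messages; later calls would find everything already sitting in the unexpected-message queue, which is the behavior described above.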
Not sure yet how I want to manage this. The bottom line might be
that if the MPI application has no flow control, the underlying MPI
implementation is going to have to do something that won't make
everyone happy. Oh well. At least the program makes progress and
completes in reasonable time.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel