Eugene Loh wrote:
Possibly, you meant to ask how one does directed polling with a wildcard source MPI_ANY_SOURCE. If that was your question, the answer is we punt. We report failure to the ULP, which reverts to the standard code path.

Sorry, I meant ANY_SOURCE. If you poll only the queue that corresponds to a posted receive, you only optimize micro-benchmarks, until they start using ANY_SOURCE. So, is recvi() a one-time shot? I.e., do you poll the right queue only once and, if that fails, fall back on polling all queues? If yes, then it's unobtrusive, but I don't think it would help much. If you poll the right queue many times, then you have to decide when to fall back on polling all queues, and that's not trivial.

How do you ensure you check all incoming queues from time to time to prevent flow-control stalls (especially if the queues are small, for scaling)?
There are a variety of choices here. Further, I'm afraid we ultimately have to expose some of those choices to the user (MCA parameters or something).

In the vast majority of cases, users don't know how to turn the knobs. The problem is that as local np goes up, queue sizes go down fast (as the square root), and you have to poll all queues more often. Using more memory for the queues just pushes the scalability wall a little bit further.

congestion. What if the user code then posts a rather specific request (receive a message with a particular tag on a particular communicator from a particular source) and with high urgency (a blocking request... "I ain't going anywhere until you give me what I'm asking for")? A good servant would drop whatever else s/he is doing to oblige the boss.

If you poll only one queue, then stuff can pile up on another and a sender is now blocked. At best, you have a synchronization point. At worst, a deadlock.

So, let's say there's a standard MPI_Recv. Let's say there's also some congestion starting to build. What should the MPI implementation do?

The MPI implementation cannot trust the user/app to indicate where the messages will come from. So, if you have N incoming queues, you need to poll them all eventually. If you do, polling time increases linearly with N. If you try to limit the polling space with some heuristic (like the queue corresponding to the current blocking receive), then you take the risk of not consuming another queue fast enough. And the heuristics usually fall apart quickly (ANY_SOURCE, multiple asynchronous receives, etc.).

Really, only single-queue solves that.

Yes, and you could toss the receive-side optimizations as well. So, one could say, "Our np=2 latency remains 2x slower than Scali's, but at least we no longer have that hideous scaling with large np." Maybe that's where we want to end up.

I think all the optimizations except recvi() are fine and worth using. I am just saying that the recvi() optimization is dubious as it stands, and that the single queue is potentially the bigger low-hanging fruit on the receive side: it could still be fast (a spinlock or an atomic operation to manage the shared receive queue), keeping np=2 latency low, and it would scale well with large np. No tuning needed, no special cases, and a smaller memory footprint.

I will leave it at that; just some input.

Patrick
