Re: [OMPI devel] trac ticket 1944 and pending sends

Brian W. Barrett Tue, 23 Jun 2009 12:57:55 -0400

I think that sounds like a rational path forward. Another, more long-term, option would be to move from the FIFOs to a linked list (which can even be atomic), which is what MPICH does with nemesis. In that case, there's never a queue to get backed up (although the receive queue for collectives is still a problem). It would also solve the problem of returning a fragment when there is no space, as there's always space in a linked list.
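
To make the linked-list idea concrete, here is a rough sketch of a multi-producer, single-consumer queue in C11 atomics, in the general style of the nemesis queue. It is not MPICH's or Open MPI's code; node_t, queue_t, and the queue_* functions are hypothetical names. It also uses plain pointers, whereas a real shared-memory queue would have to use offsets (or otherwise deal with segments mapped at different addresses in each process).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical fragment node; the link lives inside the fragment itself,
 * so enqueueing never needs to allocate a slot and can never run out of room. */
typedef struct node {
    _Atomic(struct node *) next;
    int payload;
} node_t;

typedef struct {
    _Atomic(node_t *) head;   /* touched by the single consumer (and by a
                                 producer only when the queue was empty) */
    _Atomic(node_t *) tail;   /* swapped atomically by any producer */
} queue_t;

static void queue_init(queue_t *q) {
    atomic_store(&q->head, NULL);
    atomic_store(&q->tail, NULL);
}

/* Any number of producers may call this concurrently. */
static void queue_enqueue(queue_t *q, node_t *n) {
    assert(n != NULL);
    atomic_store(&n->next, NULL);
    node_t *prev = atomic_exchange(&q->tail, n);  /* claim the tail position */
    if (prev == NULL) {
        atomic_store(&q->head, n);                /* queue was empty */
    } else {
        atomic_store(&prev->next, n);             /* link behind old tail */
    }
}

/* Only one consumer may call this. Returns NULL if the queue is empty. */
static node_t *queue_dequeue(queue_t *q) {
    node_t *n = atomic_load(&q->head);
    if (n == NULL) {
        return NULL;
    }
    node_t *next = atomic_load(&n->next);
    if (next != NULL) {
        atomic_store(&q->head, next);
        return n;
    }
    /* n looks like the last node: try to mark the queue empty. */
    atomic_store(&q->head, NULL);
    node_t *expected = n;
    if (!atomic_compare_exchange_strong(&q->tail, &expected, NULL)) {
        /* A producer already swapped the tail but has not linked yet;
         * wait for the link to appear, then advance the head. */
        while ((next = atomic_load(&n->next)) == NULL) {
            /* spin */
        }
        atomic_store(&q->head, next);
    }
    return n;
}
```

The point of the sketch is only that the sender supplies the storage for the link, so a "full queue" condition does not exist and there is nothing to retry.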


Brian


On Tue, 23 Jun 2009, Eugene Loh wrote:

The sm BTL used to have two mechanisms for dealing with congested FIFOs. One was to grow the FIFOs. Another was to queue pending sends locally (on the sender's side). I think the grow-FIFO mechanism was typically invoked and the pending-send mechanism used only under extreme circumstances (no more memory).
With the sm makeover of 1.3.2, we dropped the ability to grow FIFOs. The code added complexity and there seemed to be no need to have two mechanisms to deal with congested FIFOs. In ticket 1944, however, we see that repeated collectives can produce hangs, and this seems to be due to the pending-send code not adequately dealing with congested FIFOs.
Today, when a process tries to write to a remote FIFO and fails, it queues the write as a pending send. The only condition under which it retries pending sends is when it gets a fragment back from a remote process.
I think the logic must have been that the FIFO got congested because we issued too many sends. Getting a fragment back indicates that the remote process has made progress digesting those sends. In ticket 1944, we see that a FIFO can also get congested from too many returning fragments. Further, with shared FIFOs, a FIFO could become congested due to the activity of a third-party process.
In sum, getting a fragment back from a remote process is a poor indicator that it's time to retry pending sends.
Maybe the real way to know when to retry pending sends is just to check if there's room on the FIFO.
So, I'll try modifying MCA_BTL_SM_FIFO_WRITE. It'll start by checking if there are pending sends. If so, it'll retry them before performing the requested write. This should also help preserve ordering a little better. I'm guessing this will not hurt our message latency in any meaningful way, but I'll check this out.
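
A minimal sketch of that write path, to show the intended control flow. This is not the actual MCA_BTL_SM_FIFO_WRITE macro or the real sm BTL data structures; fifo_t, pending_t, the capacities, and all function names are made up for illustration, fragments are reduced to ints, and everything is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SLOTS  4    /* hypothetical fixed FIFO capacity (no growing) */
#define MAX_PENDING 64   /* hypothetical bound on locally queued sends */

typedef struct {
    int slots[FIFO_SLOTS];
    size_t head, tail, count;
} fifo_t;

typedef struct {
    int items[MAX_PENDING];
    size_t head, count;
} pending_t;

/* Try to place one fragment on the FIFO; fail if it is full. */
static bool fifo_try_write(fifo_t *f, int frag) {
    if (f->count == FIFO_SLOTS) {
        return false;
    }
    f->slots[f->tail] = frag;
    f->tail = (f->tail + 1) % FIFO_SLOTS;
    f->count++;
    return true;
}

/* Receiver side: remove the oldest fragment, if any. */
static bool fifo_read(fifo_t *f, int *frag) {
    if (f->count == 0) {
        return false;
    }
    *frag = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->count--;
    return true;
}

static void pending_append(pending_t *p, int frag) {
    assert(p->count < MAX_PENDING);
    p->items[(p->head + p->count) % MAX_PENDING] = frag;
    p->count++;
}

/* Retry queued sends oldest-first; stop at the first one that does not fit. */
static void retry_pending(fifo_t *f, pending_t *p) {
    while (p->count > 0 && fifo_try_write(f, p->items[p->head])) {
        p->head = (p->head + 1) % MAX_PENDING;
        p->count--;
    }
}

/* Proposed write: drain pending sends first, then attempt the new write.
 * Returns true if frag went onto the FIFO, false if it was queued locally. */
static bool fifo_write(fifo_t *f, pending_t *p, int frag) {
    retry_pending(f, p);
    /* Only write directly if nothing older is still waiting, so that
     * fragments reach the FIFO in the order they were issued. */
    if (p->count == 0 && fifo_try_write(f, frag)) {
        return true;
    }
    pending_append(p, frag);
    return false;
}
```

In this sketch the retry is triggered by the next write attempt rather than by a returned fragment, so whether it succeeds depends only on whether the FIFO has room. A progress function could call retry_pending as well, so that queued sends still drain when no new writes are issued.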
Meanwhile, I wanted to check in with y'all for any guidance you might have.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
