On Oct 25, 2010, at 20:22, Jeff Squyres wrote:

> I dug into this a bit.  
> 
> The problem is in the SM BTL init where it's waiting for all of the peers to 
> set seg_inited in shared memory (so that it knows everyone has hit that 
> point).  We loop on calling opal_progress while waiting.
> 
> The problem is that opal_progress() is not returning (!).
> 
> It appears that libevent's poll_dispatch() function is somehow getting an 
> infinite timeout -- it *looks* like libevent is determining that there are no 
> timers active, so it decides to set an infinite timeout (i.e., block) when it 
> calls poll().  Specifically, event.c:1524 calls timeout_next(), which sees 
> that there are no timer events active and resets tv_p to NULL.  We then call 
> the underlying fd-checking backend with an infinite timeout.  
> 
> Bonk.
> 
> Anyone more familiar with libevent's internals know why this is happening / 
> if this is a change since the old version?
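The blocking behavior described above can be reproduced outside of Open MPI
with a small standalone libevent 2.x program. This is only a sketch using the
public libevent API (not the opal wrapper or the actual Open MPI code path):
one fd event on a pipe that never becomes readable, and no timer events at
all, so a blocking dispatch hands the backend an infinite timeout.

    /* Standalone libevent 2.x sketch: one fd event, no timer events.
     * With no timers active, a blocking dispatch passes an infinite
     * timeout to poll/epoll and never returns; a non-blocking pass
     * checks the fds once and comes right back. */
    #include <event2/event.h>
    #include <stdio.h>
    #include <unistd.h>

    static void on_read(evutil_socket_t fd, short what, void *arg)
    {
        (void)fd; (void)what; (void)arg;   /* never fires in this sketch */
    }

    int main(void)
    {
        int fds[2];
        if (pipe(fds) != 0) return 1;

        struct event_base *base = event_base_new();

        /* A read event on a pipe that is never written to, added with a
         * NULL timeout, i.e. no timer is armed anywhere. */
        struct event *ev = event_new(base, fds[0], EV_READ | EV_PERSIST,
                                     on_read, NULL);
        event_add(ev, NULL);

        /* event_base_loop(base, EVLOOP_ONCE);  <-- blocks forever in poll() */

        /* A progress-style caller needs this instead: poll once, return. */
        int rc = event_base_loop(base, EVLOOP_NONBLOCK);
        printf("non-blocking pass returned %d\n", rc);

        event_free(ev);
        event_base_free(base);
        close(fds[0]); close(fds[1]);
        return 0;
    }

Presumably the Open MPI glue needs to invoke the event loop in a way that
cannot block; whether that changed with the new libevent is exactly the
question above.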

I did some digging too. My conclusion is slightly different.

1. Not all processes deadlock in btl_sm_add_procs. The process that set up the 
shared memory area moves forward and blocks later in a barrier. Why we have a 
barrier in MPI_Init is another question, but it is not related to the problem 
at hand here.

2. All other processes loop around opal_progress until they get a message from 
all other processes. The variable used for counting seems to be updated 
correctly, yet we keep calling opal_progress. I couldn't figure out whether we 
loop more than we should, or whether opal_progress doesn't return. However, 
both of these possibilities look very unlikely to me: the loop in sm_add_procs 
is pretty straightforward, and I couldn't find any loops in opal_progress. I 
wonder if some of the messages get lost during the exchange. (A simplified 
sketch of this kind of wait loop follows after point 3.)

3. If I unblock the situation by hand, everything goes back to normal. NetPIPE 
runs to completion, but the performance is __really__ bad. On my test machine 
I get around 2000 Mbps, when the expected value is at least 10 times higher. 
Similar finding on the latency side: we're now at 1.65 microseconds, up from 
the usual 0.35 we had before.
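For reference, here is the kind of wait loop point 2 is talking about. This is
a simplified, hypothetical sketch, not the actual mca_btl_sm_add_procs() code;
a stubbed-out fake_progress() stands in for opal_progress() so the sketch is
self-contained.

    #include <stdio.h>

    /* Stand-ins for the real shared-memory flag and progress engine. */
    static volatile int peers_ready = 0;   /* would live in shared memory */
    static const int n_peers = 4;          /* expected number of peers    */

    /* In the real code this is opal_progress(), which must poll the event
     * engine and the BTLs and return promptly; here it just pretends one
     * peer checked in on every call. */
    static void fake_progress(void)
    {
        peers_ready++;
    }

    int main(void)
    {
        /* Spin until every peer has flagged its segment as initialized.
         * The loop itself is trivial: a hang here means either the
         * progress call never returns or the updates never arrive. */
        while (peers_ready < n_peers) {
            fake_progress();
        }
        printf("all %d peers ready\n", n_peers);
        return 0;
    }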

  george.


> On Oct 25, 2010, at 6:07 PM, Jeff Squyres wrote:
> 
>> On Oct 25, 2010, at 3:21 PM, George Bosilca wrote:
>> 
>>> So now we're in good shape, at least for compiling. IB and TCP seem to 
>>> work, but SM deadlocks.
>> 
>> Ugh.
>> 
>> Are you debugging this, or are we? (i.e., me/Ralph)
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

