Hmmm... comparing the "new" code with the "old" one, I see some thread locking 
in the "old" code that didn't make it across. Is it possible this could be 
affecting the shared memory updates?
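
To illustrate what I mean (just a sketch -- the lock and field names here are 
placeholders, not the actual names from the old tree), the old code did 
something roughly like:

    /* Illustrative only: placeholder names, not the actual sm code. */
    #include "opal/sys/atomic.h"

    struct seg_header {
        opal_atomic_lock_t seg_lock;    /* hypothetical lock protecting the segment */
        volatile int       seg_inited;  /* count of peers that have attached */
    };

    static void announce_attached(struct seg_header *seg)
    {
        opal_atomic_lock(&seg->seg_lock);    /* serialize updates to shared memory */
        seg->seg_inited++;                   /* visible to all local peers */
        opal_atomic_unlock(&seg->seg_lock);  /* unlock also implies a memory barrier */
    }

If that lock (and the barrier it implies) was dropped, a peer polling the 
counter could in principle see stale or re-ordered values.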

There were other hand edits in the event code; it's sometimes hard to tell what 
was put in by us vs. what was already there, and with all the libevent threading 
code now in place, it isn't entirely clear what they may have already fixed. 
Also, note that their thread safety may well be "on", which could account for 
some of the performance change. We may need to consider that, as it isn't 
controllable per event base (it is a compile-time option).
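
For reference (hedging a bit, since the exact knob in our embedded copy may be 
named differently), libevent 2.x thread support looks roughly like this when it 
is compiled in and turned on:

    #include <event2/thread.h>
    #include <stdio.h>

    static void maybe_enable_libevent_threads(void)
    {
        /* Registers pthread-based lock/condition callbacks if the library
         * was built with thread support; once set, event_add()/event_del()/
         * event_base_loop() take the base lock on every call. */
        if (evthread_use_pthreads() != 0) {
            fprintf(stderr, "libevent built without thread support\n");
        }
    }

That per-operation locking would be consistent with a measurable hit on 
small-message latency, independent of anything in our hand edits.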

It might be worth having the folks who did the prior hand edits review the code 
to see what changed and what may need to be edited again. If possible, it would 
really help to avoid hand-editing their code this time so we don't have to keep 
re-reviewing it.


On Oct 25, 2010, at 7:29 PM, George Bosilca wrote:

> 
> On Oct 25, 2010, at 20:22 , Jeff Squyres wrote:
> 
>> I dug into this a bit.  
>> 
>> The problem is in the SM BTL init where it's waiting for all of the peers to 
>> set seg_inited in shared memory (so that it knows everyone has hit that 
>> point).  We loop on calling opal_progress while waiting.
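>> 
>> Roughly (paraphrasing from memory, not the exact code, and the names are 
>> approximate), the wait looks like:
>> 
>>   /* paraphrase of the wait in the sm add_procs path */
>>   while (seg->seg_inited < n_local_procs) {
>>       opal_progress();   /* supposed to poll and return; instead it never comes back */
>>   }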
>> 
>> The problem is that opal_progress() is not returning (!).
>> 
>> It appears that libevent's poll_dispatch() function is somehow getting an 
>> infinite timeout -- it *looks* like libevent is determining that there are 
>> no timers active, so it decides to set an infinite timeout (i.e., block) 
>> when it calls poll().  Specifically, event.c:1524 calls timeout_next(), 
>> which sees that there are no timer events active and resets tv_p to NULL.  
>> We then call the underlying fd-checking backend with an infinite timeout.  
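>> 
>> In pseudo-form (paraphrasing event.c, not quoting it verbatim), the logic is 
>> roughly:
>> 
>>   /* paraphrase of timeout_next() in event.c */
>>   if (NULL == min_heap_top(&base->timeheap)) {
>>       *tv_p = NULL;    /* no armed timers: the backend will block indefinitely */
>>   } else {
>>       /* compute the interval until the earliest timer fires */
>>   }
>> 
>> Since opal_progress() expects the poll to return promptly, that NULL timeout 
>> is fatal unless a timer stays armed or we dispatch with something like 
>> EVLOOP_NONBLOCK.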
>> 
>> Bonk.
>> 
>> Anyone more familiar with libevent's internals know why this is happening / 
>> if this is a change since the old version?
> 
> I did some digging too. My conclusion is somewhat [slightly] different.
> 
> 1. Not all processes deadlock in btl_sm_add_procs. The process that set up the 
> shared memory area goes forward and blocks later in a barrier. Why we have a 
> barrier in MPI_Init is another question, but it is not related to the problem 
> at hand here.
> 
> 2. All other processes loop around opal_progress until they get a message from 
> every other process. The variable used for counting seems to be updated 
> correctly, but we keep calling opal_progress anyway. I couldn't figure out 
> whether we loop more than we should, or whether opal_progress doesn't return. 
> However, both of these possibilities look very unlikely to me: the loop in 
> sm_add_procs is pretty straightforward, and I couldn't find any loops in 
> opal_progress. I wonder if some of the messages get lost during the exchange.
> 
> 3. If I unblock the situation by hand, everything goes back to normal. 
> NetPIPE runs to completion, but the performance is __really__ bad. On my 
> test machine I get around 2000 Mbps, when the expected value is at least 10 
> times higher. Similar findings on the latency side: we're now at 1.65 
> microseconds, up from the usual 0.35 we had before.
> 
>  george.
> 
> 
>> On Oct 25, 2010, at 6:07 PM, Jeff Squyres wrote:
>> 
>>> On Oct 25, 2010, at 3:21 PM, George Bosilca wrote:
>>> 
>>>> So now we're in good shape, at least for compiling. IB and TCP seem to 
>>>> work, but SM deadlocks.
>>> 
>>> Ugh.
>>> 
>>> Are you debugging this, or are we? (i.e., me/Ralph)
>>> 