On Oct 26, 2010, at 1:27 PM, Ralph Castain wrote:

> I think we can do the old libevent for now as the trunk doesn't exploit the 
> new 2.0 features yet (though I have some implemented in a branch that is now 
> on hold). However, if we can fix shared memory quickly (and Sam appears to 
> have something that works, though isn't fully verified yet), and can resolve 
> the performance question quickly, I would MUCH rather not waste my time on 
> retrofitting 1.4!

Sorry I had to drop off the call today (tornados in my area!).

After digging around in the new libevent a bit, I found the problem -- it's 
exactly what I said in my first mail: libevent called poll() with an infinite 
timeout.  I talked with Brian and we're pretty sure we have the right solution. 
I committed it in r23957.
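
To illustrate what an infinite poll() timeout means in general terms, here is a
minimal sketch. It is not the r23957 change and not libevent or Open MPI code;
the helper names wait_readable() and demo_nonblocking_poll() are hypothetical
and exist only for this example. It relies solely on the POSIX poll() timeout
semantics: -1 blocks until an event arrives, 0 returns immediately.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll a single descriptor for readability.
 * timeout_ms == -1 blocks until the fd is ready (the "infinite timeout" case),
 * timeout_ms ==  0 checks once and returns immediately,
 * timeout_ms  >  0 waits at most that many milliseconds.
 * Returns poll()'s result: ready-fd count, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, timeout_ms);
}

/* Hypothetical demo: a zero-timeout poll on an empty pipe reports nothing
 * ready without blocking; after one byte is written it reports one ready fd.
 * Returns 0 if both observations hold, 1 if not, -1 on setup failure. */
static int demo_nonblocking_poll(void)
{
    int fds[2];
    char byte = 'x';
    int before, after;

    if (pipe(fds) != 0)
        return -1;

    /* Empty pipe: with timeout -1 this call would never return. */
    before = wait_readable(fds[0], 0);

    if (write(fds[1], &byte, 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Data is now pending, so the read end is reported as ready. */
    after = wait_readable(fds[0], 0);

    close(fds[0]);
    close(fds[1]);
    return (before == 0 && after == 1) ? 0 : 1;
}
```

A progress loop that also has to service other work (shared memory polling, for
example) would presumably want the zero or bounded timeout, since a call with
-1 does not return until one of the polled descriptors has an event.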

Ralph committed a performance fix in r23956 (i.e., disabling libevent's threading 
support -- we still need to evaluate what this means for MPI_THREAD_MULTIPLE).  
Testing shows that this puts us back in the right performance ballpark; 
attached are 2 graphs of NetPIPE that I ran on 2 wolfdale-class machines at 
Cisco.  I ran with the trunk HEAD (after the libevent fix commits from today) 
and with a commit from before all the libevent upgrades.

*** Confirmation of this data from another site would be greatly appreciated.

In short, the graphs show:

- TCP BTL performance over gigE and IPoIB is the same (between the 2 machines)
- SM BTL performance is a skosh lower in the new libevent (on 1 machine)

Note that these were DEBUG builds -- optimized builds would be a little better 
(particularly in SM latency).  Ralph and I discussed a performance tweak that 
he's going to implement tonight.  We think/hope it will put the SM 
latency/bandwidth right back where it was before the upgrade -- i.e., we think 
it'll erase the small performance difference.

-----

As such, given that everything *seems* to be working properly, and *seems* to 
be back at the old performance level, I personally don't think it's worth it to 
do a libevent component of the old version.  I had thought it would be an easy 
component to do, but apparently it's not (i.e., it would be 2-3 days' worth 
of work -- which doesn't seem worth it to me).  I think our time would be 
better spent tuning up the new libevent properly.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

Attachment: netpipe-bandwidths.pdf
Description: Adobe PDF document

Attachment: netpipe-latencies.pdf
Description: Adobe PDF document
