On Oct 26, 2010, at 1:27 PM, Ralph Castain wrote:

> I think we can do the old libevent for now as the trunk doesn't exploit the
> new 2.0 features yet (though I have some implemented in a branch that is now
> on hold). However, if we can fix shared memory quickly (and Sam appears to
> have something that works, though isn't fully verified yet), and can resolve
> the performance question quickly, I would MUCH rather not waste my time on
> retrofitting 1.4!

Sorry I had to drop off the call today (tornadoes in my area!).

After digging around in the new libevent a bit, I found the problem -- it's exactly what I said in my first mail: libevent called poll() with an infinite timeout. I talked with Brian and we're pretty sure we have the right solution. I committed it in r23957.

Ralph committed a performance fix in r23956 (i.e., disable libevent's threading support -- we need to evaluate what this means for MPI_THREAD_MULTIPLE).

Testing shows that this puts us back in the right performance ballpark; attached are 2 graphs of NetPIPE that I ran on 2 wolfdale-class machines at Cisco. I ran with the trunk HEAD (after the libevent fix commits from today) and with a commit from before all the libevent upgrades.

*** Confirmation of this data from another site would be greatly appreciated.

In short, the graphs show:

- TCP BTL performance over gigE and IPoIB is the same (between the 2 machines)
- SM BTL performance is a skosh lower in the new libevent (on 1 machine)

Note that these were DEBUG builds -- optimized builds would be a little better (particularly in SM latency).

Ralph and I discussed a performance tweak that he's going to implement tonight. We think/hope it will put the SM latency/bandwidth right back where it was before the upgrade -- i.e., we think it'll erase the small performance difference.

-----

As such, given that everything *seems* to be working properly, and *seems* to be back at the old performance level, I personally don't think it's worth it to do a libevent component of the old version. I had thought it would be an easy component to do, but apparently it's not (i.e., it would be 2-3 days' worth of work -- which doesn't seem worth it to me). I think our time would be better spent tuning up the new libevent properly.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

Attachments:
  netpipe-bandwidths.pdf (Adobe PDF document)
  netpipe-latencies.pdf (Adobe PDF document)