Re: [OMPI devel] Latency perf: v1.6 vs. v1.7 vs. trunk

Jeff Squyres Thu, 25 Oct 2012 12:55:10 -0400

Something that might not be clear from my initial writeup:

1. I had to go change C code to disable libnbc.  Since non-blocking collectives 
are part of MPI-3:
   a) we have no convenient configure argument to not build the libnbc coll 
component (there is a way, but it's laborious), and 
   b) even if we did, OMPI's coll selection will fail at run time because it 
didn't find modules for the non-blocking collective operations.


2. Hence:
   a) performance is bad, at least partially because of libnbc
   b) there are also some other performance oddities in there
   c) but there are some good performance improvements, too, that would be good 
to bring to v1.7 (and v1.6, if possible)
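
For reference, here is a minimal sketch of the idea raised in (b) of the quoted message below: only keep the libnbc progress function in the progress loop while NBC operations are outstanding. This is a toy model, not Open MPI's actual progress engine. Every name in it (progress_register, progress_unregister, progress_once, nbc_start, nbc_complete) is hypothetical and made up for illustration, and it assumes a single thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy progress engine. All names are hypothetical, NOT Open MPI's API. */
typedef int (*progress_cb_t)(void);

#define MAX_CALLBACKS 8
static progress_cb_t callbacks[MAX_CALLBACKS];
static size_t n_callbacks = 0;

void progress_register(progress_cb_t cb)
{
    assert(n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks++] = cb;
}

void progress_unregister(progress_cb_t cb)
{
    for (size_t i = 0; i < n_callbacks; i++) {
        if (callbacks[i] == cb) {
            /* Fill the hole with the last entry; order is not preserved. */
            callbacks[i] = callbacks[--n_callbacks];
            return;
        }
    }
}

/* One pass of the progress loop: each registered callback costs a call,
 * whether or not it has any work to do. */
int progress_once(void)
{
    int events = 0;
    for (size_t i = 0; i < n_callbacks; i++) {
        events += callbacks[i]();
    }
    return events;
}

static int nbc_active = 0;         /* outstanding non-blocking collectives */
static int nbc_progress_calls = 0; /* counts how often we get polled */

int nbc_progress(void)
{
    nbc_progress_calls++;
    return 0; /* the real function would advance pending schedules here */
}

/* Register the callback only on the 0 -> 1 transition. */
void nbc_start(void)
{
    if (nbc_active++ == 0) {
        progress_register(nbc_progress);
    }
}

/* Unregister the callback on the 1 -> 0 transition. */
void nbc_complete(void)
{
    assert(nbc_active > 0);
    if (--nbc_active == 0) {
        progress_unregister(nbc_progress);
    }
}
```

With this shape, an application that never starts an NBC never pays for the extra call in the loop. A real implementation would need to make the counter and the callback list safe against concurrent progress threads, which this sketch ignores.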


On Oct 25, 2012, at 12:32 PM, Jeff Squyres wrote:

> Attached are the following graphs:
> 
> 1. sm NetPipe latencies up to size 150 bytes (run on a Sandy Bridge, 2 procs 
> same core)
> 2. openib NetPipe latencies up to size 150 bytes (run on 2 old Xeons 
> [pre-Nehalem] with old Mellanox ConnectX IB HCAs)
> 3. Same as #1, but all the way up to 8MB
> 4. Same as #2, but all the way up to 8MB
> 
> I also attached a tarball of all my raw NetPipe numbers (since the graphs 
> are log-log).
> 
> There's definite weirdness here.  Here are some observations:
> 
> a) Trunk openib latency is noticeably better in the mid-range as compared to 
> v1.6 and v1.7.  This is good!  Is this change something that can be brought 
> to v1.6 / v1.7?
> 
> b) The addition of the libnbc progress function to the progress loop has a 
> non-zero impact on latency.  It's most noticeable in graphs #1 and #2.  Can 
> something be done to add the libnbc progress function to the loop only when 
> NBC operations are ongoing?  Right now, the libnbc progress function is 
> *always* added to the progress loop, even if you never use any NBCs.
> 
> c) There's a noticeable increase in small message latency for the openib BTL 
> in v1.7 as compared to the trunk and v1.6 branches.  I don't know if this is 
> an openib thing, or the result of something else.
> 
> d) The trunk (without libnbc) has the best small message sm latency, period 
> -- even better than v1.6.  Yay!  Is this decrease in latency (compared to 
> v1.6) something that can be brought to v1.7?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> <netpipe-sm-latencies-to-128.pdf><netpipe-openib-latencies-to-128.pdf><netpipe-sm-latencies.pdf><netpipe-openib-latencies.pdf><netpipe-latency-numbers.tar.bz2>


