Hi Sreenidhi,

Only partial resolution. By pushing the eager path out to 4 MB, we were able to get around 2 GB/sec per socket connection with the osu_bw test.
The kernel is quite old though (2.6.x), and since this was a summer student project with a focus on IB vs. routable RoCE, we've moved on.

Howard

________________________________
From: devel <devel-boun...@open-mpi.org> on behalf of Sreenidhi Bharathkar Ramesh <sreenidhi-bharathkar.ram...@broadcom.com>
Sent: Tuesday, July 26, 2016 4:11:06 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] tcp btl rendezvous performance question

Hi Howard,

Was this issue resolved? If so, what is the solution? Please let me know. I'm curious, since we are also experimenting with these limits.

Thanks,
- Sreenidhi.

On Tue, Jul 19, 2016 at 10:50 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

Howard,

Did you bump both btl_tcp_rndv_eager_limit and btl_tcp_eager_limit?

You might also need to bump btl_tcp_sndbuf, btl_tcp_rcvbuf and btl_tcp_max_send_size to get the maximum performance out of your 100 Gb Ethernet cards.

Last but not least, you might also need to bump btl_tcp_links to saturate your network (that is likely a good thing when running 1 task per node, but it can lead to decreased performance when running several tasks per node).

Cheers,

Gilles

On 7/19/2016 6:57 AM, Howard Pritchard wrote:

Hi Folks,

I have a cluster with some 100 Gb Ethernet cards installed. What we are noticing, if we force Open MPI 1.10.3 to go through the TCP BTL (rather than yalla), is that the performance of osu_bw falls off a cliff once the TCP BTL switches from eager to rendezvous (> 32 KB), going from about 1.6 GB/sec to 233 MB/sec, and stays that way out to 4 MB message lengths. There's nothing wrong with the IP stack (iperf -P4 gives 63 Gb/sec).

So, my questions are:

1) Is this performance expected for the TCP BTL when in rendezvous mode?
2) Is there some way to get more like the single-socket performance obtained with iperf for large messages (~16 Gb/sec)?
We tried adjusting the tcp_btl_rendezvous threshold, but that doesn't appear to actually be adjustable from the mpirun command line.

Thanks for any suggestions,

Howard

_______________________________________________
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2016/07/19237.php
Link to this post: http://www.open-mpi.org/community/lists/devel/2016/07/19240.php
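
Editor's sketch: the tuning discussed in this thread comes down to passing a handful of TCP BTL MCA parameters to mpirun. The Python snippet below only assembles such a command line and prints it; it does not run anything. The parameter names are the ones Gilles lists, and the 4 MB eager limit is the value Howard reports. Everything else is an assumption made for illustration and was not confirmed in the thread: the socket buffer sizes, the max send size, the link count, the process count, and the `--mca pml ob1` / `--mca btl tcp,self` selection used to steer traffic onto the TCP BTL.

```python
import shlex

MIB = 1024 * 1024


def tcp_btl_mca_params(eager_bytes=4 * MIB, sockbuf_bytes=4 * MIB, links=1):
    """Return the TCP BTL MCA parameters named in the thread as a dict.

    Only the 4 MB eager limit comes from the thread; the other default
    values are illustrative assumptions, not recommended settings.
    """
    return {
        "btl_tcp_eager_limit": eager_bytes,
        "btl_tcp_rndv_eager_limit": eager_bytes,
        "btl_tcp_max_send_size": eager_bytes,  # assumed value
        "btl_tcp_sndbuf": sockbuf_bytes,       # assumed value
        "btl_tcp_rcvbuf": sockbuf_bytes,       # assumed value
        "btl_tcp_links": links,                # assumed value
    }


def build_mpirun_command(program, nprocs=2, params=None):
    """Build an mpirun argv list that sets each parameter via --mca."""
    if params is None:
        params = tcp_btl_mca_params()
    # Assumption: selecting ob1 with the tcp and self BTLs is how the
    # TCP path is forced here; the thread does not say how it was done.
    argv = ["mpirun", "-np", str(nprocs),
            "--mca", "pml", "ob1",
            "--mca", "btl", "tcp,self"]
    for name, value in params.items():
        argv += ["--mca", name, str(value)]
    argv.append(program)
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection.
    command = build_mpirun_command("./osu_bw")
    print(" ".join(shlex.quote(arg) for arg in command))
```

Whether a given Open MPI build accepts these values can be checked with `ompi_info --param btl tcp --level 9` before launching a benchmark. Raising btl_tcp_links above 1 is the knob Gilles flags as helpful with one task per node and potentially harmful with several.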