Hi Folks, I have a cluster with some 100 Gb Ethernet cards installed. What we are noticing is that if we force Open MPI 1.10.3 to go through the TCP BTL (rather than yalla), osu_bw performance falls off a cliff once the TCP BTL switches from the eager to the rendezvous protocol (> 32 KB messages): it drops from about 1.6 GB/sec to 233 MB/sec and stays there out to 4 MB message lengths.
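For reference, here is roughly how we are forcing the TCP path (a sketch, not our exact command line; the hostnames are placeholders, and selecting the ob1 PML is our understanding of how to steer around yalla):

    # force the ob1 PML and the TCP/shared-memory BTLs (host1/host2 are placeholders)
    mpirun -np 2 -H host1,host2 \
           --mca pml ob1 --mca btl tcp,self,sm \
           ./osu_bw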
There's nothing wrong with the IP stack (iperf -P 4 gives 63 Gb/sec). So my questions are: 1) is this performance expected for the TCP BTL when in rendezvous mode? 2) is there some way to get closer to the single-socket performance iperf obtains for large messages (~16 Gb/sec)? We tried adjusting the TCP BTL's rendezvous threshold, but it doesn't appear to actually be adjustable from the mpirun command line; a sketch of what we tried is below.
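The parameter names here are our best guess from ompi_info, so if the eager-to-rendezvous switch is governed by some other knob, that may be our mistake:

    # list the TCP BTL's tunables (assuming --level 9 exposes them all)
    ompi_info --param btl tcp --level 9

    # our attempt to raise the switch point so 1 MB messages stay eager
    # (btl_tcp_eager_limit / btl_tcp_rndv_eager_limit are our guesses
    #  for the relevant parameters)
    mpirun -np 2 -H host1,host2 \
           --mca pml ob1 --mca btl tcp,self,sm \
           --mca btl_tcp_eager_limit 1048576 \
           --mca btl_tcp_rndv_eager_limit 1048576 \
           ./osu_bw

Thanks for any suggestions, Howard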