Overall it looks good. It would be helpful to validate performance numbers for other interconnects as well. -Pasha
> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan Hjelm
> Sent: Tuesday, January 07, 2014 6:45 PM
> To: Open MPI Developers List
> Subject: [OMPI devel] RFC: OB1 optimizations
>
> What: Push some ob1 optimizations to the trunk and 1.7.5.
>
> Why: This patch contains two optimizations:
>
> - Introduce a fast send path for blocking send calls. This path uses
>   the btl sendi function to put the data on the wire without the need
>   for setting up a send request. In the case of btl/vader this can
>   also avoid allocating/initializing a new fragment. With btl/vader
>   this optimization improves small-message latency by 50-200ns in
>   ping-pong type benchmarks. Larger messages may take a small hit in
>   the range of 10-20ns.
>
> - Use a stack-allocated receive request for blocking receives. This
>   optimization saves the extra instructions associated with accessing
>   the receive request free list. I was able to get another 50-200ns
>   improvement in the small-message ping-pong with this optimization. I
>   see no hit for larger messages.
>
> When: These changes touch the critical path in ob1 and are targeted for
> 1.7.5. As such I will set a moderately long timeout. Timeout set for
> next Friday (Jan 17).
>
> Some results from osu_latency on haswell:
>
> [hjelmn@cn143 pt2pt]$ mpirun -n 2 --bind-to core -mca btl vader,self ./osu_latency
> # OSU MPI Latency Test v4.0.1
> # Size        Latency (us)
> 0                     0.11
> 1                     0.14
> 2                     0.14
> 4                     0.14
> 8                     0.14
> 16                    0.14
> 32                    0.15
> 64                    0.18
> 128                   0.36
> 256                   0.37
> 512                   0.46
> 1024                  0.56
> 2048                  0.80
> 4096                  1.12
> 8192                  1.68
> 16384                 2.98
> 32768                 5.10
> 65536                 8.12
> 131072               14.07
> 262144               25.30
> 524288               47.40
> 1048576              91.71
> 2097152             195.56
> 4194304             487.05
>
> Patch Attached.
>
> -Nathan