Overall it looks good. It would be helpful to validate performance numbers for 
other interconnects as well.
-Pasha

> -----Original Message-----
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan
> Hjelm
> Sent: Tuesday, January 07, 2014 6:45 PM
> To: Open MPI Developers List
> Subject: [OMPI devel] RFC: OB1 optimizations
> 
> What: Push some ob1 optimizations to the trunk and 1.7.5.
> 
> Why: This patch contains two optimizations:
> 
>   - Introduce a fast send path for blocking send calls. This path uses
>     the btl sendi function to put the data on the wire without setting
>     up a send request; with btl/vader it can also avoid allocating and
>     initializing a new fragment. In ping-pong type benchmarks this
>     improves btl/vader small-message latency by 50-200ns, while larger
>     messages may take a small hit in the range of 10-20ns. A sketch of
>     the idea follows this list.
> 
>   - Use a stack-allocated receive request for blocking receives. This
>     optimization saves the extra instructions associated with accessing
>     the receive request free list. I was able to get another 50-200ns
>     improvement in the small-message ping-pong with this optimization,
>     with no hit for larger messages. A second sketch below illustrates
>     this.
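> 
> Here is a minimal, self-contained sketch of the fast-send-path idea.
> The names (btl_sendi_stub, slow_path_send, blocking_send) are
> hypothetical stand-ins for illustration, not the actual ob1/btl
> symbols:
> 
>   #include <stdio.h>
>   #include <stddef.h>
> 
>   /* Hypothetical BTL "send immediate": returns 0 on success, -1 if
>    * the BTL cannot put the data on the wire right away. */
>   static int btl_sendi_stub(const void *buf, size_t len) {
>       (void)buf;
>       return len <= 256 ? 0 : -1; /* pretend only small sends qualify */
>   }
> 
>   /* Hypothetical slow path: allocate and initialize a full send
>    * request, as the normal blocking-send path would. */
>   static int slow_path_send(const void *buf, size_t len) {
>       (void)buf; (void)len;
>       printf("slow path: full send request set up\n");
>       return 0;
>   }
> 
>   /* Blocking send with the fast path in front: try sendi first and
>    * fall back to the request machinery only when it fails. */
>   static int blocking_send(const void *buf, size_t len) {
>       if (0 == btl_sendi_stub(buf, len)) {
>           printf("fast path: %zu bytes sent, no request needed\n", len);
>           return 0;
>       }
>       return slow_path_send(buf, len);
>   }
> 
>   int main(void) {
>       char small[64] = {0}, large[4096] = {0};
>       blocking_send(small, sizeof(small)); /* takes the fast path */
>       blocking_send(large, sizeof(large)); /* falls back to requests */
>       return 0;
>   }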
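> 
> And a similar sketch of the stack-allocated receive request; again the
> type and function names are made up for illustration:
> 
>   #include <stdio.h>
> 
>   /* Hypothetical receive request type. */
>   typedef struct { int tag; int completed; } recv_request_t;
> 
>   /* Stand-in for initializing the request and progressing the
>    * library until the matching message arrives. */
>   static void init_and_wait(recv_request_t *req, int tag) {
>       req->tag = tag;
>       req->completed = 1;
>   }
> 
>   /* Blocking receive: the request lives on the stack, so the
>    * free-list get/return round trip is skipped entirely. This is
>    * safe only because the call blocks until completion, so the
>    * request cannot outlive the stack frame. */
>   static void blocking_recv(int tag) {
>       recv_request_t req;
>       init_and_wait(&req, tag);
>       printf("recv with tag %d completed on a stack request\n", tag);
>   }
> 
>   int main(void) {
>       blocking_recv(42);
>       return 0;
>   }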
> 
> When: These changes touch the critical path in ob1 and are targeted
> for 1.7.5, so I will set a moderately long timeout: next Friday
> (Jan 17).
> 
> Some results from osu_latency on Haswell:
> 
> [hjelmn@cn143 pt2pt]$ mpirun -n 2 --bind-to core -mca btl vader,self
> ./osu_latency
> # OSU MPI Latency Test v4.0.1
> # Size          Latency (us)
> 0                       0.11
> 1                       0.14
> 2                       0.14
> 4                       0.14
> 8                       0.14
> 16                      0.14
> 32                      0.15
> 64                      0.18
> 128                     0.36
> 256                     0.37
> 512                     0.46
> 1024                    0.56
> 2048                    0.80
> 4096                    1.12
> 8192                    1.68
> 16384                   2.98
> 32768                   5.10
> 65536                   8.12
> 131072                 14.07
> 262144                 25.30
> 524288                 47.40
> 1048576                91.71
> 2097152               195.56
> 4194304               487.05
> 
> 
> Patch Attached.
> 
> -Nathan
