Never mind; Nathan just clarified that the results are not comparable.
-Paul [Sent from my phone]

On Jan 8, 2014 8:58 AM, "Paul Hargrove" <phhargr...@lbl.gov> wrote:

> Interestingly enough, the 4MB latency actually improved significantly
> relative to the initial numbers.
>
> -Paul [Sent from my phone]
>
> On Jan 8, 2014 8:50 AM, "George Bosilca" <bosi...@icl.utk.edu> wrote:
>
>> These results are way worse than the ones you sent in your previous
>> email. What is the reason?
>>
>> George.
>>
>> On Jan 8, 2014, at 17:33, Nathan Hjelm <hje...@lanl.gov> wrote:
>>
>> > Ah, good catch. A new version is attached that should eliminate the
>> > race window for the multi-threaded case. Performance numbers are
>> > still looking really good. We beat mvapich2 in the small-message
>> > ping-pong by a good margin. See the results below. The latency
>> > difference for large messages is probably due to a difference in the
>> > max send size for vader vs. mvapich.
>> >
>> > To answer Pasha's question: I don't see a noticeable difference in
>> > performance for BTLs with no sendi function (this includes ugni).
>> > OpenIB should get a boost. I will test that once I get an allocation.
>> >
>> > CPU: Xeon E5-2670 @ 2.60 GHz
>> >
>> > Open MPI (-mca btl vader,self):
>> > # OSU MPI Latency Test v4.1
>> > # Size       Latency (us)
>> > 0                    0.17
>> > 1                    0.19
>> > 2                    0.19
>> > 4                    0.19
>> > 8                    0.19
>> > 16                   0.19
>> > 32                   0.19
>> > 64                   0.40
>> > 128                  0.40
>> > 256                  0.43
>> > 512                  0.52
>> > 1024                 0.67
>> > 2048                 0.94
>> > 4096                 1.44
>> > 8192                 2.04
>> > 16384                3.47
>> > 32768                6.10
>> > 65536                9.38
>> > 131072              16.47
>> > 262144              29.63
>> > 524288              54.81
>> > 1048576            106.63
>> > 2097152            206.84
>> > 4194304            421.26
>> >
>> > mvapich2 1.9:
>> > # OSU MPI Latency Test
>> > # Size       Latency (us)
>> > 0                    0.23
>> > 1                    0.23
>> > 2                    0.23
>> > 4                    0.23
>> > 8                    0.23
>> > 16                   0.28
>> > 32                   0.28
>> > 64                   0.39
>> > 128                  0.40
>> > 256                  0.40
>> > 512                  0.42
>> > 1024                 0.51
>> > 2048                 0.71
>> > 4096                 1.02
>> > 8192                 1.60
>> > 16384                3.47
>> > 32768                5.05
>> > 65536                8.06
>> > 131072              14.82
>> > 262144              28.15
>> > 524288              53.69
>> > 1048576            127.47
>> > 2097152            235.58
>> > 4194304            683.90
>> >
>> > -Nathan
>> >
>> > On Tue, Jan 07, 2014 at 06:23:13PM -0700, George Bosilca wrote:
>> >> The local request is not correctly released, leading to an assert in
>> >> debug mode. This is because you avoid calling
>> >> MCA_PML_BASE_RECV_REQUEST_FINI, which leaves the request in an
>> >> ACTIVE state, a condition carefully checked in the destructor.
>> >>
>> >> I attached a second patch that fixes the issue above and implements
>> >> a similar optimization for the blocking send.
>> >>
>> >> Unfortunately, this is not enough. The mca_pml_ob1_send_inline
>> >> optimization is horribly wrong in the multithreaded case, as it
>> >> alters the send_sequence without storing it. If you create a gap in
>> >> the send_sequence, a deadlock will __definitely__ occur. I strongly
>> >> suggest you turn off the mca_pml_ob1_send_inline optimization in the
>> >> multithreaded case. All the other optimizations should be safe in
>> >> all cases.
>> >>
>> >> George.
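To make the sequence-gap failure mode concrete, here is a minimal sketch.
It is not the actual ob1 code; it only assumes, as ob1 does, that the
receiver delivers fragment N after fragment N-1, and btl_sendi_seq and
send_via_request are hypothetical stand-ins for the real PML/BTL calls.

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-ins, not real Open MPI interfaces. */
    extern bool btl_sendi_seq(const void *buf, size_t len, uint16_t seq);
    extern int  send_via_request(const void *buf, size_t len, uint16_t seq);

    static _Atomic uint16_t send_sequence; /* per-peer ordering counter */

    /* BROKEN with threads: 'seq' is consumed, but if sendi fails nothing
     * is ever sent with that number and the receiver waits for it
     * forever. This is the deadlock George warns about above. */
    int send_inline_broken(const void *buf, size_t len)
    {
        uint16_t seq = atomic_fetch_add(&send_sequence, 1);
        if (btl_sendi_seq(buf, len, seq)) {
            return 0;
        }
        /* The fallback draws a NEW number, leaving a gap at 'seq'. */
        return send_via_request(buf, len, atomic_fetch_add(&send_sequence, 1));
    }

    /* Safe variant: the reserved number is used on both paths, so no gap
     * can ever form in the sequence space. */
    int send_inline_safe(const void *buf, size_t len)
    {
        uint16_t seq = atomic_fetch_add(&send_sequence, 1);
        if (btl_sendi_seq(buf, len, seq)) {
            return 0;
        }
        return send_via_request(buf, len, seq);
    }

In a single-threaded build a failed increment can simply be rolled back,
since no other sender can have drawn a later number in the meantime; with
threads that rollback is unsafe, which is presumably why George suggests
disabling the inline path there.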
>> >>
>> >> On Jan 8, 2014, at 01:15, Shamis, Pavel <sham...@ornl.gov> wrote:
>> >>
>> >>> Overall it looks good. It would be helpful to validate performance
>> >>> numbers for other interconnects as well.
>> >>>
>> >>> -Pasha
>> >>>
>> >>>> -----Original Message-----
>> >>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of
>> >>>> Nathan Hjelm
>> >>>> Sent: Tuesday, January 07, 2014 6:45 PM
>> >>>> To: Open MPI Developers List
>> >>>> Subject: [OMPI devel] RFC: OB1 optimizations
>> >>>>
>> >>>> What: Push some ob1 optimizations to the trunk and 1.7.5.
>> >>>>
>> >>>> What: This patch contains two optimizations:
>> >>>>
>> >>>> - Introduce a fast send path for blocking send calls. This path
>> >>>>   uses the btl sendi function to put the data on the wire without
>> >>>>   needing to set up a send request. In the case of btl/vader this
>> >>>>   can also avoid allocating/initializing a new fragment. With
>> >>>>   btl/vader this optimization improves small-message latency by
>> >>>>   50-200ns in ping-pong-type benchmarks. Larger messages may take
>> >>>>   a small hit in the range of 10-20ns.
>> >>>>
>> >>>> - Use a stack-allocated receive request for blocking receives.
>> >>>>   This optimization saves the extra instructions associated with
>> >>>>   accessing the receive-request free list. I was able to get
>> >>>>   another 50-200ns improvement in the small-message ping-pong with
>> >>>>   this optimization. I see no hit for larger messages.
>> >>>>
>> >>>> When: These changes touch the critical path in ob1 and are
>> >>>> targeted for 1.7.5. As such I will set a moderately long timeout.
>> >>>> Timeout set for next Friday (Jan 17).
>> >>>>
>> >>>> Some results from osu_latency on Haswell:
>> >>>>
>> >>>> [hjelmn@cn143 pt2pt]$ mpirun -n 2 --bind-to core -mca btl
>> >>>> vader,self ./osu_latency
>> >>>> # OSU MPI Latency Test v4.0.1
>> >>>> # Size       Latency (us)
>> >>>> 0                    0.11
>> >>>> 1                    0.14
>> >>>> 2                    0.14
>> >>>> 4                    0.14
>> >>>> 8                    0.14
>> >>>> 16                   0.14
>> >>>> 32                   0.15
>> >>>> 64                   0.18
>> >>>> 128                  0.36
>> >>>> 256                  0.37
>> >>>> 512                  0.46
>> >>>> 1024                 0.56
>> >>>> 2048                 0.80
>> >>>> 4096                 1.12
>> >>>> 8192                 1.68
>> >>>> 16384                2.98
>> >>>> 32768                5.10
>> >>>> 65536                8.12
>> >>>> 131072              14.07
>> >>>> 262144              25.30
>> >>>> 524288              47.40
>> >>>> 1048576             91.71
>> >>>> 2097152            195.56
>> >>>> 4194304            487.05
>> >>>>
>> >>>> Patch attached.
>> >>>>
>> >>>> -Nathan
>> >
>> > <ob1_optimization_take3.patch>
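The two optimizations in the RFC above are easiest to see side by side.
Below is a minimal sketch, not the actual ob1 implementation: btl_sendi,
send_via_request, and the recv_request_* helpers are hypothetical
stand-ins for the real PML/BTL machinery, and error handling is elided.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical stand-ins, not real Open MPI interfaces. */
    typedef struct recv_request { int state; } recv_request_t;
    extern bool btl_sendi(const void *buf, size_t len);        /* "send immediate" */
    extern int  send_via_request(const void *buf, size_t len); /* normal request path */
    extern void recv_request_init(recv_request_t *req, void *buf, size_t len);
    extern int  recv_request_wait(recv_request_t *req);
    extern void recv_request_fini(recv_request_t *req);

    /* Optimization 1: fast path for blocking sends. Try the BTL's sendi
     * function first; it puts a small message on the wire without
     * building a send request (and, for vader, without allocating a
     * fragment). Fall back to the normal request path only if sendi
     * cannot take the message. */
    int blocking_send(const void *buf, size_t len)
    {
        if (btl_sendi(buf, len)) {
            return 0; /* data is on the wire; no request was built */
        }
        return send_via_request(buf, len);
    }

    /* Optimization 2: blocking receive with a stack-allocated request,
     * skipping the atomic pop/push on the receive-request free list.
     * Per George's review, the request must still be finalized (cf.
     * MCA_PML_BASE_RECV_REQUEST_FINI) so its destructor does not see a
     * request left in the ACTIVE state. */
    int blocking_recv(void *buf, size_t len)
    {
        recv_request_t req; /* lives on the stack; no free-list traffic */
        int rc;

        recv_request_init(&req, buf, len);
        rc = recv_request_wait(&req);
        recv_request_fini(&req);
        return rc;
    }

Note that the fast path only helps BTLs that implement sendi, which
matches Nathan's observation that BTLs without one (such as ugni) show no
measurable change.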