Re: [OMPI devel] MVAPICH2 vs Open-MPI

2012-02-15 Thread Jeff Squyres
I think the short answer is: Rolf is currently working on better GP-GPU 
integration with Open MPI.  :-)


On Feb 14, 2012, at 5:36 PM, Rolf vandeVaart wrote:

> There are several things going on here that make their library perform better.
> 
> With respect to inter-node performance, both MVAPICH2 and Open MPI copy the
> GPU memory into host memory first.  However, they use special host buffers
> and a code path that let them copy the data asynchronously, so they do a
> better job of pipelining than Open MPI.  I believe their host buffers are
> also larger, which helps with larger messages.  Open MPI just piggybacks on
> the existing host buffers in the openib BTL, and it only uses synchronous
> copies.  (There is hope to improve that.)
> 
> Secondly, with respect to intra-node performance, they are using the
> Inter-Process Communication (IPC) feature of CUDA, which means that within a
> node GPU memory can be moved directly from one GPU to another.  We have an
> RFC from December to add this to Open MPI as well, but it has not been
> approved yet.  Hopefully sometime soon.
> 
> Rolf
> 
>> -Original Message-
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>> On Behalf Of Rayson Ho
>> Sent: Tuesday, February 14, 2012 4:16 PM
>> To: Open MPI Developers
>> Subject: [OMPI devel] MVAPICH2 vs Open-MPI
>> 
>> See pp. 38-40: MVAPICH2 outperforms Open-MPI in every test.  Is it that
>> they have done CUDA & GPU optimizations that are not in OMPI, or did they
>> specifically tune MVAPICH2 to make it shine?
>> 
>> http://hpcadvisorycouncil.com/events/2012/Israel-Workshop/Presentations/7_OSU.pdf
>> 
>> The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/
>> 
>> Rayson
>> 
>> =
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>> 
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] MVAPICH2 vs Open-MPI

2012-02-14 Thread Rolf vandeVaart
There are several things going on here that make their library perform better.

With respect to inter-node performance, both MVAPICH2 and Open MPI copy the GPU
memory into host memory first.  However, they use special host buffers and a
code path that let them copy the data asynchronously, so they do a better job
of pipelining than Open MPI.  I believe their host buffers are also larger,
which helps with larger messages.  Open MPI just piggybacks on the existing
host buffers in the openib BTL, and it only uses synchronous copies.  (There is
hope to improve that.)
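
To make the contrast concrete, here is a rough sketch of that kind of staging
pipeline.  It is not code from either library: the chunk size, the double
buffering, and the send_over_network() stub are assumptions made purely for
illustration.  The point is that cudaMemcpyAsync into pinned host buffers lets
the device-to-host copy of one chunk overlap with the network send of the
previous chunk, whereas a single synchronous cudaMemcpy serializes the two.

    /* Sketch only (not Open MPI or MVAPICH2 source): pipelined GPU->host
     * staging.  CHUNK and send_over_network() are illustrative assumptions. */
    #include <stddef.h>
    #include <cuda_runtime.h>

    #define CHUNK (1 << 20)             /* 1 MiB staging chunks (assumed) */

    static void send_over_network(const void *buf, size_t len)
    {
        (void)buf; (void)len;           /* stand-in for the BTL/network send */
    }

    void pipelined_gpu_send(const char *d_src, size_t total)
    {
        char *h_buf[2];
        cudaStream_t stream;
        cudaStreamCreate(&stream);

        /* Pinned (page-locked) host memory is what lets cudaMemcpyAsync run
         * truly asynchronously with respect to the host. */
        cudaHostAlloc((void **)&h_buf[0], CHUNK, cudaHostAllocDefault);
        cudaHostAlloc((void **)&h_buf[1], CHUNK, cudaHostAllocDefault);

        size_t nchunks = (total + CHUNK - 1) / CHUNK;

        for (size_t c = 0; c <= nchunks; c++) {
            /* Start staging chunk c into one of the two host buffers ... */
            if (c < nchunks) {
                size_t off = c * CHUNK;
                size_t len = (total - off < CHUNK) ? (total - off) : CHUNK;
                cudaMemcpyAsync(h_buf[c & 1], d_src + off, len,
                                cudaMemcpyDeviceToHost, stream);
            }
            /* ... while the host sends chunk c-1, staged in the previous
             * iteration.  This is the copy/send overlap ("pipelining"). */
            if (c > 0) {
                size_t off = (c - 1) * CHUNK;
                size_t len = (total - off < CHUNK) ? (total - off) : CHUNK;
                send_over_network(h_buf[(c - 1) & 1], len);
            }
            cudaStreamSynchronize(stream);  /* chunk c is now on the host */
        }

        cudaFreeHost(h_buf[0]);
        cudaFreeHost(h_buf[1]);
        cudaStreamDestroy(stream);
    }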

Secondly, with respect to intra-node performance, they are using the
Inter-Process Communication (IPC) feature of CUDA, which means that within a
node GPU memory can be moved directly from one GPU to another.  We have an RFC
from December to add this to Open MPI as well, but it has not been approved
yet.  Hopefully sometime soon.
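
For reference, the CUDA IPC path works roughly as sketched below.  This is a
generic illustration, not the code proposed in the RFC; how the handle gets
from one process to the other (a socket, shared memory, an MPI message, etc.)
is assumed and only indicated by a comment.

    /* Generic sketch of CUDA IPC between two processes on the same node;
     * not the Open MPI RFC code.  Error checking is omitted. */
    #include <stddef.h>
    #include <cuda_runtime.h>

    /* Process A (exporter): create a handle for an existing device buffer. */
    cudaIpcMemHandle_t export_gpu_buffer(void *d_buf)
    {
        cudaIpcMemHandle_t handle;
        cudaIpcGetMemHandle(&handle, d_buf);
        /* ... ship `handle` to the peer over any host-side channel ... */
        return handle;
    }

    /* Process B (importer): map the peer's buffer and copy GPU to GPU. */
    void import_and_copy(cudaIpcMemHandle_t handle, void *d_dst, size_t len)
    {
        void *d_peer = NULL;
        cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);

        /* The payload moves device-to-device; it never touches host memory. */
        cudaMemcpy(d_dst, d_peer, len, cudaMemcpyDeviceToDevice);

        cudaIpcCloseMemHandle(d_peer);
    }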

Rolf

>-Original Message-
>From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
>On Behalf Of Rayson Ho
>Sent: Tuesday, February 14, 2012 4:16 PM
>To: Open MPI Developers
>Subject: [OMPI devel] MVAPICH2 vs Open-MPI
>
>See pp. 38-40: MVAPICH2 outperforms Open-MPI in every test.  Is it that they
>have done CUDA & GPU optimizations that are not in OMPI, or did they
>specifically tune MVAPICH2 to make it shine?
>
>http://hpcadvisorycouncil.com/events/2012/Israel-Workshop/Presentations/7_OSU.pdf
>
>The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/
>
>Rayson
>
>=
>Open Grid Scheduler / Grid Engine
>http://gridscheduler.sourceforge.net/
>
>Scalable Grid Engine Support Program
>http://www.scalablelogic.com/
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] MVAPICH2 vs Open-MPI

2012-02-14 Thread Rayson Ho
See pp. 38-40: MVAPICH2 outperforms Open-MPI in every test.  Is it that they
have done CUDA & GPU optimizations that are not in OMPI, or did they
specifically tune MVAPICH2 to make it shine?

http://hpcadvisorycouncil.com/events/2012/Israel-Workshop/Presentations/7_OSU.pdf

The benchmark package: http://mvapich.cse.ohio-state.edu/benchmarks/

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/