Re: [OMPI devel] GPUDirect v1 issues

2012-01-21 Thread Kenneth Lloyd
Sebastian,

If possible, I strongly suggest you look into CUDA 4.1 RC2 and Rolf
vandeVaart's MPI CUDA RDMA 3. Your life will be MUCH easier.

Having used GPUDirect v1 in the last half of 2010, I can say it is a pain
for the 9-14% efficiency gain we saw.

Ken

On Fri, 2012-01-20 at 18:20 +0100, Sebastian Rinke wrote:
> With 
> 
> 
> * MLNX OFED stack tailored for GPUDirect
> * RHEL + kernel patch 
> * MVAPICH2 
> 
> 
> it is possible to monitor GPUDirect v1 activity by observing changes
> to the values in
> 
> 
> * /sys/module/ib_core/parameters/gpu_direct_pages
> * /sys/module/ib_core/parameters/gpu_direct_shares
> 
> 
> With CUDA_NIC_INTEROP=1 set, these values no longer change.
> 
> 
> Is there a different way now to monitor whether GPUDirect actually works?
> 
> 
> Sebastian.
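
For illustration, a minimal sketch of such a check (a hypothetical helper,
assuming only that the two sysfs parameters above hold single integer
counters; note Sebastian's point that they stop changing once
CUDA_NIC_INTEROP=1 is set, so this only detects the original GPUDirect v1
path): read gpu_direct_pages before and after a device-to-device transfer
and see whether it moved.

    #include <stdio.h>

    /* Read one of the ib_core GPUDirect counters named above.
     * Assumes the file holds a single integer value. */
    static long read_counter(const char *path)
    {
        FILE *f = fopen(path, "r");
        long v = -1;
        if (f != NULL) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        const char *pages = "/sys/module/ib_core/parameters/gpu_direct_pages";
        long before = read_counter(pages);
        /* ... run a GPU-to-GPU MPI transfer here ... */
        long after = read_counter(pages);
        printf("gpu_direct_pages: %ld -> %ld (%s)\n", before, after,
               after != before ? "GPUDirect v1 was used" : "no change");
        return 0;
    }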
> 
> 
> 
> On Jan 18, 2012, at 5:06 PM, Kenneth Lloyd wrote:
> 
> 
> 
> > Set CUDA_NIC_INTEROP=1. It is documented in
> > http://developer.download.nvidia.com/compute/cuda/4_0/docs/GPUDirect_Technology_Overview.pdf
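
To illustrate (a sketch, not taken from the NVIDIA package docs): the
variable only needs to be in the process environment before the CUDA
driver initializes, so it can be exported in the shell, passed through
Open MPI's mpirun with -x CUDA_NIC_INTEROP=1, or set programmatically at
the top of main:

    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* Assumption: must be set before the CUDA driver initializes.
         * Equivalent launch: mpirun -x CUDA_NIC_INTEROP=1 ./a.out */
        setenv("CUDA_NIC_INTEROP", "1", 1);

        MPI_Init(&argc, &argv);
        /* ... CUDA allocations and GPU-to-GPU MPI transfers as usual ... */
        MPI_Finalize();
        return 0;
    }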
> >  
> >  
> > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> > Behalf Of Sebastian Rinke
> > Sent: Wednesday, January 18, 2012 8:15 AM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] GPUDirect v1 issues
> >  
> > Setting the environment variable fixed the problem for Open MPI with
> > CUDA 4.0. Thanks!
> >  
> > However, I'm wondering why this is not documented in the NVIDIA
> > GPUDirect package.
> >  
> > Sebastian.
> >  
> > On Jan 18, 2012, at 1:28 AM, Rolf vandeVaart wrote:
> > 
> > 
> > 
> > Yes, the step outlined in your second bullet is no longer
> > necessary. 
> >  
> > Rolf
> >  
> >  
> > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
> > Behalf Of Sebastian Rinke
> > Sent: Tuesday, January 17, 2012 5:22 PM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] GPUDirect v1 issues
> >  
> > Thank you very much. I will try setting the environment variable and
> > if required also use the 4.1 RC2 version.
> > 
> > To clarify things a bit: to set up my machine with GPUDirect v1
> > I did the following:
> > 
> > * Install RHEL 5.4
> > * Use the kernel with GPUDirect support
> > * Use the MLNX OFED stack with GPUDirect support
> > * Install the CUDA developer driver
> > 
> > Does using CUDA >= 4.0 make any of the above steps redundant?
> > 
> > I.e., is the patched kernel or the GPUDirect-enabled MLNX OFED stack
> > no longer needed?
> > 
> > Sebastian.
> > 
> > Rolf vandeVaart wrote:
> > 
> > I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked
> > fine. I do not have a machine right now where I can load CUDA 4.0 drivers.
> > Any chance you can try CUDA 4.1 RC2? There were some improvements in the
> > support (for one, you no longer need to set an environment variable):
> > http://developer.nvidia.com/cuda-toolkit-41
> >  
> > There is also a chance that setting the environment variable as outlined in 
> > this link may help you.
> > http://forums.nvidia.com/index.php?showtopic=200629
> >  
> > However, I cannot explain why MVAPICH would work and Open MPI would not.  
> >  
> > Rolf
> >  
> >   
> > 
> > -----Original Message-----
> > From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org]
> > On Behalf Of Sebastian Rinke
> > Sent: Tuesday, January 17, 2012 12:08 PM
> > To: Open MPI Developers
> > Subject: Re: [OMPI devel] GPUDirect v1 issues
> >  
> > I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.
> >  
> > Attached you find a little test case which is based on the GPUDirect
> > v1 test case (mpi_pinned.c).
> > In that program the sender splits a message into chunks and sends them
> > separately to the receiver, which posts the corresponding recvs. It is
> > a kind of pipelining.
> >  
> > In mpi_pinned.c:141 the offsets into the recv buffer are set.
> > With the correct offsets, i.e. increasing ones, the program blocks
> > under Open MPI.
> >  
> > Using line 142 instead (offset = 0) works.
> >  
> > The tarball attached contains a Makefile where you will have to adjust
> >  
> > * CUDA_INC_DIR
> > * CUDA_LIB_DIR
> >  
> > Sebastian
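
For readers without the tarball, a minimal sketch of the pipelining
pattern described above (hypothetical names and sizes; in the real
mpi_pinned.c the buffers are CUDA device memory and the chunk size
differs):

    #include <mpi.h>

    #define CHUNK   (1 << 20)   /* bytes per chunk (made up)  */
    #define NCHUNKS 8           /* number of chunks (made up) */

    /* Rank 0 sends the buffer in NCHUNKS pieces; rank 1 posts the
     * matching recvs at increasing offsets into its buffer. It is
     * these increasing offsets (mpi_pinned.c:141) that block under
     * Open MPI 1.4.2; offset = 0 (line 142) works. */
    static void pipeline(char *buf, int rank)
    {
        MPI_Request req[NCHUNKS];
        int i;

        for (i = 0; i < NCHUNKS; i++) {
            if (rank == 0)
                MPI_Isend(buf + (size_t)i * CHUNK, CHUNK, MPI_CHAR, 1, i,
                          MPI_COMM_WORLD, &req[i]);
            else
                MPI_Irecv(buf + (size_t)i * CHUNK, CHUNK, MPI_CHAR, 0, i,
                          MPI_COMM_WORLD, &req[i]);
        }
        MPI_Waitall(NCHUNKS, req, MPI_STATUSES_IGNORE);
    }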
> >  
> > On Jan 17, 2012, at 4:16 PM, Kenneth A. Lloyd wrote:
> >  
> >
> > 
> > Also, which version of MVAPICH2 did you use?
> >  
> > I've been poring over Rolf's OpenMPI CUDA RDMA 3 (using CUDA 4.1 RC2)
> > vis-a-vis MVAPICH-GPU on a small 3-node cluster. These are wickedly
> > interesting.
> >  
> > Ken
> > -----Original Message-----
> > From: devel-boun...@open-mpi.org [mailto:devel-bounces@open-
> >   
> > 
>

[OMPI devel] OpenMPI 1.5.x and MPI 2.2

2012-01-21 Thread Kenneth Lloyd
To what extent is the distributed graph topology interface
(MPI_Dist_graph_*, new in MPI 2.2) supported in OpenMPI 1.5.x?

The reported version shows it is based on MPI 2.1, but there are other
references ...
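
For reference, the MPI 2.2 addition in question is MPI_Dist_graph_create
and MPI_Dist_graph_create_adjacent. A quick probe, sketched here as a
simple ring, is to check what MPI_Get_version reports and whether the
adjacent variant links and runs:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, version, subversion;
        int src, dst;
        MPI_Comm ring;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_version(&version, &subversion);
        if (rank == 0)
            printf("Library reports MPI %d.%d\n", version, subversion);

        src = (rank + size - 1) % size;   /* left neighbor  */
        dst = (rank + 1) % size;          /* right neighbor */

        /* New in MPI 2.2; a library without it fails at compile/link. */
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       1, &src, MPI_UNWEIGHTED,
                                       1, &dst, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &ring);
        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }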

==
Kenneth A. Lloyd, Jr.
CEO - Director of Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM US





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25762

2012-01-21 Thread George Bosilca
How about, instead of all the patches (r25758, r25762, and r25763), we just
set both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will get
ignored on Unix and the other on Darwin.

Another benefit will be to have a significantly cleaner...

  george.

On Jan 21, 2012, at 18:48, r...@osl.iu.edu wrote:

> Author: rhc
> Date: 2012-01-21 18:48:42 EST (Sat, 21 Jan 2012)
> New Revision: 25762
> URL: https://svn.open-mpi.org/trac/ompi/changeset/25762
> 
> Log:
> Expand the coverage a little when looking at remote shells for rsh. Prior 
> patch (r25758) works only if both ends of the rsh/ssh connection are Mac. 
> What we really want is to use the Mac version of ld_library_path when the 
> remote end is Mac, regardless of the OS where mpirun is executing. So add a 
> test for system type to the remote_shell test, and set the ld_library_path 
> name to match the remote system type.
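
A hypothetical sketch of the selection logic the log describes (not the
actual ORTE code):

    #include <string.h>

    /* Pick the library-path variable name to match the *remote*
     * system type, regardless of where mpirun runs (r25762). */
    static const char *lib_path_var_for(const char *remote_os)
    {
        /* Darwin's dynamic linker reads DYLD_LIBRARY_PATH;
         * other Unix linkers read LD_LIBRARY_PATH. */
        if (remote_os != NULL && strcmp(remote_os, "Darwin") == 0)
            return "DYLD_LIBRARY_PATH";
        return "LD_LIBRARY_PATH";
    }

George's counter-proposal below would drop the test entirely and always
set both names, since each linker simply ignores the one it does not know.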




Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25762

2012-01-21 Thread Ralph Castain
We could, but the command line is already quite long, and that would make it
worse. Doesn't strike me as all that complicated.

On Jan 21, 2012, at 7:26 PM, George Bosilca wrote:

> How about, instead of all the patches (r25758, r25762, and r25763), we just
> set both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH everywhere? One will get
> ignored on Unix and the other on Darwin.
> 
> Another benefit will be to have a significantly cleaner...
> 
>  george.
> 
> On Jan 21, 2012, at 18:48, r...@osl.iu.edu wrote:
> 
>> Author: rhc
>> Date: 2012-01-21 18:48:42 EST (Sat, 21 Jan 2012)
>> New Revision: 25762
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/25762
>> 
>> Log:
>> Expand the coverage a little when looking at remote shells for rsh. Prior 
>> patch (r25758) works only if both ends of the rsh/ssh connection are Mac. 
>> What we really want is to use the Mac version of ld_library_path when the 
>> remote end is Mac, regardless of the OS where mpirun is executing. So add a 
>> test for system type to the remote_shell test, and set the ld_library_path 
>> name to match the remote system type.
> 
> 