Re: [OMPI devel] GPUDirect v1 issues

2012-01-23 Thread Sebastian Rinke
Ok, thank you Ken and Rolf. I will have a look into the 4.1 version.

@ Rolf:
I actually meant MVAPICH2, since Open MPI requires CUDA_NIC_INTEROP=1 to be 
set.
However, setting the environment variable does not produce any changes in the 
files mentioned previously.

Nevertheless, you already answered my question. Thanks.

Sebastian. 


Re: [OMPI devel] GPUDirect v1 issues

2012-01-21 Thread Kenneth Lloyd
Sebastian,

If possible, I strongly suggest you look into CUDA 4.1 RC2 and use Rolf
vandeVaart's MPI CUDA RDMA 3. Your life will be MUCH easier.

Having used GPUDirect v1 in the last half of 2010, I can say it is a pain
for the 9-14% efficiency gain we saw.

Ken


Re: [OMPI devel] GPUDirect v1 issues

2012-01-20 Thread Rolf vandeVaart
You can tell it is working because your program does not hang anymore :)
Otherwise, there is not a way that I am aware of.

Rolf

PS: And I assume you mean Open MPI under your third bullet below.

Re: [OMPI devel] GPUDirect v1 issues

2012-01-20 Thread Sebastian Rinke
With 

* MLNX OFED stack tailored for GPUDirect
* RHEL + kernel patch 
* MVAPICH2 

it is possible to monitor GPUDirect v1 activity by observing changes to the 
values in the two files below (a small polling sketch follows the list):

* /sys/module/ib_core/parameters/gpu_direct_pages
* /sys/module/ib_core/parameters/gpu_direct_shares
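
As a quick way to watch these counters during a run, a minimal C sketch like 
the one below can poll both files (an illustration only; any file-reading loop 
or a shell tool works just as well):

/* Minimal sketch: print the GPUDirect v1 counters exposed by ib_core.
 * The two sysfs paths are the ones listed above; everything else here is
 * ordinary file reading. */
#include <stdio.h>

static void dump_counter(const char *path)
{
    char line[64];
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        printf("%s: not available\n", path);
        return;
    }
    if (fgets(line, sizeof(line), f) != NULL)
        printf("%s: %s", path, line);
    fclose(f);
}

int main(void)
{
    dump_counter("/sys/module/ib_core/parameters/gpu_direct_pages");
    dump_counter("/sys/module/ib_core/parameters/gpu_direct_shares");
    return 0;
}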

With CUDA_NIC_INTEROP=1 set, these values no longer change.

Is there now a different way to monitor whether GPUDirect actually works?

Sebastian.


Re: [OMPI devel] GPUDirect v1 issues

2012-01-18 Thread Kenneth Lloyd
It is documented in
http://developer.download.nvidia.com/compute/cuda/4_0/docs/GPUDirect_Technology_Overview.pdf

set CUDA_NIC_INTEROP=1
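
Presumably the variable has to be present in each MPI process's environment 
before the CUDA driver is initialized; exporting it in the shell before mpirun 
is the usual route. A minimal in-program sketch, as an assumption rather than 
anything prescribed by the overview document:

/* Sketch (assumption, not from the overview document): make sure
 * CUDA_NIC_INTEROP=1 is set in the process environment before any CUDA
 * call is made. */
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    setenv("CUDA_NIC_INTEROP", "1", 1);   /* before CUDA is first touched */

    MPI_Init(&argc, &argv);
    /* ... cudaMallocHost() buffers and MPI transfers as in mpi_pinned.c ... */
    MPI_Finalize();
    return 0;
}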






Re: [OMPI devel] GPUDirect v1 issues

2012-01-18 Thread Sebastian Rinke
Setting the environment variable fixed the problem for Open MPI with CUDA 4.0. 
Thanks!

However, I'm wondering why this is not documented in the NVIDIA GPUDirect 
package.

Sebastian.


Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
Yes, the step outlined in your second bullet is no longer necessary.

Rolf



Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
Thank you very much. I will try setting the environment variable and, if 
required, also use the 4.1 RC2 version.


To clarify things a little bit for me, to set up my machine with 
GPUDirect v1 I did the following:


* Install RHEL 5.4
* Use the kernel with GPUDirect support
* Use the MLNX OFED stack with GPUDirect support
* Install the CUDA developer driver

Does using CUDA >= 4.0 make any of the above steps redundant?

I.e., is RHEL, a patched kernel, or the MLNX OFED stack with GPUDirect support 
no longer needed?


Sebastian.

Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
I ran your test case against Open MPI 1.4.2 and CUDA 4.1 RC2 and it worked 
fine.  I do not have a machine right now where I can load CUDA 4.0 drivers.
Any chance you can try CUDA 4.1 RC2?  There were some improvements in the 
support (for one, you do not need to set an environment variable):
http://developer.nvidia.com/cuda-toolkit-41

There is also a chance that setting the environment variable as outlined in 
this link may help you.
http://forums.nvidia.com/index.php?showtopic=200629

However, I cannot explain why MVAPICH would work and Open MPI would not.  

Rolf



Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
I use CUDA 4.0 with MVAPICH2 1.5.1p1 and Open MPI 1.4.2.

Attached you will find a little test case based on the GPUDirect v1 test case 
(mpi_pinned.c).
In that program the sender splits a message into chunks and sends them 
separately to the receiver, which posts the corresponding recvs. It is a kind 
of pipelining.

In mpi_pinned.c:141 the offsets into the recv buffer are set.
With the correct offsets, i.e. increasing ones, it blocks with Open MPI.

Using line 142 instead (offset = 0) works.
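
To make the described pattern concrete, here is a stripped-down sketch of the 
pipelined transfer with hypothetical message and chunk sizes (it is not the 
attached test case, only an illustration of the offset behaviour described 
above):

/* Sketch of the pipelined transfer described above (hypothetical sizes;
 * the real code is in the attached tarball).  Each chunk is sent and
 * received at an increasing offset into one cudaMallocHost() buffer. */
#include <mpi.h>
#include <cuda_runtime.h>

enum { MSG_SIZE = 8 * 1024 * 1024, CHUNK_SIZE = 1024 * 1024 };

int main(int argc, char **argv)
{
    int rank, off;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cudaMallocHost((void **)&buf, MSG_SIZE);          /* pinned host memory */

    for (off = 0; off < MSG_SIZE; off += CHUNK_SIZE) {
        if (rank == 0) {
            MPI_Send(buf + off, CHUNK_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Receiving at the increasing offset (buf + off) blocks under
             * Open MPI 1.4.x with GPUDirect v1; always receiving at buf
             * (offset = 0) works. */
            MPI_Recv(buf + off, CHUNK_SIZE, MPI_CHAR, 0, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    cudaFreeHost(buf);
    MPI_Finalize();
    return 0;
}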

The tarball attached contains a Makefile where you will have to adjust

* CUDA_INC_DIR
* CUDA_LIB_DIR

Sebastian



testcase_start_address.tar.gz
Description: GNU Zip compressed data


Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Kenneth A. Lloyd
Also, which version of MVAPICH2 did you use?

I've been poring over Rolf's Open MPI CUDA RDMA 3 (using CUDA 4.1 RC2) vis-à-vis
MVAPICH-GPU on a small 3-node cluster. These are wickedly interesting.

Ken


Re: [OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Rolf vandeVaart
I am not aware of any issues.  Can you send me a test program so I can try it 
out?
Which version of CUDA are you using?

Rolf



[OMPI devel] GPUDirect v1 issues

2012-01-17 Thread Sebastian Rinke
Dear all,

I'm using GPUDirect v1 with Open MPI 1.4.3 and see blocking MPI_SEND/MPI_RECV 
calls that block forever.

For two subsequent MPI_RECV calls, it hangs if the recv buffer pointer of the 
second recv points somewhere other than the beginning of the recv buffer 
(previously allocated with cudaMallocHost()).
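
In its minimal form the failing pattern looks roughly like the following 
sketch (hypothetical sizes and tags; the actual test case, based on the 
GPUDirect v1 mpi_pinned.c sample, was attached elsewhere in the thread):

/* Minimal sketch of the hang: two receives into one cudaMallocHost()
 * buffer, the second at a non-zero offset.  Sizes and tags are made up. */
#include <mpi.h>
#include <cuda_runtime.h>

enum { CHUNK = 1024 };

int main(int argc, char **argv)
{
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaMallocHost((void **)&buf, 2 * CHUNK);         /* pinned host buffer */

    if (rank == 0) {
        MPI_Send(buf, CHUNK, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Send(buf, CHUNK, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, CHUNK, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* The second recv, at buf + CHUNK rather than buf, is the one that
         * hangs with GPUDirect v1 and Open MPI 1.4.3. */
        MPI_Recv(buf + CHUNK, CHUNK, MPI_CHAR, 0, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFreeHost(buf);
    MPI_Finalize();
    return 0;
}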

I tried the same with MVAPICH2 and did not see the problem.

Does anybody know about issues with GPUDirect v1 using Open MPI?

Thanks for your help,
Sebastian