Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

2015-06-17 Thread Fei Mao
Thanks!

On Jun 17, 2015, at 3:08 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> There is no short-term plan, but we are always looking at ways to improve 
> things, so this could be looked at some time in the future.
>  
> Rolf
>  
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
> Sent: Wednesday, June 17, 2015 1:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5
>  
> Hi Rolf,
>  
> Thank you very much for clarifying the problem. Is there any plan to support 
> GPU RDMA for reduction in the future?
>  
> On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
> 
> 
> Hi Fei:
>  
> The reduction support for CUDA-aware buffers in Open MPI is rather simple.  The GPU 
> buffers are copied into temporary host buffers and then the reduction is done 
> with the host buffers.  At the completion of the host reduction, the data is 
> copied back into the GPU buffers.  So, there is no use of CUDA IPC or GPU 
> Direct RDMA in the reduction.
>  
> Rolf
>  
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
> Sent: Wednesday, June 17, 2015 1:08 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5
>  
> Hi there,
>  
> I am running benchmarks on a GPU cluster with two CPU sockets and four K80 
> GPUs per node. Two K80s are attached to CPU socket 0 and the other two to 
> socket 1; an IB ConnectX-3 (FDR) adapter is also under socket 1. We are 
> using the stock Linux OFED, so I know there is no way to do GPUDirect RDMA 
> for inter-node communication. I can do intra-node CUDA IPC with MPI_Send 
> and MPI_Recv between the two K80s (four GPUs in total) that sit under the 
> same socket (PCI-e switch), so I thought intra-node MPI_Reduce could also 
> use IPC in Open MPI 1.8.5.
>  
> The benchmark I am using is osu-micro-benchmarks-4.4.1, and I get the same 
> results whether the two GPUs are under the same socket or under different 
> sockets. The results are the same even when I use two GPUs in different nodes.
>  
> Does MPI_Reduce use IPC for intra-node communication? Do I have to install 
> the Mellanox OFED stack to get GPUDirect RDMA reductions even when the GPUs 
> are under the same PCI-e switch?
>  
> Thanks,
>  
> Fei Mao
> High Performance Computing Technical Consultant 
> SHARCNET | http://www.sharcnet.ca
> Compute/Calcul Canada | http://www.computecanada.ca
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27147.php
>  
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27151.php



Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

2015-06-17 Thread Rolf vandeVaart
There is no short-term plan, but we are always looking at ways to improve 
things, so this could be looked at some time in the future.

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
Sent: Wednesday, June 17, 2015 1:48 PM
To: Open MPI Users
Subject: Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

Hi Rolf,

Thank you very much for clarifying the problem. Is there any plan to support 
GPU RDMA for reduction in the future?

On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:


Hi Fei:

The reduction support for CUDA-aware buffers in Open MPI is rather simple.  The GPU 
buffers are copied into temporary host buffers and then the reduction is done 
with the host buffers.  At the completion of the host reduction, the data is 
copied back into the GPU buffers.  So, there is no use of CUDA IPC or GPU 
Direct RDMA in the reduction.
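
Conceptually, the staged path amounts to something like the sketch below. This 
is only a simplified illustration of the copy-in / host-reduce / copy-out 
pattern described above, not Open MPI's actual internal code; the float/MPI_SUM 
case and the buffer names are just for illustration.

    /* Host-staged reduction for device buffers: copy down, reduce on the
       host, copy the result back up.  Illustrative only. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdlib.h>

    void staged_reduce_sum(const float *d_sendbuf, float *d_recvbuf,
                           int count, int root, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        /* 1. Copy the GPU send buffer into a temporary host buffer. */
        float *h_send = (float *)malloc(count * sizeof(float));
        float *h_recv = (float *)malloc(count * sizeof(float));
        cudaMemcpy(h_send, d_sendbuf, count * sizeof(float),
                   cudaMemcpyDeviceToHost);

        /* 2. Reduce entirely on the host buffers. */
        MPI_Reduce(h_send, h_recv, count, MPI_FLOAT, MPI_SUM, root, comm);

        /* 3. On the root, copy the result back into the GPU buffer. */
        if (rank == root)
            cudaMemcpy(d_recvbuf, h_recv, count * sizeof(float),
                       cudaMemcpyHostToDevice);

        free(h_send);
        free(h_recv);
    }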

Rolf

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
Sent: Wednesday, June 17, 2015 1:08 PM
To: us...@open-mpi.org
Subject: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

Hi there,

I am running benchmarks on a GPU cluster with two CPU sockets and four K80 GPUs 
per node. Two K80s are attached to CPU socket 0 and the other two to socket 1; 
an IB ConnectX-3 (FDR) adapter is also under socket 1. We are using the stock 
Linux OFED, so I know there is no way to do GPUDirect RDMA for inter-node 
communication. I can do intra-node CUDA IPC with MPI_Send and MPI_Recv between 
the two K80s (four GPUs in total) that sit under the same socket (PCI-e 
switch), so I thought intra-node MPI_Reduce could also use IPC in Open MPI 1.8.5.

The benchmark I am using is osu-micro-benchmarks-4.4.1, and I get the same 
results whether the two GPUs are under the same socket or under different 
sockets. The results are the same even when I use two GPUs in different nodes.

Does MPI_Reduce use IPC for intra-node communication? Do I have to install the 
Mellanox OFED stack to get GPUDirect RDMA reductions even when the GPUs are 
under the same PCI-e switch?

Thanks,

Fei Mao
High Performance Computing Technical Consultant
SHARCNET | http://www.sharcnet.ca
Compute/Calcul Canada | http://www.computecanada.ca


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/06/27147.php



Re: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

2015-06-17 Thread Fei Mao
Hi Rolf,

Thank you very much for clarifying the problem. Is there any plan to support 
GPU RDMA for reduction in the future?

On Jun 17, 2015, at 1:38 PM, Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> Hi Fei:
>  
> The reduction support for CUDA-aware buffers in Open MPI is rather simple.  The GPU 
> buffers are copied into temporary host buffers and then the reduction is done 
> with the host buffers.  At the completion of the host reduction, the data is 
> copied back into the GPU buffers.  So, there is no use of CUDA IPC or GPU 
> Direct RDMA in the reduction.
>  
> Rolf
>  
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Fei Mao
> Sent: Wednesday, June 17, 2015 1:08 PM
> To: us...@open-mpi.org
> Subject: [OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5
>  
> Hi there,
>  
> I am running benchmarks on a GPU cluster with two CPU sockets and four K80 
> GPUs per node. Two K80s are attached to CPU socket 0 and the other two to 
> socket 1; an IB ConnectX-3 (FDR) adapter is also under socket 1. We are 
> using the stock Linux OFED, so I know there is no way to do GPUDirect RDMA 
> for inter-node communication. I can do intra-node CUDA IPC with MPI_Send 
> and MPI_Recv between the two K80s (four GPUs in total) that sit under the 
> same socket (PCI-e switch), so I thought intra-node MPI_Reduce could also 
> use IPC in Open MPI 1.8.5.
>  
> The benchmark I am using is osu-micro-benchmarks-4.4.1, and I get the same 
> results whether the two GPUs are under the same socket or under different 
> sockets. The results are the same even when I use two GPUs in different nodes.
>  
> Does MPI_Reduce use IPC for intra-node communication? Do I have to install 
> the Mellanox OFED stack to get GPUDirect RDMA reductions even when the GPUs 
> are under the same PCI-e switch?
>  
> Thanks,
>  
> Fei Mao
> High Performance Computing Technical Consultant 
> SHARCNET | http://www.sharcnet.ca
> Compute/Calcul Canada | http://www.computecanada.ca
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/06/27147.php



[OMPI users] CUDA-aware MPI_Reduce problem in Openmpi 1.8.5

2015-06-17 Thread Fei Mao
Hi there,

I am running benchmarks on a GPU cluster with two CPU sockets and four K80 GPUs 
per node. Two K80s are attached to CPU socket 0 and the other two to socket 1; 
an IB ConnectX-3 (FDR) adapter is also under socket 1. We are using the stock 
Linux OFED, so I know there is no way to do GPUDirect RDMA for inter-node 
communication. I can do intra-node CUDA IPC with MPI_Send and MPI_Recv between 
the two K80s (four GPUs in total) that sit under the same socket (PCI-e 
switch), so I thought intra-node MPI_Reduce could also use IPC in Open MPI 1.8.5.
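
By intra-node IPC I mean that a CUDA-aware build lets me pass device pointers 
straight to MPI_Send and MPI_Recv, roughly as in the minimal sketch below (the 
one-GPU-per-rank mapping and the message size are assumptions for illustration):

    /* Minimal CUDA-aware point-to-point sketch: device pointers are handed
       directly to MPI_Send/MPI_Recv.  Illustrative only. */
    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Assumption: map each rank to a local GPU; the real mapping
           depends on how ranks are laid out on the node. */
        cudaSetDevice(rank % 4);

        const int count = 1 << 20;   /* 1M floats, illustrative size */
        float *d_buf;
        cudaMalloc((void **)&d_buf, count * sizeof(float));

        if (rank == 0)
            MPI_Send(d_buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }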

The benchmark I am using is osu-micro-benchmarks-4.4.1, and I get the same 
results whether the two GPUs are under the same socket or under different 
sockets. The results are the same even when I use two GPUs in different nodes.
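
What I am timing is essentially the loop below: MPI_Reduce called directly on 
device buffers. This is only a simplified sketch of the measurement, not the 
osu_reduce source; the iteration count and message size are illustrative and 
warm-up iterations are omitted.

    /* Rough shape of the latency measurement: time MPI_Reduce on device
       buffers.  Not the actual osu_reduce code. */
    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    #define ITERS 100

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int count = 1 << 20;
        float *d_send, *d_recv;
        cudaMalloc((void **)&d_send, count * sizeof(float));
        cudaMalloc((void **)&d_recv, count * sizeof(float));
        cudaMemset(d_send, 0, count * sizeof(float));

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < ITERS; i++)
            MPI_Reduce(d_send, d_recv, count, MPI_FLOAT, MPI_SUM, 0,
                       MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("avg MPI_Reduce time: %.2f us\n",
                   (t1 - t0) * 1.0e6 / ITERS);

        cudaFree(d_send);
        cudaFree(d_recv);
        MPI_Finalize();
        return 0;
    }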

Does MPI_Reduce use IPC for intra-node communication? Do I have to install the 
Mellanox OFED stack to get GPUDirect RDMA reductions even when the GPUs are 
under the same PCI-e switch?

Thanks,

Fei Mao
High Performance Computing Technical Consultant 
SHARCNET | http://www.sharcnet.ca
Compute/Calcul Canada | http://www.computecanada.ca