Thanks Andreas for the hint. Actually, what I am trying to do is a little more complex than that. I have two Python processes running on two GPUs. In a simpler setting, I have an array x in gpu0's Python process that needs to be transferred to gpu1's process, and vice versa.
I solved it with this scheme:

* allocate host memory
* memcpy from device to host (gpu0 to host; gpu1 to host)
* send/receive the objects in host memory to the Python process on the other gpu
* memcpy from host to device within the respective gpu

The solution and the output from a sample run follow. Now I wonder if it is possible to improve this further. One possibility is whether the device-to-host copy can be eliminated, because I need to transfer several theano tensors between multiple (up to 4) gpus, and I need to do this quite frequently (say, every nth mini-batch) during training. A small refinement that at least avoids re-allocating the staging buffer on every transfer is sketched right after the sample output below.

Note: not all of the gpus are P2P capable, so memcpy_peer wouldn't work.

import multiprocessing as mp
import numpy as np
import zmq
import time
import pycuda
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray


def proc1():
    import theano  # initializes the CUDA device for this process (selected via THEANO_FLAGS)
    sock = zmq.Context().socket(zmq.PAIR)
    sock.connect('tcp://localhost:5003')
    drv.init()
    ctx = drv.Context.attach()  # attach to the context theano created

    x_gpu = gpuarray.to_gpu(np.random.rand(8))
    y_gpu_copy = gpuarray.zeros_like(x_gpu)

    # device -> host staging copy into page-locked memory
    x_host = drv.pagelocked_zeros_like(x_gpu)
    drv.memcpy_dtoh_async(x_host, x_gpu.ptr)
    drv.Context.synchronize()  # make sure the async copy has finished before the buffer is pickled

    # exchange the host buffers with the other process over zmq
    sock.send_pyobj(x_host)
    y_host_copy = sock.recv_pyobj()

    # host -> device copy of the received buffer
    drv.memcpy_htod_async(y_gpu_copy.ptr, y_host_copy)

    print "Proc-1: value before transfer\n", x_gpu
    print "Proc-1: value after transfer\n", y_gpu_copy
    print "Proc-1: sum after transfer\n", x_gpu + y_gpu_copy
    ctx.detach()


def proc2():
    import theano
    sock = zmq.Context().socket(zmq.PAIR)
    sock.bind('tcp://*:5003')
    drv.init()
    ctx = drv.Context.attach()

    y_gpu = gpuarray.to_gpu(np.random.rand(8) * 0.9)
    x_gpu_copy = gpuarray.zeros_like(y_gpu)

    # device -> host staging copy into page-locked memory
    y_host = drv.pagelocked_zeros_like(y_gpu)
    drv.memcpy_dtoh_async(y_host, y_gpu.ptr)
    drv.Context.synchronize()

    # exchange the host buffers with the other process over zmq
    sock.send_pyobj(y_host)
    x_host_copy = sock.recv_pyobj()

    # host -> device copy of the received buffer
    drv.memcpy_htod_async(x_gpu_copy.ptr, x_host_copy)

    time.sleep(10)  # let proc1 print first so the outputs don't interleave
    print "\nProc-2: value before transfer\n", y_gpu
    print "Proc-2: value after transfer\n", x_gpu_copy
    print "Proc-2: sum after transfer\n", y_gpu + x_gpu_copy
    ctx.detach()


if __name__ == '__main__':
    p1 = mp.Process(target=proc1)
    p2 = mp.Process(target=proc2)
    p1.start()
    p2.start()

Here is the output from a sample run. As expected, the sum values in both processes are the same in the end.

[dccxc090] ~/multi-GPUs $ /opt/share/Python-2.7.9/bin/python multi_pycuda_d2d_demo.py
Using gpu device 0: Tesla K40m (CNMeM is disabled)
Using gpu device 1: Tesla K40m (CNMeM is disabled)

Proc-1: value before transfer
[ 0.64424104  0.98413032  0.46654151  0.40943486  0.6895878  0.81006672  0.00907435  0.88727554]
Proc-1: value after transfer
[ 0.57981693  0.88571729  0.41988736  0.36849138  0.62062902  0.72906005  0.00816691  0.79854798]
Proc-1: sum after transfer
[ 1.22405797  1.86984761  0.88642887  0.77792624  1.31021682  1.53912676  0.01724126  1.68582352]

Proc-2: value before transfer
[ 0.57981693  0.88571729  0.41988736  0.36849138  0.62062902  0.72906005  0.00816691  0.79854798]
Proc-2: value after transfer
[ 0.64424104  0.98413032  0.46654151  0.40943486  0.6895878  0.81006672  0.00907435  0.88727554]
Proc-2: sum after transfer
[ 1.22405797  1.86984761  0.88642887  0.77792624  1.31021682  1.53912676  0.01724126  1.68582352]
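One way to make the repeated transfers cheaper, independent of whether the device-to-host hop itself can be dropped: allocate the page-locked staging buffer and a stream once and reuse them for every mini-batch, rather than allocating per transfer as in the demo above. A minimal sketch only; the helper name, the fixed shape/dtype assumption and the explicit stream are illustrative, not part of the code above:

import numpy as np
import pycuda.driver as drv


def make_staged_exchange(sock, shape, dtype=np.float64):
    # Allocate the pinned staging buffer and the stream once; both are
    # reused for every transfer (assumes the tensor shape/dtype stays fixed).
    host_buf = drv.pagelocked_empty(shape, dtype)
    stream = drv.Stream()

    def send(src_gpu):
        # device -> pinned host, then wait so the buffer is valid when pickled
        drv.memcpy_dtoh_async(host_buf, src_gpu.ptr, stream)
        stream.synchronize()
        sock.send_pyobj(host_buf)

    def recv(dest_gpu):
        # unpickle the peer's buffer and push it to this gpu
        buf = sock.recv_pyobj()
        drv.memcpy_htod_async(dest_gpu.ptr, buf, stream)
        stream.synchronize()

    return send, recv

    # e.g. in proc1:
    #   send, recv = make_staged_exchange(sock, (8,))
    #   send(x_gpu)
    #   recv(y_gpu_copy)

Inside proc1/proc2 this would replace the per-call pagelocked_zeros_like allocation and the two explicit memcpy calls. It does not remove the device-to-host hop itself, which seems hard to avoid between separate processes without P2P.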
- Baskaran

On Wed, Nov 11, 2015 at 2:40 AM, Andreas Kloeckner <li...@informa.tiker.net> wrote:
> Baskaran Sankaran <baskar...@gmail.com> writes:
>
> > Hi all,
> >
> > I am looking for a solution for exchanging some tensors between two gpus,
> > that do not have P2P enabled. Assuming two GPUs on the same node, I guess I
> > have to do it in two steps; first copy to host memory from GPU (gpu-0) and
> > then copy from host memory to the other GPU (gpu-1). However it is not
> > exactly clear to me as to how I can go about this.
>
> (1) Allocate memory on host
> (2) memcpy(host mem, gpu0_mem)
> (3) memcpy(gpu1_mem, host_mem)
> (4) (Optionally) free host mem
>
> Not sure what you're asking...
>
> Andreas