Received from Baskaran Sankaran on Tue, Nov 17, 2015 at 03:08:10PM EST:
> @Lev, thanks for the tip; I will look into it.
> 
> In the meantime, I am running into some speed issues. Training slows down
> progressively, by roughly 40% over just 7000 updates: it starts at about
> 2.6 s/mini-batch (average speed), but after 7000 mini-batches the time
> increases to 3.7 s/mini-batch.
> 
> I suspect that I may not be sending the host memory pointers but the actual
> arrays, serialized by zmq's send_pyobj (see below in the code). Could
> someone confirm whether I am doing it correctly? Should I just be sending/
> receiving host memory pointers?

You are transmitting the array contents: send_pyobj pickles each array and
sends the resulting bytes. If you instead use CUDA IPC to send the GPU array
pointers to both processes [1], you should be able to perform a
device-to-device copy between the two memory locations even if you can't use
P2P [2] (assuming that UVA is supported on both devices).
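
A rough sketch of the handle exchange described above, not a tested
drop-in for your code. The helper names (payload_sizes, export_array,
import_array, copy_device_to_device) are made up for illustration. The
PyCUDA calls are imported lazily because they need a CUDA device and an
active context in each process. payload_sizes only uses placeholder bytes
to show how much smaller a pickled 64-byte IPC handle is than a pickled
buffer.

```python
import pickle

# Size in bytes of a CUDA IPC memory handle (CU_IPC_HANDLE_SIZE in cuda.h).
IPC_HANDLE_SIZE = 64


def payload_sizes(n_bytes):
    """Return (pickled buffer size, pickled handle-sized token size)."""
    contents = bytes(n_bytes)        # stand-in for the array data
    token = bytes(IPC_HANDLE_SIZE)   # stand-in for an IPC handle
    return len(pickle.dumps(contents)), len(pickle.dumps(token))


def export_array(x_gpu):
    """Owner process: describe a GPUArray without copying its contents."""
    import pycuda.driver as drv      # lazy import: requires a CUDA GPU
    handle = drv.mem_get_ipc_handle(x_gpu.ptr)
    # The tuple is small, so it can go through sock.send_pyobj(...) cheaply.
    return handle, x_gpu.shape, x_gpu.dtype


def import_array(handle, shape, dtype):
    """Peer process: wrap the owner's allocation as a local GPUArray."""
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    remote = drv.IPCMemoryHandle(handle)
    return gpuarray.GPUArray(shape, dtype, gpudata=remote)


def copy_device_to_device(dst_gpu, src_gpu):
    """Copy src_gpu into dst_gpu on the device, with no host round trip."""
    import pycuda.driver as drv
    drv.memcpy_dtod(dst_gpu.gpudata, src_gpu.gpudata, src_gpu.nbytes)
```

The owner has to keep its allocation alive for as long as the peer uses
the imported array, and a handle has to be opened in a different process
from the one that created it.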

[1] https://gist.github.com/e554b3985e196b07f93b
[2] https://gist.github.com/3078644
-- 
Lev Givon
Bionet Group | Neurokernel Project
http://lebedov.github.io/
http://neurokernel.github.io/


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
