Received from Baskaran Sankaran on Tue, Nov 17, 2015 at 03:08:10PM EST:
> @Lev, thanks for the tip; I will look into it.
> In the meantime, I am running into some speed issues. I notice that it
> slows down progressively, by almost 50%, in just 7000 updates.
> It starts at about 2.6 sec/mini-batch (average speed), but after 7000
> mini-batches, the time increases to 3.7 sec/mini-batch.
> I suspect that I may not be sending the host memory pointers but the actual
> arrays, serialized by zmq's send_pyobj (see below in the code). Could
> someone confirm whether I am doing it correctly? Should I just be
> sending/receiving host memory pointers?

Yes, you are transmitting the array contents: send_pyobj pickles the whole
array and sends the resulting bytes over the socket. If you instead use IPC to
send the GPU array pointers to both processes [1], you should be able to
perform a device-to-device copy between the two memory locations even if you
can't use P2P [2] (assuming that UVA is supported on both devices).
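
To get a feel for the difference in message size, here is a rough sketch that
uses only the standard library. It measures what pickle (the serializer behind
send_pyobj) produces for an array versus a small fixed-size handle. The names
`params` and `fake_handle` are made up for illustration; the 64-byte blob only
stands in for a CUDA IPC memory handle (CUDA_IPC_HANDLE_SIZE is 64) and is not
a real one, and the array size is an arbitrary assumption.

```python
import pickle
from array import array


def pickled_size(obj):
    """Return the number of bytes pickle produces for obj."""
    return len(pickle.dumps(obj))


# Hypothetical stand-in for a parameter array: one million float32 values,
# i.e. about 4 MB of raw data.
params = array('f', [0.0]) * 1_000_000

# Hypothetical stand-in for an opaque, fixed-size IPC handle (64 bytes).
# This is NOT a real CUDA handle, just a blob of the same size.
fake_handle = bytes(64)

contents_bytes = pickled_size(params)
handle_bytes = pickled_size(fake_handle)

print("pickled array contents: %d bytes" % contents_bytes)
print("pickled 64-byte handle: %d bytes" % handle_bytes)
```

The pickled array grows with the number of elements, while the handle-sized
message stays the same size no matter how large the array on the GPU is.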

Lev Givon
Bionet Group | Neurokernel Project

PyCUDA mailing list
