No, UVA is not enabled on them; I already tried memcpy_dtod. But in send_pyobj, wouldn't the array just be passed by reference?
Thanks.

On Tuesday, November 17, 2015, Lev Givon <l...@columbia.edu> wrote:

> Received from Baskaran Sankaran on Tue, Nov 17, 2015 at 03:08:10PM EST:
> > @Lev, thanks for the tip; I will look into it.
> >
> > In the meanwhile, I am running into some speed issues. I notice that it
> > slows down progressively, almost by a factor of 0.5, in just 7000 updates.
> > It starts at about 2.6 sec/mini-batch (average speed), but after 7000
> > mini-batches the time increases to 3.7 sec/mini-batch.
> >
> > I suspect that I may not be sending the host memory pointers but the
> > actual arrays, serialized by zmq's send_pyobj (see below in the code).
> > Could someone confirm whether I am doing it correctly? Should I just be
> > sending/receiving host memory pointers?
>
> You are transmitting the array contents. If you use IPC to send the GPU
> array pointers to both processes [1], you should be able to perform a
> device-to-device copy between the two memory locations even if you can't
> use P2P [2] (assuming that UVA is supported on both devices).
>
> [1] https://gist.github.com/e554b3985e196b07f93b
> [2] https://gist.github.com/3078644
> --
> Lev Givon
> Bionet Group | Neurokernel Project
> http://lebedov.github.io/
> http://neurokernel.github.io/
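To see why Lev says the array contents are being transmitted: zmq's send_pyobj simply pickles its argument, so the whole buffer is serialized into the message body on every send, not passed by reference. A minimal sketch using only the standard library (a plain `array` stands in for the actual GPU/host array here, which is an assumption for illustration) shows the pickled payload growing with the data:

```python
import pickle
from array import array

# send_pyobj() pickles its argument, so the full buffer is serialized
# and copied into the zmq message -- it is not passed by reference.
small = array('f', range(10))
large = array('f', range(100_000))

small_len = len(pickle.dumps(small))
large_len = len(pickle.dumps(large))

# The large payload is hundreds of kilobytes (~4 bytes per float),
# while the small one is well under a kilobyte.
print(small_len, large_len)
```

This is why sending an IPC handle to the device pointer (a few dozen bytes, opened once by the peer process) avoids the per-message serialization cost entirely.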
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda