Hi all,

I have finally bitten the bullet and have started porting my solver from
CUDA to OpenCL.  During a time-step it is necessary for MPI ranks to
exchange data.  With PyCUDA and mpi4py our application proceeds as follows:

At start-up we allocate a page-locked buffer on the host and an
equally-sized buffer on the device.  We also construct a persistent MPI
request for sending the host buffer.  Then, when the time is right, we
run a packing kernel on the device, initiate a device-to-host copy, and
start the persistent MPI request.
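In rough PyCUDA + mpi4py terms, the flow looks something like this (the
packing kernel is elided and all names are illustrative):

```python
import numpy as np
from mpi4py import MPI
import pycuda.autoinit
import pycuda.driver as cuda

comm = MPI.COMM_WORLD
n = 1024

# Page-locked host buffer and an equally-sized device buffer
host_buf = cuda.pagelocked_empty(n, dtype=np.float64)
dev_buf = cuda.mem_alloc(host_buf.nbytes)

# Persistent send of the host buffer to a neighbouring rank
dest = (comm.rank + 1) % comm.size
sreq = comm.Send_init(host_buf, dest=dest, tag=0)

# Per time step:
# pack_kernel(dev_buf, ...)            # packing kernel (elided)
cuda.memcpy_dtoh(host_buf, dev_buf)    # device-to-host copy
sreq.Start()                           # kick off the persistent request
sreq.Wait()
```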

Does anyone have any experience with doing this in OpenCL?  From what I
can gather there are a variety of options, although none that jump off
the page.  I am wary of having the device use a memory-mapped host
pointer (when I tried this with CUDA our performance tanked).  Nor can I
find a direct equivalent to pagelocked_empty in OpenCL.  ALLOC_HOST_PTR
followed by an enqueue_map_buffer may be what I want, but I am unsure
whether it fits in with persistent requests (the buffer would need to
stay mapped the whole time).
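For concreteness, here is a sketch of what I have in mind with PyOpenCL,
untested and with illustrative names; whether keeping the buffer mapped
for the lifetime of the persistent request is actually valid is exactly
the part I am unsure about:

```python
import numpy as np
import pyopencl as cl
from mpi4py import MPI

comm = MPI.COMM_WORLD
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
n = 1024

# ALLOC_HOST_PTR buffer, mapped once at start-up and left mapped
buf = cl.Buffer(ctx, cl.mem_flags.ALLOC_HOST_PTR, n * 8)
host_map, ev = cl.enqueue_map_buffer(queue, buf, cl.map_flags.READ,
                                     0, (n,), np.float64)
ev.wait()

# Persistent request built on the mapped host array
dest = (comm.rank + 1) % comm.size
sreq = comm.Send_init(host_map, dest=dest, tag=0)

# Per time step: run the packing kernel, copy/pack into buf,
# queue.finish(), then sreq.Start() and sreq.Wait() as with CUDA
```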

Regards, Freddie.


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl