Hi all, I have finally bitten the bullet and started porting my solver from CUDA to OpenCL. During a time step it is necessary for MPI ranks to exchange data. With PyCUDA and mpi4py our application proceeds as follows:
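Stripped down, the PyCUDA side looks roughly like this (the buffer size and the ring-neighbour destination rank are placeholders, and the packing kernel is omitted):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a context on the first device
import pycuda.driver as cuda
from mpi4py import MPI

comm = MPI.COMM_WORLD
nbytes = 16 * 1024 * 1024  # placeholder exchange-buffer size

# Start-up: page-locked host buffer plus an equally sized device buffer
host_ary = cuda.pagelocked_empty(nbytes, dtype=np.uint8)
dev_buf = cuda.mem_alloc(nbytes)

# Persistent MPI send request backed by the pinned host buffer
# (destination rank here is just an illustrative ring neighbour)
sreq = comm.Send_init(host_ary, dest=(comm.rank + 1) % comm.size)

stream = cuda.Stream()

# Each time step: run the packing kernel (omitted), then
cuda.memcpy_dtoh_async(host_ary, dev_buf, stream)
stream.synchronize()
sreq.Start()
sreq.Wait()
```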
At start-up we allocate a page-locked buffer on the host and an equally sized buffer on the device. We also construct a persistent MPI request for sending the host buffer. Then, when the time is right, we run a packing kernel on the device, initiate a device-to-host copy, and then start the persistent MPI request.

Does anyone have any experience with doing this in OpenCL? From what I can gather there are a variety of options, although none that jump off the page. I am wary of having the device use a memory-mapped host pointer (when I tried it with CUDA our performance tanked). Nor can I find a direct equivalent of pagelocked_empty in OpenCL. CL_MEM_ALLOC_HOST_PTR followed by an enqueue_map_buffer may be what I want, but I am unsure if it fits in with persistent requests (the buffer would need to remain mapped at all times).

Regards, Freddie.
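P.S. For concreteness, the ALLOC_HOST_PTR pattern I have in mind is sketched below. This is untested: the buffer size and destination rank are placeholders, and whether it is legal to keep the mapping alive while repeatedly copying into the buffer is exactly what I am asking about.

```python
import numpy as np
import pyopencl as cl
from mpi4py import MPI

comm = MPI.COMM_WORLD
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

nbytes = 16 * 1024 * 1024  # placeholder exchange-buffer size
mf = cl.mem_flags

# Device buffer the packing kernel writes into
dev_buf = cl.Buffer(ctx, mf.READ_WRITE, nbytes)

# Host-resident staging buffer; ALLOC_HOST_PTR asks the implementation
# for host memory it can transfer from efficiently (often, but not
# necessarily, pinned)
stage = cl.Buffer(ctx, mf.ALLOC_HOST_PTR, nbytes)

# Map once at start-up and keep the mapping for the lifetime of the
# persistent request
host_ary, _ = cl.enqueue_map_buffer(queue, stage, cl.map_flags.READ,
                                    0, (nbytes,), np.uint8)

# Persistent MPI send request backed by the mapped host array
sreq = comm.Send_init(host_ary, dest=(comm.rank + 1) % comm.size)

# Each time step: run the packing kernel (omitted), then copy the
# packed data into the staging buffer -- while it is still mapped,
# which is the part I am unsure is permitted
cl.enqueue_copy(queue, stage, dev_buf)
queue.finish()
sreq.Start()
sreq.Wait()
```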
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
