Does anyone know whether concurrent kernels work on (newer) NVIDIA devices under OpenCL? If so, could anyone provide some PyOpenCL code for a minimal working example, along with the driver version you're using?
For context, "concurrent kernels" just means multiple kernels executing on the device at the same time. For example, if I have a bunch of kernels, each of which launches only 32 work-items, and my device can keep 1024 work-items busy at once, then I should ideally be able to run 32 such kernels in parallel.

From what I've read, earlier NVIDIA GPUs didn't support this; support for up to 16 concurrent kernels was added with the Fermi architecture. Judging by the threads I've found, there was a lot of discussion about concurrent kernels four years or so ago, and at the time it wasn't clear whether NVIDIA's OpenCL drivers supported it. I still can't find a conclusive answer as to whether it should work, and I can't get it working in my own code. I've seen in several places that multiple command queues are needed, and have even heard that it's necessary to flush all the queues before waiting on any of them, but I still can't get anything to overlap. NVIDIA devices can do this using CUDA: http://wiki.tiker.net/PyCuda/Examples/KernelConcurrency.

Cheers,
Eric
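For what it's worth, here is roughly the pattern I've been trying, based on the "one queue per kernel, flush everything, then finish" advice. I can't confirm it actually produces overlap on NVIDIA's driver; the kernel name `busy`, the sizes, and the iteration count are placeholders of my own:

```python
# Sketch of the multi-queue launch pattern for testing kernel concurrency.
# Assumption: flushing every queue before finishing any of them is what
# gives the driver a chance to overlap the launches.
N_KERNELS = 16    # Fermi reportedly supports up to 16 concurrent kernels
LOCAL_SIZE = 32   # one warp per kernel, deliberately tiny

# A kernel that just spins, so any overlap between launches shows up
# clearly in the total wall time.
KERNEL_SRC = """
__kernel void busy(__global float *out, int iters)
{
    int gid = get_global_id(0);
    float acc = 0.0f;
    for (int i = 0; i < iters; i++)
        acc += sin((float)(i + gid));
    out[gid] = acc;
}
"""

def run_concurrency_test(iters=200000):
    """Launch N_KERNELS tiny kernels, each on its own command queue.

    If the driver runs them concurrently, total wall time should be much
    closer to one kernel's runtime than to N_KERNELS times it.
    """
    import time
    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    prg = cl.Program(ctx, KERNEL_SRC).build()

    # One in-order queue and one output buffer per kernel launch.
    queues = [cl.CommandQueue(ctx) for _ in range(N_KERNELS)]
    bufs = [cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, LOCAL_SIZE * 4)
            for _ in range(N_KERNELS)]

    start = time.time()
    for q, b in zip(queues, bufs):
        prg.busy(q, (LOCAL_SIZE,), (LOCAL_SIZE,), b, np.int32(iters))
    for q in queues:   # flush all queues before waiting on any of them
        q.flush()
    for q in queues:
        q.finish()
    print("wall time for %d kernels: %.4f s"
          % (N_KERNELS, time.time() - start))
```

On a machine with an OpenCL device you would call `run_concurrency_test()` and compare the wall time against issuing the same launches on a single queue; the imports are kept inside the function so the snippet reads without pyopencl installed.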
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
