Does anyone know whether concurrent kernels work on (newer) NVIDIA devices
in OpenCL? If so, could anyone provide a minimal working PyOpenCL example,
and perhaps the driver version you're using?

For context, "concurrent kernels" just means multiple kernels running at
the same time. For example, if each of my kernels only occupies 32 work
items, and my device can execute 1024 work items at once, then I should
ideally be able to run 32 such kernels at the same time (in parallel).
From what I've read, earlier NVIDIA GPUs didn't support this at all; the
Fermi architecture added support for up to 16 concurrent kernels.

Based on the threads I've found, there was a lot of discussion about
concurrent kernels four years ago or so, and at the time it wasn't clear
whether NVIDIA's OpenCL drivers supported them. I still can't find a
conclusive answer as to whether this should work, and I can't get it
working in my own code. I've seen it suggested in several places that
multiple command queues are needed, and even that it's necessary to flush
all the queues, but I still can't get anything to overlap. NVIDIA devices
can definitely do this under CUDA:
http://wiki.tiker.net/PyCuda/Examples/KernelConcurrency.

Cheers,
Eric
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
