On 2015-08-17 12:57, Eric Hunsberger wrote:
Does anyone know if concurrent kernels work on (newer) NVIDIA devices
in OpenCL? If so, can anyone provide some PyOpenCL code that runs a
minimal working example? As well as perhaps the driver version you're
using?

For context, "concurrent kernels" just means multiple kernels running
at the same time. For example, if each of my kernels occupies only 32
work-items and my device can run 1024 work-items at once, then I
should ideally be able to run 32 such kernels at the same time (in
parallel). From what I've read, earlier NVIDIA GPUs didn't support
this; support for up to 16 concurrent kernels was added with the Fermi
architecture.

Judging by the threads I've found, there was a lot of discussion about
concurrent kernels four years or so ago, and at the time it wasn't
clear whether NVIDIA's OpenCL drivers supported this. I still can't
find a conclusive answer as to whether it should work, and I can't get
it working in my own code. I've seen it stated in several places that
multiple queues are needed to do this, and even that it's necessary to
flush all the queues, but I still can't get anything to work. NVIDIA
devices can do this using CUDA:
http://wiki.tiker.net/PyCuda/Examples/KernelConcurrency

What version of the NVIDIA driver are you using to try this? How do you judge whether what you are trying is working or not? Can you share some code that people can try on their own machines?

My naive expectation is that you should just be able to create multiple queues and submit kernels to them, and things should just work. What happens when you try that?
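A minimal sketch of that approach might look as follows. To be clear, this is my own illustration, not tested code from this thread: the kernel body, the queue count, and the profiling-based overlap check are all assumptions, and whether the kernels actually overlap depends on the driver and hardware.

```python
# Sketch: one context, several independent in-order queues, one small
# kernel enqueued per queue. Assumes PyOpenCL and at least one OpenCL
# platform are installed; guarded so it degrades gracefully otherwise.
import numpy as np

try:
    import pyopencl as cl
    HAVE_CL = bool(cl.get_platforms())
except Exception:
    HAVE_CL = False

if HAVE_CL:
    ctx = cl.create_some_context(interactive=False)
    props = cl.command_queue_properties.PROFILING_ENABLE
    # Several queues on the same context/device.
    queues = [cl.CommandQueue(ctx, properties=props) for _ in range(4)]

    prg = cl.Program(ctx, """
        __kernel void busy(__global float *a)
        {
            float x = a[get_global_id(0)];
            for (int i = 0; i < 100000; ++i)
                x = x * 1.0000001f + 1e-7f;  /* burn time on the device */
            a[get_global_id(0)] = x;
        }
    """).build()

    mf = cl.mem_flags
    bufs = [cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR,
                      hostbuf=np.zeros(32, np.float32))
            for _ in queues]

    # Enqueue one small kernel per queue, then flush each queue so the
    # driver actually pushes the work to the device.
    events = [prg.busy(q, (32,), (32,), b) for q, b in zip(queues, bufs)]
    for q in queues:
        q.flush()
    for q in queues:
        q.finish()

    # One way to judge whether it "worked": if the kernels ran
    # concurrently, their [start, end) profiling intervals intersect;
    # if they serialized, the intervals are back-to-back.
    spans = sorted((e.profile.start, e.profile.end) for e in events)
    overlapped = any(prev_end > next_start
                     for (_, prev_end), (next_start, _)
                     in zip(spans, spans[1:]))
    print("kernels overlapped:", overlapped)
```

The profiling timestamps give an objective answer to the "how do you judge it's working" question above, rather than eyeballing wall-clock time.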

Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
