On 2015-08-17 12:57, Eric Hunsberger wrote:
Does anyone know if concurrent kernels work on (newer) NVIDIA devices
in OpenCL? If so, can anyone provide some PyOpenCL code that runs a
minimal working example? As well as perhaps the driver version you're
using?

For context, "concurrent kernels" just means multiple kernels running
at the same time. For example, if each of my kernels occupies only 32
work-items and my device can run 1024 work-items at once, then I
should ideally be able to run 32 such kernels at the same time (in
parallel). From what I've read, earlier NVIDIA GPUs didn't support
this; support for up to 16 concurrent kernels was added with the Fermi
architecture.

Judging by the threads I've found, there was a lot of discussion about
concurrent kernels four years or so ago, and at the time it wasn't
clear whether NVIDIA's OpenCL drivers supported this. I still can't
find a conclusive answer as to whether it should work, and I can't get
it working in my own code. I've seen it stated in several places that
multiple queues are needed to do this, and even that it's necessary to
flush all the queues, but I still can't get anything to work. NVIDIA
devices can do this using CUDA:
http://wiki.tiker.net/PyCuda/Examples/KernelConcurrency

What version of the NVIDIA driver are you using to try this? How do you judge whether what you are trying is working or not? Can you share some code that people can try on their own machines?

My naive expectation is that you should just be able to create multiple queues and submit kernels to them, and things should just work. What happens when you try that?
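A minimal sketch of that approach might look as follows. To be clear, this is my own illustration, not tested code from this thread: the kernel body, the queue count, and the profiling-based overlap check are all assumptions, and whether the kernels actually overlap depends on the driver and hardware.

```python
# Sketch: one context, several independent in-order queues, one small
# kernel enqueued per queue. Assumes PyOpenCL and at least one OpenCL
# platform are installed; guarded so it degrades gracefully otherwise.
import numpy as np

try:
    import pyopencl as cl
    HAVE_CL = bool(cl.get_platforms())
except Exception:
    HAVE_CL = False

if HAVE_CL:
    ctx = cl.create_some_context(interactive=False)
    props = cl.command_queue_properties.PROFILING_ENABLE
    # Several queues on the same context/device.
    queues = [cl.CommandQueue(ctx, properties=props) for _ in range(4)]

    prg = cl.Program(ctx, """
        __kernel void busy(__global float *a)
        {
            float x = a[get_global_id(0)];
            for (int i = 0; i < 100000; ++i)
                x = x * 1.0000001f + 1e-7f;  /* burn time on the device */
            a[get_global_id(0)] = x;
        }
    """).build()

    mf = cl.mem_flags
    bufs = [cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR,
                      hostbuf=np.zeros(32, np.float32))
            for _ in queues]

    # Enqueue one small kernel per queue, then flush each queue so the
    # driver actually pushes the work to the device.
    events = [prg.busy(q, (32,), (32,), b) for q, b in zip(queues, bufs)]
    for q in queues:
        q.flush()
    for q in queues:
        q.finish()

    # One way to judge whether it "worked": if the kernels ran
    # concurrently, their [start, end) profiling intervals intersect;
    # if they serialized, the intervals are back-to-back.
    spans = sorted((e.profile.start, e.profile.end) for e in events)
    overlapped = any(prev_end > next_start
                     for (_, prev_end), (next_start, _)
                     in zip(spans, spans[1:]))
    print("kernels overlapped:", overlapped)
```

The profiling timestamps give an objective answer to the "how do you judge it's working" question above, rather than eyeballing wall-clock time.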

Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
