Hello Peter,

On 03/18/2013 07:22 PM, Peter Colberg wrote:
> In case you are familiar with the OpenCL support of the NVIDIA driver,
> it has not been getting any better with the recent CUDA 5.0 release.
> As I had to experience, their OpenCL implementation seems to be
> serial, which prevents multi-GPU scheduling, or kernel execution and
> memory transfer overlapping, when using a single host thread. I fear
> the worst case, that OpenCL support would eventually be dropped
> entirely.
I've heard about this. It's unfortunate, but understandable business-wise
for the leading player in the GPU compute game. Of course, it's bad news
for the end users who could really benefit from a widespread and
well-supported open heterogeneous compute API without vendor lock-in.
Let's hope we can improve the situation with pocl.

> A PTX device backend to POCL would allow users to develop portable
> OpenCL (1.2) codes now, and run them on existing installations of
> NVIDIA GPUs using the CUDA driver, until we have mature open-source
> support for AMD and NVIDIA GPUs in distributions.
>
> What do you think about this idea? Does the CUDA driver API expose
> enough functionality to implement OpenCL? How difficult would it
> be to use the NVPTX backend of LLVM for compilation?

It should be doable with the CUDA API and the LLVM NVPTX backend. I took
a look at the CUDA API some time ago with this exact idea in mind, but
didn't have the time to move forward with it. How much work it would be,
I'm not sure, as I haven't tested either the LLVM NVPTX backend or the
API. My guess is it shouldn't be too hard to get something running,
because we have the previous drivers for heterogeneous device setups as
examples. If you are up for the task, take a look at the pocl device
drivers for cellspu, TCE (ttasim), or Tom Stellard's unfinished Gallium
compute / AMD R600 driver.

A key point is that the kernel compilation chain is simpler than for
CPUs, as the NVIDIA GPUs are SIMT. Thus, the kernel compiler should skip
most of the complex passes and feed the single work-item kernel to the
device. For R600 it's similar, but in that case it can sometimes benefit
from multi-work-item (replicated) input due to the ILP in the VLIW lanes.
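To make the compilation side a bit more concrete, below is a rough,
untested sketch of how the linked single work-item kernel module could be
pushed through the NVPTX backend via the LLVM-C target machine API to get
PTX text. The nvptx64-nvidia-cuda triple and the "sm_20" target CPU are
placeholder assumptions; a real driver would pick them per device:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <llvm-c/Core.h>
  #include <llvm-c/Target.h>
  #include <llvm-c/TargetMachine.h>

  /* Untested sketch: feed the linked single work-item kernel module to
   * the NVPTX backend and return the PTX as a malloc'd C string.
   * Triple and "sm_20" are placeholder assumptions. */
  char *emit_ptx(LLVMModuleRef kernel_module)
  {
    char *error = NULL;
    LLVMTargetRef target;
    LLVMTargetMachineRef tm;
    LLVMMemoryBufferRef buf;

    LLVMInitializeNVPTXTargetInfo();
    LLVMInitializeNVPTXTarget();
    LLVMInitializeNVPTXTargetMC();
    LLVMInitializeNVPTXAsmPrinter();

    if (LLVMGetTargetFromTriple("nvptx64-nvidia-cuda", &target, &error)) {
      fprintf(stderr, "NVPTX target not available: %s\n", error);
      exit(1);
    }

    tm = LLVMCreateTargetMachine(target, "nvptx64-nvidia-cuda", "sm_20",
                                 "", LLVMCodeGenLevelDefault,
                                 LLVMRelocDefault, LLVMCodeModelDefault);

    /* "Assembly" output from the NVPTX backend is PTX text, which is
     * what cuModuleLoadData() on the host side accepts. */
    if (LLVMTargetMachineEmitToMemoryBuffer(tm, kernel_module,
                                            LLVMAssemblyFile, &error,
                                            &buf)) {
      fprintf(stderr, "PTX emission failed: %s\n", error);
      exit(1);
    }

    size_t size = LLVMGetBufferSize(buf);
    char *ptx = malloc(size + 1);
    memcpy(ptx, LLVMGetBufferStart(buf), size);
    ptx[size] = '\0';

    LLVMDisposeMemoryBuffer(buf);
    LLVMDisposeTargetMachine(tm);
    return ptx;
  }

Since the input is already a single work-item kernel, none of the
work-group replication passes used for the CPU targets are needed before
this step.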
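On the device layer side, a pocl CUDA/PTX driver could then hand that PTX
string to the CUDA driver API roughly as follows. Again just an untested
sketch: the kernel name, the single buffer argument and the
local-size-to-block mapping are made up for illustration, and real code
would cache the context and module instead of recreating them per launch:

  #include <stdio.h>
  #include <stdlib.h>
  #include <cuda.h>

  #define CU_CHECK(call)                                           \
    do {                                                           \
      CUresult err_ = (call);                                      \
      if (err_ != CUDA_SUCCESS) {                                  \
        fprintf(stderr, "CUDA driver error %d at %s:%d\n",         \
                (int)err_, __FILE__, __LINE__);                    \
        exit(1);                                                   \
      }                                                            \
    } while (0)

  /* Untested sketch: JIT the PTX produced above, look up one kernel by
   * name and launch it over 'work_items' items with a single buffer
   * argument. */
  void launch_ptx_kernel(const char *ptx, const char *kernel_name,
                         void *host_buf, size_t bytes, size_t work_items)
  {
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction func;
    CUdeviceptr dev_buf;

    CU_CHECK(cuInit(0));
    CU_CHECK(cuDeviceGet(&dev, 0));
    CU_CHECK(cuCtxCreate(&ctx, 0, dev));

    /* The driver JIT-compiles the PTX for the actual device here. */
    CU_CHECK(cuModuleLoadData(&mod, ptx));
    CU_CHECK(cuModuleGetFunction(&func, mod, kernel_name));

    /* clCreateBuffer + clEnqueueWriteBuffer equivalents. */
    CU_CHECK(cuMemAlloc(&dev_buf, bytes));
    CU_CHECK(cuMemcpyHtoD(dev_buf, host_buf, bytes));

    /* OpenCL local size -> CUDA block, number of work-groups -> grid. */
    unsigned block = 128;
    unsigned grid = (unsigned)((work_items + block - 1) / block);
    void *args[] = { &dev_buf };
    CU_CHECK(cuLaunchKernel(func, grid, 1, 1, block, 1, 1,
                            0 /* smem */, 0 /* stream */, args, NULL));
    CU_CHECK(cuCtxSynchronize());

    CU_CHECK(cuMemcpyDtoH(host_buf, dev_buf, bytes));
    CU_CHECK(cuMemFree(dev_buf));
    CU_CHECK(cuModuleUnload(mod));
    CU_CHECK(cuCtxDestroy(ctx));
  }

Overlapping transfers and kernels, or driving multiple GPUs, would then
just be a matter of using streams and one context per device in the
driver, independent of NVIDIA's OpenCL implementation.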
-- 
--Pekka