On 03/21/2013 07:19 PM, Peter Colberg wrote:
> * Error handling in device backends
>
> The GPU devices need a way to gracefully return after an error, since
> any device operation may fail. Maybe each device function could return
> an error code to the frontend using a return value (for void functions)
> or an extra pointer parameter (for non-void functions)?

Yes. It seems the vast majority of the device hooks have a void return
value now. The build_program() hook already has an int error code return
value. I'm not sure it's necessary to return an error from
get_timer_value(), which is used to implement the profiling queue?
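For illustration, a rough sketch of the two options (the hook names and
parameter lists below are made up, not the actual pocl hook signatures):

#include <CL/cl.h>
#include <stddef.h>

/* A hook that currently returns void could return a status instead. */
cl_int
pocl_cuda_run (void *device_data, void *run_command)
{
  (void) device_data;
  (void) run_command;

  int launch_failed = 0;           /* result of the actual driver call */
  if (launch_failed)
    return CL_OUT_OF_RESOURCES;    /* map the device error to a CL code */

  return CL_SUCCESS;
}

/* A hook that already returns a value could take an extra out-parameter. */
void *
pocl_cuda_map_mem_sketch (void *device_ptr, size_t size, cl_int *errcode)
{
  (void) device_ptr;
  (void) size;
  *errcode = CL_SUCCESS;
  return NULL;                     /* would return the host mapping */
}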

> * Build from source to binary in clBuildProgram
>
> Part of the kernel build is currently handled upon first call of a
> device's run function. What would be needed to have the complete
> build finished after clBuildProgram returns?

It cannot be done in general because the work-group dimensions are known
only at kernel launch time, and they affect the compilation for
non-SIMT targets.

clBuildProgram builds a single work-item kernel using Clang, which is
sufficient for SIMT targets like NVIDIA GPUs that spawn the work-items
in hardware, but for other targets we need to create the whole
work-group function that executes all the WIs, and for that we need the
local size. Well, strictly speaking, we can now generate work-item loops
with a variable iteration count too, so this is not exactly true anymore.
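Roughly, the work-group function generated for a non-SIMT target looks
like the sketch below (the types and names are made up for illustration,
this is not pocl's actual generated code). With compile-time local sizes
the loops can be specialized; with variable iteration counts the same
wrapper works for any local size:

#include <stddef.h>

/* Hypothetical types/functions standing in for the compiled kernel. */
typedef struct kernel_args kernel_args_t;
void single_work_item_kernel (kernel_args_t *args,
                              size_t x, size_t y, size_t z);

/* Loop over the local index space, invoking the single work-item
   kernel once per work-item of the work-group. */
void
work_group_launcher (kernel_args_t *args,
                     size_t local_x, size_t local_y, size_t local_z)
{
  for (size_t z = 0; z < local_z; ++z)
    for (size_t y = 0; y < local_y; ++y)
      for (size_t x = 0; x < local_x; ++x)
        single_work_item_kernel (args, x, y, z);
}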

> For the cuda device, pocl_cuda_build_program() would transform the
> LLVM bitcode to PTX assembly, and transform the PTX assembly to
> native GPU code using cuModuleLoad. Both steps may fail with an
> error, which is returned to the host from clBuildProgram.

You might be able to implement this with the optional build_program()
device hook.

See tce_common.cc for an example implementation. What it does is run a
TCE header-generator program to produce the special instruction
wrappers, and then launches the default compilation command with a
forced include of that header.

You could override this to do the rest of the steps in addition to
the bitcode generation?
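A rough sketch of what such a CUDA build_program() hook could look like
(the hook signature is simplified, generate_ptx_from_bitcode() is a
made-up helper, and cuModuleLoadData() assumes a CUDA context has
already been set up):

#include <cuda.h>
#include <stdlib.h>

/* Hypothetical helper: run the LLVM NVPTX backend on the program
   bitcode and return a malloc'd PTX string (0 on success). */
int generate_ptx_from_bitcode (const char *bitcode_path, char **ptx_out);

int
pocl_cuda_build_program (void *device_data, const char *bitcode_path,
                         CUmodule *module_out)
{
  (void) device_data;
  char *ptx = NULL;

  /* 1. LLVM bitcode -> PTX assembly. */
  if (generate_ptx_from_bitcode (bitcode_path, &ptx) != 0)
    return -1;   /* the frontend reports CL_BUILD_PROGRAM_FAILURE */

  /* 2. PTX -> native GPU code, loaded as a module via the driver API. */
  CUresult res = cuModuleLoadData (module_out, ptx);
  free (ptx);
  if (res != CUDA_SUCCESS)
    return -1;

  /* The module handle would be kept around for cuModuleGetFunction()
     at clCreateKernel() time. */
  return 0;
}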

> * Kernel function
>
> For the cuda device, clCreateKernel(s) should trigger the call(s) to
> cuModuleGetFunction, and return an error if the function name is
> invalid.
>
> To implement clCreateKernelsInProgram, is it possible to use the
> output of scripts/pocl-kernel to get a list of function names?

That's a possibility.

However, the Desired Way is to get rid of the helper scripts altogether,
and call libclang/libllvm APIs directly from the C code.

This work was started by Kalle but hasn't received attention lately:
https://code.launchpad.net/~kraiskil/pocl/api
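For the cuModuleGetFunction() part, the error mapping at clCreateKernel()
time could be a small wrapper along these lines (the wrapper name is
made up; the module is the one loaded at build time):

#include <cuda.h>
#include <CL/cl.h>

/* Resolve a kernel name in a previously loaded module. */
cl_int
cuda_resolve_kernel (CUmodule module, const char *name, CUfunction *func)
{
  CUresult res = cuModuleGetFunction (func, module, name);

  if (res == CUDA_ERROR_NOT_FOUND)
    return CL_INVALID_KERNEL_NAME;   /* unknown function name */
  if (res != CUDA_SUCCESS)
    return CL_OUT_OF_RESOURCES;      /* some other driver failure */

  return CL_SUCCESS;
}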

> * Map buffer flags
>
> pocl_cuda_map_mem and pocl_cuda_unmap_mem need to access the mapping
> flags (READ|WRITE), to decide whether the buffer needs to be copied
> between device and host after allocation / before freeing.

OK. You can add extra parameters to those driver functions.
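For example, something along these lines (simplified signatures, not the
actual pocl hooks; the copy helpers are placeholders for the CUDA memcpy
calls):

#include <CL/cl.h>
#include <stddef.h>

void copy_device_to_host (void *host_ptr, const void *device_ptr,
                          size_t size);
void copy_host_to_device (void *device_ptr, const void *host_ptr,
                          size_t size);

void
pocl_cuda_map_mem (void *host_ptr, void *device_ptr, size_t size,
                   cl_map_flags map_flags)
{
  /* Copy device -> host only when the mapping will be read on the host. */
  if (map_flags & CL_MAP_READ)
    copy_device_to_host (host_ptr, device_ptr, size);
}

void
pocl_cuda_unmap_mem (void *host_ptr, void *device_ptr, size_t size,
                     cl_map_flags map_flags)
{
  /* Write back only if the host may have modified the region. */
  if (map_flags & CL_MAP_WRITE)
    copy_host_to_device (device_ptr, host_ptr, size);
}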

-- 
--Pekka

