On Fri, Mar 22, 2013 at 11:01:56AM +0200, Pekka Jääskeläinen wrote:
> I'm more optimistic than this. CUDA and OpenCL are so similar that
> code gen bugs in the latter should be reproducible with CUDA kernels.
> I will help reporting the possible bugs after the driver lands to pocl
> trunk as I have an (older) NVIDIA card to test with.
It is good to hear that you are optimistic about this.
I was curious to see how the NVIDIA OpenCL driver handles __local and
__constant kernel parameters. The result is surprising.
I compiled this OpenCL source code using libcuda.so (4.2.1):

    __kernel void incr(__constant const uint *restrict a,
                       __global uint *restrict b,
                       __local uint *l_a)
    {
        const uint gid = get_global_id(0);
        l_a[gid] = a[gid] + 1;
        b[gid] = l_a[gid];
    }

clGetProgramInfo() returns the following PTX code:
//
// Generated by NVIDIA NVVM Compiler
// Compiler built on Thu Aug 2 23:31:19 2012 (1343964679)
// Driver 295.71
//
.version 3.0
.target sm_20, texmode_independent
.address_size 32
.entry incr(
    .param .u32 .ptr .const .align 4 incr_param_0,
    .param .u32 .ptr .global .align 4 incr_param_1,
    .param .u32 .ptr .shared .align 4 incr_param_2
)
{
...
So the NVIDIA GPU driver must accept kernel pointer parameters annotated
with the .const, .global and .shared state spaces.
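
To make the observation above easy to repeat on other kernels, here is a
minimal sketch in C that lists the state space attached to each .ptr
parameter in a PTX listing. The helper names (ptx_param_spaces,
is_state_space, MAX_SPACE_LEN) are hypothetical and not part of pocl, OpenCL
or CUDA. It assumes the parameter syntax seen in the dump above
(".ptr .<space> .align N") and only does plain string scanning, so it is not
a real PTX parser.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SPACE_LEN 16

/* Returns 1 if s[0..len) is one of the state spaces that may follow .ptr. */
static int is_state_space(const char *s, size_t len)
{
    static const char *const known[] = { "const", "global", "local", "shared" };
    for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        if (strlen(known[i]) == len && strncmp(s, known[i], len) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: scans a PTX listing for ".ptr" attributes and copies
 * the state space that follows each one (without the leading dot) into
 * spaces[]. A ".ptr" that is not followed by a state space is skipped.
 * Returns the number of entries written, at most max. */
static int ptx_param_spaces(const char *ptx, char spaces[][MAX_SPACE_LEN], int max)
{
    int n = 0;
    const char *p = ptx;

    while (n < max && (p = strstr(p, ".ptr")) != NULL) {
        p += 4;                                  /* skip ".ptr" */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != '.')
            continue;                            /* no attribute follows */
        p++;                                     /* skip the dot */

        size_t len = strcspn(p, " \t\r\n,");     /* length of the token */
        if (is_state_space(p, len)) {
            memcpy(spaces[n], p, len);           /* len < MAX_SPACE_LEN here */
            spaces[n][len] = '\0';
            n++;
        }
        p += len;
    }
    return n;
}
```

Run over the three .param lines of the incr entry above, this should report
const, global and shared, in that order.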
However, when I load the above PTX code using the CUDA driver API's
cuModuleLoad(), it fails with CUDA error 200 (CUDA_ERROR_INVALID_IMAGE,
i.e. invalid kernel image).
Do you have any idea what is going on here?
Thanks,
Peter