On 03/21/2013 01:50 AM, Peter Colberg wrote:
> I succeeded to execute a kernel on an NVIDIA Fermi GPU, containing
> get_global_id() and an array copy from source to destination buffer.
Good job! That didn't take long. Shall we try to get it merged
into pocl trunk so there's something to build on? Similar to cellspu,
which is experimental but serves as a starting point.
Can you clean it up a bit, push your branch to launchpad,
and send a merge request?
Comments on the patch:
* configure.ac detection is needed for enabling the cuda driver (detect
the cuda library, or add a configure switch?)
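A configure switch could look roughly like the following sketch. This is only illustrative: the option name, the conditional name, and checking for libcuda via cuInit are my assumptions, not what pocl's configure.ac actually does.

```m4
dnl Hypothetical sketch -- names are illustrative, not pocl's actual ones.
AC_ARG_ENABLE([cuda],
  [AS_HELP_STRING([--enable-cuda],
     [build the experimental CUDA/NVPTX device driver])],
  [enable_cuda=$enableval], [enable_cuda=no])

AS_IF([test "x$enable_cuda" = xyes],
  [AC_CHECK_LIB([cuda], [cuInit], [],
     [AC_MSG_ERROR([--enable-cuda given but libcuda was not found])])])

AM_CONDITIONAL([BUILD_CUDA], [test "x$enable_cuda" = xyes])
```

An explicit switch has the advantage that a build machine with libcuda installed doesn't silently pick up an experimental driver.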
* style issues: some overlong lines
* _kernel.h address space ids: I think we could include _types.h
before _kernel.h so that it can override the address space ids, keeping
the current ones as defaults. This could work until a better solution is found.
* +# Copyright (c) 2013 <placeholder>
Please add your name there.
* +/* cuda.h - a pocl device driver for Cell SPU.
Copy-paste issue.
* + * this is the SPU local address where 'OpenCL global' memory starts.
Same.
* +get_group_id.bc: get_group_id.ll
+ @LLVM_AS@ -o $@ $<
There's now a mechanism to modify the default list of sources with a
couple of variables in the build files. See the x86_64 Makefile.am,
where it overrides the list (removes some default sources and adds some
.cc ones) when vecmathlib is used.
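The idea is roughly the following; the variable names below are my guesses for illustration only, so please check lib/kernel/x86_64/Makefile.am for the actual ones.

```make
# Hypothetical sketch -- variable names are illustrative, not pocl's real
# ones. Drop the generic implementation and substitute a target-specific one.
LKERNEL_SRCS_EXCLUDE = get_group_id.c
LKERNEL_SRCS_EXTRA = get_group_id.ll

get_group_id.bc: get_group_id.ll
	@LLVM_AS@ -o $@ $<
```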
* Can you add some "smoke tests" for nvptx under 'tests' (similar to tce
or cellspu) so we can quickly check we haven't completely broken nvptx
during development? I have an older NVIDIA card I can test with.
> Anything more complex will crash the LLVM (3.2) compiler though, which
> is probably due to both missing specialized implementations of the CL
> library for nvptx, or missing features in the LLVM NVPTX backend.
LLVM 3.3 (trunk) should work at the moment with pocl too.
> The work by Tom Stellard on a gallium driver mentioned on the
> wiki [1] was very helpful in getting started with a GPU device.
>
> The OpenCL run-time part seems doable, but the OpenCL compiler part
> should be implemented by an expert, not by a chemical physicist… ;-)
Let's hope the NVPTX backend of LLVM has improved in 3.3. If not,
probably reporting the issues in the llvm mailing lists should help.
As for pocl's kernel/work-group transformations, like I wrote previously,
they should be skipped for the most part. It seems you have that covered in
your patch:
+ @OPT@ -internalize-public-api-list=${kernel} -internalize
-inline-threshold=1000000000 -inline \
+ -globaldce \
+ -o ${output_file} ${linked_bc}
+ ;;
Here, instead of the huge inline threshold, you could try adding
-flatten -always-inline, which should aggressively inline everything.
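That is, the quoted opt invocation could become something like the sketch below (pass names as in LLVM 3.2/3.3; ${kernel}, ${output_file} and ${linked_bc} as in your script):

```sh
# Sketch: replace the inline-threshold trick with explicit flattening
# and always-inlining, then dead-code-eliminate the rest.
@OPT@ -internalize-public-api-list=${kernel} -internalize \
      -flatten -always-inline \
      -globaldce \
      -o ${output_file} ${linked_bc}
```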
Clang's include/clang/Basic/BuiltinsNVPTX.def should help in implementing
the built-ins. In your case, for example, the .ll implementations
might not be needed if you can use the built-in functions from .cl or .c
code.
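For instance, something along these lines might work for the work-item functions. This is a sketch only: it uses the __builtin_ptx_read_* builtin names from BuiltinsNVPTX.def, hardcodes dimension 0, and of course compiles only when targeting nvptx.

```c
/* Hedged sketch: get_global_id() for dimension 0 in plain C, using the
 * PTX builtins from clang's BuiltinsNVPTX.def instead of hand-written
 * .ll. NVPTX-target-only; the function name is illustrative. */
unsigned int get_global_id_x(void)
{
  return __builtin_ptx_read_ctaid_x() * __builtin_ptx_read_ntid_x()
         + __builtin_ptx_read_tid_x();
}
```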
I think we need to use the ptx_device calling convention for the kernel
library functions. I wonder if that's the default, or how to add it? Via
some __attribute__ maybe?
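At the LLVM IR level, what we'd want the kernel library functions to look like is something such as the following (the function itself is a made-up example); how to get clang to emit this from C is the open question.

```llvm
; Sketch: the ptx_device calling convention as written in LLVM IR.
define ptx_device i32 @dev_helper(i32 %x) {
entry:
  %r = add i32 %x, 1
  ret i32 %r
}
```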
--
--Pekka
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel