On Sat, Mar 23, 2013 at 12:57:47PM +0200, Pekka Jääskeläinen wrote:
> Thus, maybe we need to collect all the local allocations to a single pointer
> and allocate it once and in the kernel reassign the variables to point parts
> of this region. Should not be a difficult addition to the LLVM pass we already
> use for processing the automatic locals.

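To check that I understand the suggestion, here is a rough host-side sketch in plain C of packing several local buffers into one region. The names layout_locals and align_up are made up for illustration and are not pocl functions, and the alignment value is an assumption:

```c
#include <stddef.h>

/* Round offset up to the next multiple of align.
   Assumes align is a power of two. */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: given the byte size of each local buffer,
   compute where each one starts inside a single shared region.
   Writes the start offsets to offsets[] and returns the total size
   that has to be allocated once for the whole region. */
static size_t layout_locals(const size_t *sizes, size_t n,
                            size_t align, size_t *offsets)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total = align_up(total, align);  /* pad so buffer i starts aligned */
        offsets[i] = total;
        total += sizes[i];
    }
    return total;
}
```

The kernel would then receive the single region pointer, and each local variable would be reassigned to region + offsets[i].
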
If this transformation needs to be done at launch time anyway, we could
convert __local and __constant kernel arguments to automatic arrays. This
means that the LLVM IR -> NVPTX compilation should remain in pocl_cuda_run().

Could you give an LLVM newbie some pointers on how best to achieve these
transformations? I would also need to substitute work_dim and global_offset
in the LLVM IR if get_work_dim() or get_global_offset() are used in the
OpenCL code.

Peter

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
