A __global__ function can't call another __global__ function, but it can call a __device__ function, so you could make your shared C++ code use __device__ functions instead:
template<class T> __device__ void cuda_kernel(...) {}

template<class T> __global__ void cuda_kernel_launch(...) {
    cuda_kernel(...);
}

extern "C" {
__global__ void external_cuda_kernel(...) {
    cuda_kernel(...);
}
}
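Fleshed out a bit (a rough sketch only — the names, the `out` parameter, and the body are invented; the `#ifndef __CUDACC__` stubs just let the sketch double as plain C++ so it can be eyeballed without nvcc):

```cpp
#include <cassert>

// Stub out the CUDA qualifiers so this sketch also compiles as plain C++;
// under nvcc (__CUDACC__ defined) the real qualifiers apply.
#ifndef __CUDACC__
#define __device__
#define __global__
#endif

// Shared templated device code: the single source of truth for the kernel
// body, callable from both the C++ and the PyCUDA entry points.
template <class T>
__device__ void cuda_kernel_body(T arg1, T arg2, T *out) {
    *out = arg1 + arg2;  // placeholder for the real computation
}

// C++ side: templated __global__ entry point, launched as e.g.
//   cuda_kernel_launch<float><<<grid, threads>>>(a, b, d_out);
template <class T>
__global__ void cuda_kernel_launch(T arg1, T arg2, T *out) {
    cuda_kernel_body(arg1, arg2, out);  // T deduced from the arguments
}

// PyCUDA side: an extern "C" wrapper per concrete type, so the symbol is
// unmangled and module.get_function("cuda_kernel_float") can find it.
extern "C" __global__ void cuda_kernel_float(float arg1, float arg2,
                                             float *out) {
    cuda_kernel_body(arg1, arg2, out);
}
```

Only the __device__ body carries real logic; both entry points are one-line trampolines, so nothing is duplicated.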

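To avoid writing those wrappers by hand, the Python side can stamp them out per concrete type with string substitution before the source goes to SourceModule (a hypothetical sketch — the naming scheme is made up, and the commented-out lines show where it would plug into PyCUDA):

```python
# Hypothetical sketch: generate one unmangled extern "C" wrapper per
# concrete element type, then compile everything together with
# SourceModule(..., no_extern_c=1) and fetch each wrapper by name.
WRAPPER_TEMPLATE = """
extern "C" __global__ void cuda_kernel_{suffix}({ctype} arg1, {ctype} arg2) {{
    cuda_kernel(arg1, arg2);  // the shared __device__ template; T deduced
}}
"""

def make_wrappers(types):
    """Return CUDA source holding one wrapper per (suffix, C type) pair."""
    return "".join(
        WRAPPER_TEMPLATE.format(suffix=suffix, ctype=ctype)
        for suffix, ctype in types
    )

wrappers = make_wrappers([("float", "float"), ("double", "double")])
# module = SourceModule(templated_kernel_src + wrappers, no_extern_c=1)
# func = module.get_function("cuda_kernel_float")
```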

- bryan

On Tue, Oct 9, 2012 at 10:15 AM, Ahmed Fasih <[email protected]> wrote:

> Andreas, thanks for explaining, and sorry for making further demands
> on your time with this related question. Is there some way that lets me
> keep my templates as-is (to share code with C++ versions), e.g., my
> templated cuda_kernel():
>
> template <class T> __global__ void cuda_kernel(T arg1, T arg2) {
>     ...
> }
>
> and then have my wrapper be:
>
> extern "C" {
> __global__ void external_cuda_kernel(float arg1, float arg2) {
>     cuda_kernel<float>(arg1, arg2);
> }
> }
>
> This wouldn't work because I can't have a __global__ function invoke
> another __global__ function like this on my C2050. I guess I am asking
> if your preferred solution for me would let me get away with both (i)
> not changing my templates and (ii) not replicating the templated code.
>
> Thanks, let me know if I'm being unclear,
> Ahmed
>
>
> On Tue, Oct 9, 2012 at 1:35 AM, Andreas Kloeckner
> <[email protected]> wrote:
> > Ahmed Fasih <[email protected]> writes:
> >
> >> Hi, I am looking at the PyCUDA example for C++ templated kernel calls
> >> [1], but I have the __global__ kernel function with a template, i.e.,
> >>
> >> template <class T> __global__ void cuda_kernel(...)
> >>
> >> From C++, this is invoked via, e.g.,
> >>
> >> cuda_kernel<float><<< grid, threads >>>(...)
> >>
> >> I admit that the theory and PyCUDA practice of name mangling are fuzzy in
> >> my head, so you'll forgive me if I am doing something laughable, but I
> >> tried, in Python,
> >>
> >> module = SourceModule(custring, no_extern_c=1)
> >> cuda_kernel  = module.get_function("cuda_kernel")
> >>
> >> where "custring" contains the CUDA kernel code, and PyCUDA spits out an
> >> error,
> >>
> >> LogicError: cuModuleGetFunction failed: not found
> >>
> >> How can I get my host-callable C++-template kernel functions into
> >> PyCUDA, aside from the simple-minded workaround of untemplating
> >> everything?
> >
> > Solution I'd prefer: Keep your templates, but make 'extern "C"' wrappers
> > with well-defined names for those template kernels that you actually want
> > to call.
> >
> > Otherwise, you'll need one of these:
> >
> >
> http://stackoverflow.com/questions/6526500/c-name-mangling-library-for-python
> >
> > Andreas
>
>
>
> --
> Ahmed Fasih
> [email protected]
> [email protected]
> 614 547 3323 (Google Voice)
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
>