Hi all,
I hope this isn't a stupid question for this list; I've only just
started CUDA programming.
What I want to do is implement the numpy operation J=where(x>x0) for a
GPU array x and a fixed constant x0. I want J to be computed on the GPU
(so that x never has to be copied from the GPU to the CPU) and then
copied back to the CPU. How would I go about doing this?
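To make that concrete, on the CPU side this is the equivalent of (toy data, just for illustration):

```python
import numpy as np

x = np.array([0.1, 2.5, 0.3, 3.0, 0.2], dtype=np.float32)
x0 = 1.0

# J holds the indices where x exceeds the threshold x0.
J, = np.where(x > x0)
print(J)  # -> [1 3]
```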
I was thinking of basically using the GPU's global memory, with a
single thread doing the thresholding operation. This isn't an efficient
way to use the GPU, but I don't see how to do it in parallel. The
thresholding operation is performed many, many times, with the array x
updated (by the GPU) in between, but each individual thresholding is
only expected to return a handful of values. For example, x might be an
array of 30,000 elements, while J might be, say, 5-20 elements.
So my question is basically: how can I allocate space in GPU global
memory using PyCuda, and then copy from that space back to the host? I
couldn't work out how to do this from the docs (or even whether it's
possible).
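To show what I mean, here is an untested sketch of the single-thread
approach I was imagining (the names, sizes, and the kernel itself are
just placeholders I made up; it needs a CUDA-capable GPU to run):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (initializes a CUDA context)
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

N = 30000
MAX_J = 64  # more slots than the handful of indices I expect

mod = SourceModule("""
__global__ void threshold(const float *x, int n, float x0,
                          int *j_out, int max_j, int *j_count)
{
    // A single thread walks the array and records indices where x > x0.
    int count = 0;
    for (int i = 0; i < n; i++)
        if (x[i] > x0 && count < max_j)
            j_out[count++] = i;
    *j_count = count;
}
""")
threshold = mod.get_function("threshold")

# Stand-in for the device array my other kernels keep updating.
x = np.random.randn(N).astype(np.float32)
x_gpu = cuda.mem_alloc(x.nbytes)
cuda.memcpy_htod(x_gpu, x)

# Allocate space in GPU global memory for the result.
j_gpu = cuda.mem_alloc(MAX_J * np.dtype(np.int32).itemsize)
count_gpu = cuda.mem_alloc(np.dtype(np.int32).itemsize)

# Run the thresholding with a single thread.
threshold(x_gpu, np.int32(N), np.float32(3.0),
          j_gpu, np.int32(MAX_J), count_gpu,
          block=(1, 1, 1), grid=(1, 1))

# Copy the count and the indices back to the host.
count = np.empty(1, np.int32)
cuda.memcpy_dtoh(count, count_gpu)
J = np.empty(MAX_J, np.int32)
cuda.memcpy_dtoh(J, j_gpu)
J = J[:count[0]]
```

Is this roughly the right way to use mem_alloc/memcpy_dtoh, or is there
a better idiom in PyCuda for getting a small result array back?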
Of course, if anyone has an idea for doing the thresholding in
parallel, that would also be great! :-)
Thanks in advance for any help,
Dan
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net