Hi all,

I hope this isn't a stupid question for this list; I've only just started with CUDA programming.

What I want to do is implement the numpy operation J = where(x > x0) for a GPU array x and a fixed constant x0. I want J to be computed on the GPU (so that x doesn't have to be copied from the GPU to the CPU) and then copied to the CPU. How would I go about doing this?
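For concreteness, here's a plain-Python statement of the operation I mean (the function name is just mine for illustration):

```python
def threshold_indices(x, x0):
    """Return the list of indices i with x[i] > x0 -- the same
    indices numpy's where(x > x0) would return for a 1-D array."""
    return [i for i, v in enumerate(x) if v > x0]

# Small example:
x = [0.1, 2.5, 0.3, 3.0, 0.2]
J = threshold_indices(x, 1.0)
print(J)  # [1, 3]
```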

I was thinking of basically using the GPU's global memory, and just using a single thread on the GPU to do the thresholding operation. This isn't a very efficient way to use the GPU, but I don't see how I can do it in a parallel way. The thresholding operation is performed many, many times, with the array x updated (by the GPU) in between, but each individual thresholding operation is only expected to return an array J with a handful of values. For example, x might be an array of 30,000 elements, and J might be, say, 5-20 elements.
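To make the single-thread idea concrete, here is a CPU-side sketch of the logic I'd put in the kernel (the buffer J and the count stand in for preallocated global memory; the names and the capacity handling are just my guesses):

```python
def threshold_kernel_logic(x, x0, J, max_out):
    """What a single GPU thread would do: scan x, write each matching
    index into the preallocated output buffer J, and return how many
    were written. max_out is the capacity of J; extra matches beyond
    that are dropped."""
    count = 0
    for i in range(len(x)):
        if x[i] > x0 and count < max_out:
            J[count] = i
            count += 1
    return count

# Usage: preallocate a small output buffer once, since J is only
# expected to hold a handful of indices (e.g. 32 slots even for a
# 30,000-element x).
J = [0] * 32
n = threshold_kernel_logic([0.1, 2.5, 0.3, 3.0], 1.0, J, len(J))
print(J[:n])  # [1, 3]
```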

So my question is basically: how can I allocate space in global memory using PyCuda, and then copy from this space back to the CPU? I couldn't work out how to do this from the docs (or even whether it's possible).
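Pieced together from the docs, this is roughly what I imagine the host side looking like (untested, and the buffer sizes and names are my own guesses; the kernel launch itself is elided):

```python
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda

# Preallocate room in GPU global memory for the output indices
# and for a count of how many were written.
max_out = 32
J_gpu = cuda.mem_alloc(max_out * np.dtype(np.int32).itemsize)
count_gpu = cuda.mem_alloc(np.dtype(np.int32).itemsize)

# ... kernel fills J_gpu and count_gpu here ...

# Copy only the small result back to the CPU.
count = np.zeros(1, dtype=np.int32)
cuda.memcpy_dtoh(count, count_gpu)
J = np.zeros(max_out, dtype=np.int32)
cuda.memcpy_dtoh(J, J_gpu)
J = J[:count[0]]
```

Is this the right sort of usage of mem_alloc/memcpy_dtoh, or is there a better idiom for this?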

Of course, if anyone has another idea for a parallel way to do my thresholding operation, that would also be great! :-)

Thanks in advance for any help,
Dan

_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
