Forgot to send to the list.

________________________________________
From: Ian Cullinan
Sent: Tuesday, 3 February 2009 11:23 AM
To: Dan Goodman
Subject: RE: [PyCuda] global memory?
I dunno about the allocation: unless you can do the filter in-place and then reshape the array to the right size, I think you're going to have to do it twice (the first time to find out how much space you need, then again to store the result). But for the actual filtering, since you're expecting very few results, you should be able to get good performance by just partitioning the input among lots of threads, keeping an index in global memory, and using atomicInc whenever you get a match.

Cheers,
Ian

________________________________________
From: [email protected] [[email protected]] On Behalf Of Dan Goodman [[email protected]]
Sent: Tuesday, 3 February 2009 4:41 AM
To: [email protected]
Subject: [PyCuda] global memory?

Hi all,

I hope this isn't a stupid question for this list; I've only just started using CUDA. What I want to do is implement the numpy operation J=where(x>x0) for a gpu array x and a fixed constant x0. I want J to be computed on the GPU (so that x doesn't have to be copied from the GPU to the CPU) but then to be copied to the CPU. How would I go about doing this?

I was thinking about using the global memory space of the GPU, basically, and just using a single thread on the GPU to do the thresholding operation. This isn't a very efficient way to use the GPU, but I don't see how I can do it in a parallel way. The thresholding operation is performed many, many times, with the array x updated (by the GPU) in between, but each individual thresholding operation is only expected to return an array J with a handful of values. For example, x might be an array of 30,000 elements, and J might be say 5-20 elements.

So my question is basically: how can I allocate space in global memory using PyCuda, and then copy from this space? I couldn't work out how to do this from the docs (or even whether it's possible). Of course, if anyone has another idea for a parallel way to do my thresholding operation, that would also be great!
:-)

Thanks in advance for any help,
Dan

_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
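[Editor's note: the atomicInc scheme Ian describes can be sketched in plain Python. This is only an illustration of the idea, not GPU code: the `threshold_indices` function and its names are invented for this sketch, the sequential loop over `t` stands in for the CUDA thread blocks, and the `counter` variable plays the role of the single integer in global memory that each real thread would hit with atomicInc to claim an output slot.]

```python
def threshold_indices(x, x0, num_threads=4):
    """Emulate the parallel scheme: partition x among threads; each
    thread scans its chunk and, on a match, 'atomically' bumps a shared
    counter to claim a slot in the output array J."""
    claims = []
    counter = 0  # the index kept in global memory, bumped via atomicInc
    chunk = (len(x) + num_threads - 1) // num_threads
    for t in range(num_threads):       # each iteration stands in for one thread
        for i in range(t * chunk, min((t + 1) * chunk, len(x))):
            if x[i] > x0:
                slot = counter         # atomicInc returns the old value...
                counter += 1           # ...and increments the shared counter
                claims.append((slot, i))
    J = [0] * counter                  # J ends up with exactly `counter` entries
    for slot, i in claims:
        J[slot] = i                    # thread writes its matching index into J[slot]
    return J

x = [0.1, 2.5, 0.3, 3.0, 0.2, 4.1]
print(threshold_indices(x, 1.0))  # -> [1, 3, 5]
```

One caveat worth noting: on a real GPU the threads race on the counter, so the order of indices within J is nondeterministic; if J must be sorted, sort the handful of values on the CPU after copying them back.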
