On Thu, Aug 25, 2011 at 10:33 PM, Francis <fccaba...@gmail.com> wrote:
> I could make use of the tens of thousands of threads in
> CUDA to get the length of each substring/subarray.
The Python list structure already stores its length (it is incremented/decremented on appends, pops, etc.), so you'd be *re*computing a value you already have.

> In this 2nd case (array of ints or floats or chars) it would be a good
> idea to move data to CUDA (for perhaps n >= 1000, n is the input size),
> no? Even though it is still O(n) in the host.

I think it would be best at this point for you to implement both and profile the two implementations to compare runtimes. My suggestion would be to implement the Python-side wrangling first and time that against my <10-line algo above (I suspect that just the wrangling will be slower than my solution, much less any call to CUDA), then add in the CUDA code after that if it still seems like it's going to be a performance win.

Cheers,
Eli

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
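[Editor's note: a minimal sketch of the host-side timing suggested above — not from the original thread; the data shape and function names here are assumptions for illustration:]

```python
import timeit

# Hypothetical input: a list of sublists whose lengths we want
# (this particular shape is an assumption, not from the thread).
data = [list(range(i % 50)) for i in range(100_000)]

def host_lengths(seqs):
    # Python lists store their length, so len() is O(1) per sublist
    # and the whole pass is O(n) in the number of sublists.
    return [len(s) for s in seqs]

t = timeit.timeit(lambda: host_lengths(data), number=10)
print(f"host-side length pass, 10 runs: {t:.3f} s")
```

Timing this against a CUDA version (including the host-to-device transfer) would show whether moving the work to the GPU pays off for a given input size.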