On Thu, Aug 25, 2011 at 10:33 PM, Francis <fccaba...@gmail.com> wrote:
> I could make use of the tens of thousands of threads in
> CUDA to get the length of each substring/subarray.

The Python list structure already stores its length (it's incremented/decremented on appends, pops, etc.), so you'd be *re*computing a value you already have.
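To make that concrete, here's a quick demonstration that the length is maintained as the list mutates rather than recounted on each `len()` call:

```python
# len() reads the list's stored size field; it never walks the elements.
xs = []
assert len(xs) == 0

xs.append(1)
xs.append(2)
assert len(xs) == 2   # updated on each append, no recount

xs.pop()
assert len(xs) == 1   # updated on pop as well
```

So for a list of lists, `[len(row) for row in rows]` is already O(number of rows) with a constant-time lookup per row.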

> In this 2nd case ( array
> of ints or floats or chars) it would be a good idea to move data to CUDA (
> for perhaps n > = 1000, n is the input size ) no? Even though it is still O(
> n ) in the host.

I think it would be best at this point for you to implement both and profile them to compare runtimes.  My suggestion would be to implement the Python-side wrangling first, and time that against my <10 line algo above (I suspect the wrangling alone will be slower than my solution, let alone any call to CUDA), then add the CUDA code after that if it still looks like a performance win.
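For the profiling step, a `timeit` skeleton along these lines would do; the two functions here are hypothetical stand-ins (the second only flattens the data and rebuilds offsets, to mimic the host-side wrangling cost before any CUDA call), so drop your real implementations in their place:

```python
import timeit

def cpu_lengths(rows):
    """Stand-in for the pure-Python version: read the stored lengths."""
    return [len(row) for row in rows]

def wrangle_then_lengths(rows):
    """Stand-in for the wrangling path: flatten the rows and rebuild
    an offsets array, as you'd need to before handing data to CUDA,
    then recover the lengths from the offsets."""
    flat = [x for row in rows for x in row]
    offsets, n = [], 0
    for row in rows:
        offsets.append(n)
        n += len(row)
    ends = offsets[1:] + [len(flat)]
    return [b - a for a, b in zip(offsets, ends)]

rows = [list(range(i % 50)) for i in range(10_000)]
assert cpu_lengths(rows) == wrangle_then_lengths(rows)

print("len() per row:", timeit.timeit(lambda: cpu_lengths(rows), number=100))
print("wrangling:    ", timeit.timeit(lambda: wrangle_then_lengths(rows), number=100))
```

If the wrangling line alone comes out slower, the CUDA transfer and kernel launch on top of it are unlikely to help for this problem size.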

Cheers,
Eli

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
