Thanks for the replies @Eli and @David. I suppose that, given a 'small'
enough list of sub-lists, doing what I need on the host is good enough,
rather than moving the data and doing the task on the device. I am just
looking out for scenarios where the list of sub-lists is 'large' enough
that moving it to the GPU and doing the task there with PyCUDA would be
much faster.
If I do go ahead with the for loop that Eli and I both had in mind, it
would have a running time of O(n), where n is the number of sub-lists.
Since I have lots of threads I can put to use, I was thinking I'd rather
do the O(n) task on the device, making its running time effectively
constant, relatively speaking. Of course, that's the ideal case and
doesn't account much for the device-host copy overhead.
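For what it's worth, here is a hypothetical sketch of the trade-off being discussed. The actual per-sub-list task isn't specified in this thread, so a sum is assumed as a stand-in; the flatten-plus-offsets layout is the usual way to prepare ragged sub-lists before copying them to the device (e.g. with pycuda.gpuarray.to_gpu) and launching one thread or block per sub-list. NumPy's reduceat plays the role of the parallel per-segment reduction a kernel would do, so this runs without a GPU:

```python
import numpy as np

# Hypothetical data; the per-sub-list "task" is assumed to be a sum.
sublists = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]

# Host-side approach: an O(n) loop over the n sub-lists.
host_result = [sum(s) for s in sublists]

# Device-style preparation: flatten the ragged list into one contiguous
# array plus per-sub-list start offsets -- the layout one would copy to
# the GPU before launching a kernel with one thread (or block) per
# sub-list.
flat = np.concatenate([np.asarray(s) for s in sublists])
offsets = np.cumsum([0] + [len(s) for s in sublists])[:-1]

# np.add.reduceat stands in for the per-segment reduction the kernel
# would perform in parallel.
flat_result = np.add.reduceat(flat, offsets)

print(host_result)              # [6, 9, 6, 34]
print(flat_result.tolist())     # [6, 9, 6, 34]
```

Whether the device version wins still depends on the host-to-device copy cost, which is exactly the caveat above.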


Best regards,

./francis
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
