On Tue, 29 May 2012 18:16:52 -0400, Thomas Wiecki <thomas_wie...@brown.edu> wrote:
> Hi,
> 
> I have seen the following idiom used a couple of times:
> 
>         const int tidx = blockIdx.x*blockDim.x + threadIdx.x;
>         const int delta = blockDim.x*gridDim.x;
> 
>         curandState local_state = global_state[tidx];
> 
>         for (int idx = tidx; idx < n; idx += delta)
>         {
>              out[idx] = compute_sth(in[idx]);
>         }
> 
> I'm not sure I understand 100% what's going on, but it appears to loop
> over elements of the array spaced delta apart. However, I think that
> when there are enough threads available (n < max_threads), only one
> thread would be doing all the work. Is that correct?
> 
> Wouldn't a better idiom be something along the lines of:
> 
> for (int idx = tidx; idx < n; idx += max_threads)
> 
> That way, if n < max_threads, it would loop only once per thread and
> scale up seamlessly. Am I missing something?

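These two look exactly the same to me, except that you called "delta"
"max_threads". delta = blockDim.x*gridDim.x is already the total number
of threads launched, so if n is no larger than that, each thread with
tidx < n runs the loop body exactly once and the rest do nothing. I'm
squinting really hard and can't find a difference...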
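
To make the stride behaviour concrete, here is a rough host-side sketch in
plain C++ (not CUDA) that replays the quoted loop for every thread of a 1D
launch. The names trace_grid_stride and StrideTrace are made up for
illustration, and grid_dim/block_dim are stand-ins for gridDim.x and
blockDim.x; treat it as an approximation of what the kernel does, not as
device code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: records what the grid-stride loop does.
struct StrideTrace {
    std::vector<int> owner;  // owner[idx] = global thread index that handles element idx
    std::vector<int> iters;  // iters[tidx] = number of loop iterations run by thread tidx
};

// Replays the quoted loop on the host for a 1D launch of grid_dim blocks
// with block_dim threads each (assumed stand-ins for gridDim.x / blockDim.x).
StrideTrace trace_grid_stride(int n, int grid_dim, int block_dim)
{
    assert(n >= 0 && grid_dim > 0 && block_dim > 0);

    const int delta = block_dim * grid_dim;  // total number of threads launched
    StrideTrace trace;
    trace.owner.assign(n, -1);
    trace.iters.assign(delta, 0);

    for (int block_idx = 0; block_idx < grid_dim; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            const int tidx = block_idx * block_dim + thread_idx;

            // Same loop as in the kernel: start at tidx, step by delta.
            for (int idx = tidx; idx < n; idx += delta) {
                trace.owner[idx] = tidx;
                ++trace.iters[tidx];
            }
        }
    }
    return trace;
}
```

With n = 5 and 2 blocks of 4 threads (delta = 8), threads 0 to 4 each run
one iteration and threads 5 to 7 run none. With n = 10 and 1 block of 4
threads, the work is split 3, 3, 2, 2 across the four threads.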

Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
