On Tue, 29 May 2012 18:16:52 -0400, Thomas Wiecki <thomas_wie...@brown.edu> wrote:
> Hi,
>
> I saw a couple of times the following idiom being used:
>
> const int tidx = blockIdx.x*blockDim.x + threadIdx.x;
> const int delta = blockDim.x*gridDim.x;
>
> curandState local_state = global_state[tidx];
>
> for (int idx = tidx; idx < n; idx += delta)
> {
>     out[idx] = compute_sth(in[idx]);
> }
>
> I'm not sure I 100% understand what's going on but it is looping over
> parts of the array spread delta apart. I think however in the case there
> are enough threads available (n < max_threads) only one thread would
> be doing all the work -- is that correct?
>
> Wouldn't a better idiom do sth along the lines of:
>
> for (int idx = tidx; idx < n; idx += max_threads)
>
> thus if n < max_threads it would loop only once per thread and scale
> up seamlessly. Am I missing something?

These two look exactly the same to me, except you called "delta" "max_threads". I'm really squinting hard, I can't find a difference...

Andreas
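
To make the point concrete, here is a rough host-side sketch in plain Python (no GPU needed) that lists which indices each thread would visit under the quoted loop. The helper name `grid_stride_indices` and the block/grid sizes are made up for illustration; the only thing taken from the thread is the loop itself, with `delta = blockDim.x*gridDim.x`, i.e. the total number of launched threads. Under that assumption, when n is smaller than the thread count every thread with tidx < n does exactly one iteration and the rest do none, so the work is not piled onto a single thread.

```python
def grid_stride_indices(n, block_dim, grid_dim):
    """Simulate `for (idx = tidx; idx < n; idx += delta)` for every thread.

    Returns a dict mapping each global thread index (tidx) to the list
    of array indices that thread would process. Hypothetical helper,
    not part of PyCUDA.
    """
    delta = block_dim * grid_dim  # total number of threads in the grid
    work = {}
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            tidx = block_idx * block_dim + thread_idx
            work[tidx] = list(range(tidx, n, delta))
    return work


# Fewer elements than threads: 5 elements, 8 threads.
small = grid_stride_indices(n=5, block_dim=4, grid_dim=2)

# More elements than threads: 20 elements, 8 threads.
large = grid_stride_indices(n=20, block_dim=4, grid_dim=2)

for tidx in sorted(large):
    print(tidx, small[tidx], large[tidx])
```

In the second case each thread simply takes further strides of `delta`, which is the "scale up seamlessly" behaviour asked about; the stride named `max_threads` in the proposal would be the same quantity.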
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda