Hi Bogdan,

On Fri, 27 Jan 2012 13:27:04 +1100, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello,
>
> As far as I understand, this is somehow connected with pycuda::complex
> having default constructor. When you write (in reduction kernel):
>
>   __shared__ out_type sdata[BLOCK_SIZE];
>
> and "out_type" has the default constructor, it is called from _every_
> thread for _every_ value of the array. The reduction kernel starts
> like:
>
>   __shared__ out_type sdata[BLOCK_SIZE];
>   ... collecting values ...
>   sdata[tid] = <some value>
>
> which means that there is a race between the default constructor and
> actual initialization. I am not sure why nvcc does not complain about
> this (bug?), but the following things do help:
> 1. Using "extern __shared__ out_type sdata[]" and set the size of
>    shared memory when preparing the kernel.
> or
> 2. Putting "__syncthreads()" between default initialization and actual
>    initialization (not very good, since it leaves all those calls to
>    default constructor, but still removes the symptom).
>
> I googled a bit, but could not find any actual rules about default
> constructors and shared memory.

Thanks for this insight! I don't think this would've occurred to me just
from staring at the code. :) Indeed, inserting __syncthreads() after the
shared array declaration brings the error down to more reasonable values
for me.

Jesse, my recommendation would be to use that as a workaround while we
figure out a more permanent fix.

I just searched as well, but couldn't find anything. I've pinged Nathan
Bell at Nvidia (coauthor of thrust) to see what he thinks about this.

Andreas
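
For reference, here is a minimal sketch of workaround 1 from the quoted message, written as a standalone CUDA program rather than as PyCUDA's actual reduction kernel. The type `cplx` is a hypothetical stand-in for pycuda::complex (only its non-trivial default constructor matters here), and the raw byte buffer `smem_raw` is an assumption of this sketch: it declares the dynamic shared memory as untyped storage so that no constructor can run on it, then views it as `cplx`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128

// Hypothetical stand-in for pycuda::complex<float>; the relevant property
// is only that it has a user-provided default constructor.
struct cplx {
    float re, im;
    __host__ __device__ cplx() : re(0.0f), im(0.0f) {}
    __host__ __device__ cplx(float r, float i) : re(r), im(i) {}
};

__global__ void reduce_sum(const cplx *in, cplx *out, unsigned int n)
{
    // Workaround 1: dynamically sized shared memory, declared as raw bytes
    // so that no per-thread default construction touches the array.
    // (Workaround 2 would instead keep a statically sized array and place
    // __syncthreads() right after its declaration.)
    extern __shared__ __align__(8) unsigned char smem_raw[];
    cplx *sdata = reinterpret_cast<cplx *>(smem_raw);

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    // Each thread writes exactly one slot; out-of-range threads write zero.
    sdata[tid] = (i < n) ? in[i] : cplx(0.0f, 0.0f);
    __syncthreads();

    // Standard tree reduction within the block (blockDim.x is a power of 2).
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            sdata[tid].re += sdata[tid + s].re;
            sdata[tid].im += sdata[tid + s].im;
        }
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

int main()
{
    const unsigned int n = 1000;
    const unsigned int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    std::vector<cplx> h_in(n, cplx(1.0f, 2.0f)), h_out(blocks);

    cplx *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(cplx));
    cudaMalloc(&d_out, blocks * sizeof(cplx));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(cplx), cudaMemcpyHostToDevice);

    // The third launch parameter sets the shared memory size in bytes.
    reduce_sum<<<blocks, BLOCK_SIZE, BLOCK_SIZE * sizeof(cplx)>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(cplx), cudaMemcpyDeviceToHost);

    // Finish the reduction over per-block partial sums on the host.
    float re = 0.0f, im = 0.0f;
    for (unsigned int b = 0; b < blocks; ++b) { re += h_out[b].re; im += h_out[b].im; }
    std::printf("sum = (%g, %g)\n", re, im);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

In PyCUDA the shared memory size would be passed when preparing or launching the kernel rather than through the `<<<...>>>` syntax; how that fits into the generated reduction kernel is not settled in this thread.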
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda