Hi Bogdan,

On Fri, 27 Jan 2012 13:27:04 +1100, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hello,
> 
> As far as I understand, this is somehow connected with pycuda::complex
> having a default constructor. When you write (in the reduction kernel):
> 
> __shared__ out_type sdata[BLOCK_SIZE];
> 
> and "out_type" has a default constructor, it is called from _every_
> thread for _every_ value of the array. The reduction kernel starts
> like:
> 
> __shared__ out_type sdata[BLOCK_SIZE];
> ... collecting values ...
> sdata[tid] = <some value>
> 
> which means that there is a race between the default constructor and
> actual initialization. I am not sure why nvcc does not complain about
> this (bug?), but the following things do help:
> 1. Using "extern __shared__ out_type sdata[]" and setting the size of
> shared memory when preparing the kernel.
> or
> 2. Putting "__syncthreads()" between default initialization and actual
> initialization (not very good, since it leaves all those calls to
> the default constructor, but still removes the symptom).
> 
> I googled a bit, but could not find any actual rules about default
> constructors and shared memory.

Thanks for this insight! I don't think this would've occurred to me just
from staring at the code. :) Indeed, inserting __syncthreads() after the
shared array declaration brings the error down to more reasonable values
for me. Jesse, my recommendation would be to use that as a workaround
while we figure out a more permanent fix.
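
For reference, here is a minimal sketch of Bogdan's first workaround
(dynamically sized shared memory), which avoids the constructor calls
altogether. It is not the code PyCUDA generates: "cfloat" is a
hypothetical stand-in for pycuda::complex<float>, "block_sum" and "add"
are made-up names, and declaring the buffer as raw bytes and casting it
is my own variation on "extern __shared__ out_type sdata[]". It assumes
a power-of-two block size and a device that supports managed memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for pycuda::complex<float>: a type whose default
// constructor writes to the object it constructs.
struct cfloat {
    float re, im;
    __host__ __device__ cfloat() : re(0.0f), im(0.0f) {}
    __host__ __device__ cfloat(float r, float i) : re(r), im(i) {}
};

__host__ __device__ cfloat add(cfloat a, cfloat b)
{
    return cfloat(a.re + b.re, a.im + b.im);
}

// Each block sums blockDim.x elements (blockDim.x must be a power of two).
__global__ void block_sum(const cfloat *in, cfloat *out, unsigned int n)
{
    // Raw bytes, sized at launch time: no default constructor runs on this
    // buffer, so nothing can race with the stores below.
    extern __shared__ __align__(8) unsigned char smem_raw[];
    cfloat *sdata = reinterpret_cast<cfloat *>(smem_raw);

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    // Each thread writes only its own slot, then all threads synchronize.
    sdata[tid] = (i < n) ? in[i] : cfloat();
    __syncthreads();

    // Standard tree reduction in shared memory.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = add(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

int main()
{
    const unsigned int n = 1024, block = 256;
    const unsigned int grid = (n + block - 1) / block;

    cfloat *in = NULL, *out = NULL;
    cudaMallocManaged(&in, n * sizeof(cfloat));
    cudaMallocManaged(&out, grid * sizeof(cfloat));
    for (unsigned int i = 0; i < n; ++i)
        in[i] = cfloat(1.0f, -1.0f);

    // The third launch parameter is the shared memory size in bytes.
    block_sum<<<grid, block, block * sizeof(cfloat)>>>(in, out, n);
    cudaDeviceSynchronize();

    // Finish the reduction over the per-block partial sums on the host.
    cfloat total;
    for (unsigned int b = 0; b < grid; ++b)
        total = add(total, out[b]);
    printf("sum = (%f, %f)\n", total.re, total.im);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The __syncthreads() workaround would instead keep the statically sized
"__shared__ out_type sdata[BLOCK_SIZE];" declaration and place the
barrier directly after it, before the first store to sdata[tid].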

I just searched as well, but couldn't find anything. I've pinged Nathan
Bell at Nvidia (coauthor of thrust) to see what he thinks about this.

Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda