On Sat, 28 Jan 2012 09:51:20 +1100, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hi Andreas,
> 
> On Sat, Jan 28, 2012 at 3:23 AM, Andreas Kloeckner
> <li...@informa.tiker.net> wrote:
> > Indeed, inserting __syncthreads() after the
> > shared array declaration brings the error down to more reasonable values
> > for me. Jesse, my recommendation would be to use that as a workaround
> > while we figure out a more permanent fix.
> 
> Can't we do this:
> >> 1. Using "extern __shared__ out_type sdata[]" and setting the size of
> >> shared memory when preparing the kernel.
> We can pass dtype instead of ctype to
> get_reduction_kernel_and_types(), and convert it to ctype + data size
> inside.

This issue has 'ick' written all over it. Unfortunately, 

out_type sdata[]

doesn't appear to quite cut it, as Thrust uses this bit of code here:

http://code.google.com/p/thrust/source/browse/thrust/system/cuda/detail/extern_shared_ptr.h

which casts from int4 and apparently serves to ensure alignment. (Nathan
pointed me to this.) I'm not sure why the native type wouldn't quite be
correctly aligned, so I guess I'm not fully understanding...
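
For reference, here is a minimal sketch of the two ideas combined: the shared
array is sized at launch time, as Bogdan suggests, and declared once as int4
and then cast to the element type, as the Thrust header appears to do. The
names shared_as and block_sum are hypothetical and not taken from PyCUDA or
Thrust, and the premise that the int4 declaration is there for 16-byte
alignment is an assumption rather than something confirmed in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: declare the dynamic shared array once as int4
// (a 16-byte-aligned type) and reinterpret it as the element type T.
template <typename T>
__device__ T *shared_as()
{
    extern __shared__ int4 smem[];
    return reinterpret_cast<T *>(smem);
}

// Simple per-block tree reduction; assumes blockDim.x is a power of two.
template <typename T>
__global__ void block_sum(const T *in, T *out, unsigned int n)
{
    T *sdata = shared_as<T>();
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    // Load one element per thread, padding with zero past the end.
    sdata[tid] = (i < n) ? in[i] : T(0);
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}

int main()
{
    const unsigned int n = 256, threads = 256;
    double h_in[n], h_out = 0.0;
    for (unsigned int i = 0; i < n; ++i)
        h_in[i] = 1.0;

    double *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    // Third launch parameter: shared memory size in bytes, derived from
    // the element size instead of being hard-coded in the kernel source.
    block_sum<double><<<1, threads, threads * sizeof(double)>>>(d_in, d_out, n);

    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("sum = %f\n", h_out);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

On the PyCUDA side, the byte count would presumably be computed from the
dtype's item size and handed over at launch, for instance through the shared=
keyword of a kernel call.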

Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
