On Sat, 28 Jan 2012 09:51:20 +1100, Bogdan Opanchuk <manti...@gmail.com> wrote:
> Hi Andreas,
>
> On Sat, Jan 28, 2012 at 3:23 AM, Andreas Kloeckner
> <li...@informa.tiker.net> wrote:
> > Indeed, inserting __syncthreads() after the
> > shared array declaration brings the error down to more reasonable values
> > for me. Jesse, my recommendation would be to use that as a workaround
> > while we figure out a more permanent fix.
>
> Can't we do this:
> >> 1. Using "extern __shared__ out_type sdata[]" and setting the size of
> >> shared memory when preparing the kernel.
> We can pass dtype instead of ctype to
> get_reduction_kernel_and_types(), and convert it to ctype + data size
> inside.
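
For reference, a rough sketch of the proposal quoted above. This is not PyCUDA's actual reduction code: the DTYPE_INFO table, the kernel template, and shared_reduction_source() are hypothetical names made up for illustration, and the kernel assumes a power-of-two block size. It only shows the idea of turning a dtype into a C type name plus a byte count, so the shared array can be declared without a fixed size.

```python
# Hypothetical lookup standing in for "dtype -> ctype + data size".
# It is an illustration only, not part of PyCUDA.
DTYPE_INFO = {
    "float32": ("float", 4),
    "float64": ("double", 8),
    "int32": ("int", 4),
}

# Kernel template: the shared array is declared "extern", so its size
# is not fixed in the source and has to be supplied at launch time.
# Assumes blockDim.x is a power of two.
KERNEL_TEMPLATE = """
__global__ void reduce_sum(%(out_type)s *out, const %(out_type)s *in, unsigned int n)
{
    extern __shared__ %(out_type)s sdata[];

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;

    sdata[tid] = (i < n) ? in[i] : 0;
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1)
    {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}
"""


def shared_reduction_source(dtype_name, block_size):
    """Return (kernel source, shared memory bytes per block)."""
    ctype, itemsize = DTYPE_INFO[dtype_name]
    source = KERNEL_TEMPLATE % {"out_type": ctype}
    # One element of the output type per thread in the block.
    shared_bytes = itemsize * block_size
    return source, shared_bytes


if __name__ == "__main__":
    source, shared_bytes = shared_reduction_source("float32", 128)
    print(shared_bytes)
```

The byte count would then be handed over when the kernel is prepared or launched (in PyCUDA, through the shared memory size argument of the launch), rather than being baked into the source.
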
This issue has 'ick' written all over it. Unfortunately, out_type
sdata[] doesn't appear to quite cut it, as Thrust uses this bit of code
here:

http://code.google.com/p/thrust/source/browse/thrust/system/cuda/detail/extern_shared_ptr.h

which casts from int4 and apparently serves to ensure alignment. (Nathan
pointed me to this.) I'm not sure why the native type wouldn't quite be
correctly aligned, so I guess I'm not fully understanding...

Andreas
_______________________________________________ PyCUDA mailing list PyCUDA@tiker.net http://lists.tiker.net/listinfo/pycuda