Hi Andrea,

Please send a complete working script that anyone can save and run without having to assemble it from the excerpts you provided. In the meantime, here is what I can say from looking at the kernel:

On Wed, Jul 11, 2012 at 1:24 AM, Andrea Cesari wrote:
> __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int *kernelSize_h)
>
> {
>     int j,step1=kernelSize_h[0]/2;
>     int idx = threadIdx.x+step1;
>     for(j=0;j<step1;j++)
>         corrGpu[idx-step1]+=aMod[idx+j-(step1)]*b[j];
>
> }

With a construction like "aMod[idx+j-(step1)]", reads can occur outside of the aMod array. Since idx = threadIdx.x + step1, the index reduces to threadIdx.x + j, so the last thread reads up to aMod[blockDim.x + step1 - 2]. Unless aMod has at least blockDim.x + step1 - 1 elements, those reads run past the end of the array.

_______________________________________________
PyCUDA mailing list
http://lists.tiker.net/listinfo/pycuda
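One way to see which aMod reads fall outside the array is to replay the kernel's index arithmetic on the host. The following is only a rough sketch in plain Python, not part of the original script: it assumes a single-block launch, and the names block_dim_x, kernel_size and amod_len are hypothetical stand-ins for the launch's block size, kernelSize_h[0], and the number of elements in aMod.

```python
def amod_indices(block_dim_x, kernel_size):
    """Yield (thread, j, index) for every aMod read the quoted kernel makes.

    Assumes one block of block_dim_x threads; kernel_size plays the role
    of kernelSize_h[0] (assumed positive, so // matches C's integer division).
    """
    step1 = kernel_size // 2
    for thread_idx_x in range(block_dim_x):
        idx = thread_idx_x + step1            # same as in the kernel
        for j in range(step1):
            yield thread_idx_x, j, idx + j - step1


def out_of_bounds_reads(block_dim_x, kernel_size, amod_len):
    """Return the reads whose index lies outside [0, amod_len)."""
    return [
        (t, j, i)
        for t, j, i in amod_indices(block_dim_x, kernel_size)
        if not 0 <= i < amod_len
    ]


# 8 threads, kernel size 5 (step1 = 2), aMod with 8 elements:
print(out_of_bounds_reads(8, 5, 8))   # [(7, 1, 8)]
# The same launch with aMod holding 9 elements stays in bounds:
print(out_of_bounds_reads(8, 5, 9))   # []
```

If a check like this reports any reads for your actual sizes, either allocate aMod with enough elements or add a bounds test inside the kernel loop.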
