Hi Andrea,

Please send the full working script, one that anyone can save and
execute without assembling it from the excerpts you provided. In the
meantime, here is what I can say from looking at the kernel:

On Wed, Jul 11, 2012 at 1:24 AM, Andrea Cesari <[email protected]> wrote:
> __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int *kernelSize_h)
>
> {
>     int j,step1=kernelSize_h[0]/2;
>     int idx = threadIdx.x+step1;
>         for(j=0;j<step1;j++)
>             corrGpu[idx-step1]+=aMod[idx+j-(step1)]*b[j];
>
> }

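With a construction like "aMod[idx+j-(step1)]", reads can fall
outside the aMod array. Since idx = threadIdx.x+step1, the index
reduces to threadIdx.x+j, so the last thread reads up to
aMod[blockDim.x+step1-2]. Unless aMod holds at least
blockDim.x+step1-1 elements, those reads are out of bounds.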
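
To check this for concrete sizes, the kernel's index arithmetic can be
replayed on the host in plain Python, without a GPU. This is only a
sketch: the function names and the sizes used below (8 threads, kernel
size 6, aMod of length 8) are made up for illustration and are not
taken from your script. It lists the (thread, index) pairs that fall
outside the array.

```python
def amod_read_indices(n_threads, kernel_size):
    """Return every aMod index the quoted kernel reads, keyed by thread."""
    step1 = kernel_size // 2          # mirrors kernelSize_h[0]/2 in the kernel
    reads = {}
    for tid in range(n_threads):      # tid plays the role of threadIdx.x
        idx = tid + step1
        reads[tid] = [idx + j - step1 for j in range(step1)]
    return reads


def out_of_bounds_reads(n_threads, kernel_size, amod_len):
    """List (thread, index) pairs that fall outside [0, amod_len)."""
    reads = amod_read_indices(n_threads, kernel_size)
    return [(tid, i)
            for tid, idxs in reads.items()
            for i in idxs
            if not 0 <= i < amod_len]


# Hypothetical sizes, chosen only for illustration.
bad = out_of_bounds_reads(n_threads=8, kernel_size=6, amod_len=8)
print(bad)
```

If the list comes back empty for the sizes your script really uses,
the reads are in range and the problem is elsewhere.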

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda