Hi Andrea,

On Wed, Jul 11, 2012 at 10:25 PM, Andrea Cesari
<[email protected]> wrote:
> __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int *kernelSize_h)
> {
>     int j,step1=kernelSize_h[0]/2; // <---
...
> """)

When I remove the /2 on the line the arrow points to, I get results
identical to the CPU version. Are you sure it is necessary there?

> About your advice: when I do int idx = threadIdx.x+step, doesn't idx start
> from step1? So when j=0, idx-step1+j = 0? Is that wrong?

Yes, sorry, that was my mistake. Everything is correct in this part.
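
For what it's worth, here is a quick way to check that index arithmetic on
the CPU. This is only a sketch, not your kernel: the helper name
accessed_indices, the parameter n_threads, and the assumption that j runs
from 0 to kernel_size-1 are mine; only idx, step1 and j come from your code.

```python
def accessed_indices(n_threads, step1, kernel_size):
    """Collect every value of idx - step1 + j over all threads.

    Assumptions (hypothetical, for illustration only):
      - idx = threadIdx.x + step1, with threadIdx.x in [0, n_threads)
      - j loops over [0, kernel_size)
    """
    touched = set()
    for thread_idx in range(n_threads):      # stands in for threadIdx.x
        idx = thread_idx + step1
        for j in range(kernel_size):
            touched.add(idx - step1 + j)     # step1 cancels: thread_idx + j
    return touched


if __name__ == "__main__":
    indices = accessed_indices(n_threads=4, step1=2, kernel_size=5)
    print(min(indices), max(indices))
```

Under those assumptions the smallest index is 0 (thread 0, j=0), and the
set of indices does not depend on step1 at all, since it cancels out of
idx-step1+j.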

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
