In fact you're right, but according to the theory it should be
kernelSize_h[0]/2. Indeed, if you look at my CPU code, I use the same
half-kernel offset there (step=len(b)/2):
import numpy as np
import scipy.ndimage as nd
import time
a=[1,0,1,1,0,1,1,0,0,1,1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,2,2,2,1]
#a = np.array([0,0,1,1,1,0,0,0,1,1,1,0,0,1,1])
lungA=len(a)
#b=[-5,-5,-5,-5,-5,-4,-4,5,5,5,5,5,4,4]
b=np.array([-1,-1,-1,1,1,-1,1,1,-1,1])
step=len(b)//2  # half the kernel length (integer division)
corrCpu=np.zeros(lungA)
corrCpu=corrCpu.astype(np.int16)
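# mirror-pad a: step samples on the left, step-1 on the right,
# repeating the edge sample (same layout as mode='reflect' below)
a1=a[:(step)][::-1]
a2=a[-(step-1):][::-1]
a=np.append(a1,np.append(a,a2))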
t1=time.time()
for i in range(step,lungA+step):
    for j in range(0,len(b)):
        corrCpu[i-step]=corrCpu[i-step]+(a[i-step+j]*b[j])
print(time.time()-t1)
a=[1,0,1,1,0,1,1,0,0,1,1,0,1,0,1,0,1,0,1,0,0,1,1,0,1,0,2,2,2,1]
#a = np.array([0,0,1,1,1,0,0,0,1,1,1,0,0,1,1])
t2=time.time()
corrPy=nd.correlate1d(a,b,mode='reflect',origin=0)
print(time.time()-t2)
print("CorCpu= ")
print(corrCpu)
print("CorPy= ")
print(corrPy)
print("Difference :\n")
print(corrCpu-corrPy)
This is strange...
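
To make the indexing explicit, here is a minimal pure-Python sketch of
what the CPU loop above computes. The function name correlate1d_reflect
is only illustrative (it is not part of numpy or scipy). It assumes that
len(a) >= len(b)//2, and that the kernel is centered at index len(b)//2,
which as far as I understand is the convention nd.correlate1d follows
with mode='reflect' and origin=0; please treat that last point as an
assumption, not as a statement about scipy's internals.

```python
def correlate1d_reflect(a, b):
    """Correlate a with b after mirror-padding a (edge sample repeated).

    Hypothetical helper for illustration only; assumes len(a) >= len(b)//2.
    """
    a = list(a)
    n = len(b)
    left = n // 2          # samples needed before a[0]  (this is "step")
    right = n - left - 1   # samples needed after a[-1]  (step-1 for even n)

    # Mirror-pad with the edge sample repeated: (c b a | a b c ... | z y x)
    head = a[:left][::-1]
    tail = a[-right:][::-1] if right > 0 else []  # a[-0:] would copy all of a
    padded = head + a + tail

    # out[i] = sum_j padded[i + j] * b[j], same indexing as the CPU loop
    return [sum(padded[i + j] * b[j] for j in range(n))
            for i in range(len(a))]


if __name__ == "__main__":
    print(correlate1d_reflect([1, 2, 3, 4], [1, 0, -1]))  # odd-length kernel
    print(correlate1d_reflect([1, 2, 3, 4], [1, 1]))      # even-length kernel
```

With an even-length kernel the padding is asymmetric (one more sample on
the left than on the right), which is why the offset is len(b)//2 and
not len(b).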
> Date: Wed, 11 Jul 2012 22:48:25 +1000
> Subject: Re: [PyCUDA] Thread Problem
> From: [email protected]
> To: [email protected]
> CC: [email protected]
>
> Hi Andrea,
>
> On Wed, Jul 11, 2012 at 10:25 PM, Andrea Cesari
> <[email protected]> wrote:
> > __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int *kernelSize_h)
> > {
> > int j,step1=kernelSize_h[0]/2; // <---
> ...
> > """)
>
> When I remove /2 where the arrow points, I get results identical with
> the CPU version. Are you sure it is necessary there?
>
> > About your advise: when i do: int idx = threadIdx.x+step, idx doesn't start
> > from step1? so when j=0 idx-step1+j =0 ? it's wrong?
>
> Yes, sorry, that was my mistake. Everything is correct in this part.
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda