Ok, I fixed that problem! Now I always use mem_alloc and memcpy, in fact.
Another question, about what I think is another error in how I use the thread index.
I had to implement the equivalent of scipy.ndimage.convolve1d (which is really a
cross-correlation function, but that isn't the problem).
I first wrote a sequential script to verify my algorithm. The significant
code is this:

step = len(b)/2
for i in range(step, lungA+step):
    for j in range(0, len(b)):
        corrCpu[i-step] = corrCpu[i-step] + (a[i-step+j]*b[j])

where 'a' and 'b' are the vectors to be correlated.
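For reference, here is that loop packaged as a self-contained function (the names `correlate1d_reflect` and `a_mod` are mine, and I'm assuming `corrCpu` starts from zeros and that the loop runs over the reflect-padded vector, as in the example values further down):

```python
import numpy as np

def correlate1d_reflect(a, b):
    # Reflect-pad: first len(b)//2 elements (reversed) go before 'a',
    # last len(b)//2 - 1 elements (reversed) go after it.
    step = len(b) // 2
    a_mod = np.concatenate([a[:step][::-1], a, a[-(step - 1):][::-1]])
    # Cross-correlation: out[k] = sum_j a_mod[k + j] * b[j]
    out = np.zeros(len(a), dtype=a.dtype)
    for k in range(len(a)):
        for j in range(len(b)):
            out[k] += a_mod[k + j] * b[j]
    return out
```

With the example vectors used below (a = [-1,1,0,1,1,0,1,1], b = [1,1,-1,-1]) this matches what nd.correlate1d(a, b, mode='reflect') gives me.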

The scipy function is:
 nd.correlate1d(a,b,mode='reflect'), 
where 'reflect' means that the first 'len(b)/2' elements of 'a' and the last
'len(b)/2-1' elements of 'a' are repeated (after reflection) before and after
'a'.
To do this in my script I create an aMod vector like this:

a1 = a[:step][::-1]
a2 = a[-(step-1):][::-1]
aMod = np.append(a1, np.append(a, a2))

So, for example, if a = [-1,1,0,1,1,0,1,1] and b = [1,1,-1,-1] (so len(b)=4), then
aMod = [1,-1,-1,1,0,1,1,0,1,1,1].
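That padding step can be checked on its own; a minimal sketch with the same example values:

```python
import numpy as np

a = np.array([-1, 1, 0, 1, 1, 0, 1, 1])
b = np.array([1, 1, -1, -1])
step = len(b) // 2                 # 2 here

a1 = a[:step][::-1]                # first 'step' elements, reflected
a2 = a[-(step - 1):][::-1]         # last 'step-1' elements, reflected
aMod = np.append(a1, np.append(a, a2))

print(aMod)                        # prints the aMod shown above
```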

I converted it all to PyCUDA like this:

numthreads = len(a)
c = np.zeros(len(a), dtype=np.int32)
kernelSize_h = np.zeros(2, dtype=np.int32)
kernelSize_h[0] = len(kernel_h)   # kernel_h is the 'b' vector above
kernelSize_h[1] = len(a)


mod = SourceModule("""

__global__ void gpu_kernel(int *corrGpu, int *aMod, int *b, int *kernelSize_h)
{
    int j, step1 = kernelSize_h[0]/2;
    int idx = threadIdx.x + step1;
    for (j = 0; j < step1; j++)
        corrGpu[idx-step1] += aMod[idx+j-step1]*b[j];
}

""")

manipulate_vector=mod.get_function("gpu_kernel")
c_gpu=to_gpu(c)
manipulate_vector(c_gpu,drv.In(aMod),drv.In(kernel_h),drv.In(kernelSize_h),block=(numthreads,1,1),grid=(1,1))

print "Corr. GPU \n"
print c_gpu.get()
corrcpu=nd.correlate1d(a,kernel_h,mode='reflect')
print "Corr CPU= "
print corrcpu
print "Difference: "
print c_gpu.get()-corrcpu
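One way I'd narrow this down is to emulate on the CPU exactly what each thread computes, one "thread" at a time, and diff that against the sequential loop; a sketch (function and variable names are mine):

```python
import numpy as np

def emulate_gpu_kernel(aMod, b, numthreads, kernel_len):
    # Mirrors the CUDA kernel body literally, one thread per iteration:
    #   idx = threadIdx.x + step1
    #   corrGpu[idx-step1] += aMod[idx+j-step1] * b[j]
    step1 = kernel_len // 2
    corr = np.zeros(numthreads, dtype=np.int32)
    for tid in range(numthreads):
        idx = tid + step1
        for j in range(step1):          # same loop bound as in the kernel
            corr[idx - step1] += aMod[idx + j - step1] * b[j]
    return corr
```

Comparing this emulation's output against the sequential result for the small example shows whether the mismatch comes from the kernel logic itself or from the memory handling, without involving the GPU at all.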


But the GPU result is different, while in the sequential script the two match.
I can't see the error! Can you help me?
I've been stuck here for days!
Thanks!

> Date: Wed, 11 Jul 2012 00:21:57 +1000
> Subject: Re: [PyCUDA] Thread Problem
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> 
> On Wed, Jul 11, 2012 at 12:15 AM, Andrea Cesari
> <[email protected]> wrote:
> > so, the first two elements of a vector are always garbage?
> > can i solve it by allocating the memory manually? but it should be the same as
> > drv.Out() i think.. or no?
> 
> The first two elements are garbage because:
> 1) you have not initialized them to anything (consequence of using drv.Out), 
> and
> 2) you have not written anything there (consequence of using i =
> threadIdx.x + 2)
> So if you want them to contain something meaningful, fix either 1) or 2).
> 
> 1) can be fixed, for example, like this:
> 
> # to_gpu() takes a numpy array, copies it to the GPU and returns a
> # reference to this GPU array
> from pycuda.gpuarray import to_gpu
> 
> lung_vett=10;
> thread_index = mod.get_function("thread_index")
> dest=numpy.zeros(lung_vett);
> dest_gpu = to_gpu(dest)
> thread_index(dest_gpu, block=(lung_vett,1,1))
> 
> print dest_gpu.get()
                                          
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
