The problem is that the results match the CPU only for dim_x and dim_y smaller than 32. For higher dimensions the CPU and GPU results differ.
From: [email protected] To: [email protected]; [email protected] Date: Wed, 18 Jul 2012 18:39:11 +0200 Subject: Re: [PyCUDA] Thread Problem Hi, this is my code that, keep a 3d array, and for each pixel of the matrix find the minimum and put it to the corresponding pixel of a matrix b. Then compare the result with the cpu. Obviously, with grid=(1,1) the max dimensions of the matrix can be 32x32 (1024 threads per block; GTX 580). So, because i want a matrix of (500,500) , i I thought to declare a block=(32,32,1) and grid=(16,16). But it doesn't work. I noted that it work only with square matrix and with dimensions minor of 32. I have this problem only with the 3d array, in fact whit bi-dimensional array i solved it doing: int idx = threadIdx.x + blockIdx.x * blockDim.x; with blocks=(1024,1,1) and grid=(245,1) (for (500,500) matrix) . i tried to do the same thing but it doesn't work... the code : import pycuda.driver as cuda import pycuda.autoinit from pycuda.compiler import SourceModule import numpy import time from pycuda.gpuarray import to_gpu dim_x=32 dim_y=32 dim_z=10 a = numpy.random.randn(dim_z,dim_y,dim_x) a = a.astype(numpy.int32) b=numpy.zeros((dim_x,dim_y),dtype=numpy.int32) dimz=numpy.array([dim_z],dtype=numpy.int32) a_gpu=to_gpu(a) b_gpu=to_gpu(b) dimz_gpu=to_gpu(dimz) mod = SourceModule(""" __global__ void findmin(int *a,int *b,int *dimz_gpu) { int idx = threadIdx.x + blockIdx.x * blockDim.x; //OK int idy = threadIdx.y + blockIdx.y * blockDim.y; //OK int x_width = blockDim.x * gridDim.x; //OK int y_width = blockDim.y * gridDim.y; int flat_id1 = idx + x_width * idy ; int min=4294967296; for(int idz = 0; idz <dimz_gpu[0]; idz++) { int flat_id = idx + x_width * idy + (x_width * y_width) * idz; //OK if(a[flat_id]<min) { min=a[flat_id]; b[flat_id1]=min; } } } """) func = mod.get_function("findmin") func(a_gpu, b_gpu,dimz_gpu,block=(32,32,1),grid=(1,1)) print a_gpu.get() print "b :\n" b=b_gpu.get() print b minimo=100 
b1=numpy.zeros((dim_x,dim_y),dtype=numpy.int32)
for i in range(0,dim_x):
    for j in range(0,dim_y):
        minimo=min(a[:,i,j])
        b1[i,j]=minimo

print "Difference between CPU:\n"
print b1-b

Thanks!
Andrea

> Date: Thu, 12 Jul 2012 10:18:20 +1000
> Subject: Re: [PyCUDA] Thread Problem
> From: [email protected]
> To: [email protected]
>
> Hi Andrea,
>
> Unfortunately, I am not quite familiar with the topic. Probably the
> issue here is incorrect padding, or an incorrect mode of the numpy function
> you are using for comparison — logically I'd expect the mode to be 'wrap',
> not 'reflect'. Moreover, why did you even prefer to take correlate1d()
> as a reference instead of numpy.convolve() or
> scipy.fftpack.convolve()?
>
> Perhaps it will be helpful to look at the sources of the two
> functions above and see how they do the padding. The scipy one is at
> https://github.com/scipy/scipy/blob/master/scipy/fftpack/src/convolve.c
>
> On Thu, Jul 12, 2012 at 2:38 AM, Andrea Cesari <[email protected]>
> wrote:
> > Hi,
> > excuse me if I write to you in private, but it isn't properly a PyCUDA problem.
> > I would like to ask your opinion.
> > To do the equivalent of scipy.ndimage.convolve1d (that is, a linear
> > convolution), I'm trying to do something like this
> > (a and b are the vectors to convolve):
> >
> > A = FFT(a)
> > B = FFT(b) with zero-padding at the end
> > CONV = A .* B
> > conv = INV_FFT(CONV)
> >
> > Is that right? I tried to do this in MATLAB (I have not yet installed
> > pyfft) but the results are different.
> > Is it a mathematical problem?
> >
> > Thanks for your patience,
> > Andrea
> >
> >> Date: Wed, 11 Jul 2012 22:48:25 +1000
> >> Subject: Re: [PyCUDA] Thread Problem
> >> From: [email protected]
> >> To: [email protected]
> >> CC: [email protected]
> >>
> >> Hi Andrea,
> >>
> >> On Wed, Jul 11, 2012 at 10:25 PM, Andrea Cesari
> >> <[email protected]> wrote:
> >> > __global__ void gpu_kernel(int *corrGpu,int *aMod,int *b,int
> >> > *kernelSize_h)
> >> > {
> >> >     int j,step1=kernelSize_h[0]/2; // <---
> >> ...
> >> > """)
> >>
> >> When I remove the /2 where the arrow points, I get results identical to
> >> the CPU version. Are you sure it is necessary there?
> >>
> >> > About your advice: when I do int idx = threadIdx.x+step, doesn't idx
> >> > start from step1? So when j=0, idx-step1+j = 0? Is that wrong?
> >>
> >> Yes, sorry, that was my mistake. Everything is correct in this part.

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
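[Editor's note] Two things in the posted findmin listing look suspicious, independent of the grid configuration: b is allocated as (dim_x, dim_y) while a has shape (dim_z, dim_y, dim_x), so the shapes only agree for square matrices; and `int min=4294967296;` overflows a 32-bit int (2^32 wraps to 0), where INT_MAX would be safer. The kernel's flat index also matches numpy's C-order layout only when x_width equals dim_x and y_width equals dim_y, i.e. when grid*block exactly covers the array. A CPU-side sketch of the same index arithmetic in pure numpy (my own example values, not from the thread) that agrees with a.min(axis=0) once those points are fixed:

```python
import numpy

dim_x, dim_y, dim_z = 48, 32, 10   # deliberately non-square
a = (numpy.random.randn(dim_z, dim_y, dim_x) * 100).astype(numpy.int32)

# Emulate the kernel's flat indexing on the host. x_width/y_width must
# equal dim_x/dim_y (block*grid exactly covering the array); otherwise
# flat_id strides through memory with the wrong pitch.
x_width, y_width = dim_x, dim_y
flat_a = a.ravel()                                   # C order, same layout to_gpu() sends
b = numpy.empty((dim_y, dim_x), dtype=numpy.int32)   # note: (y, x), not (x, y)
for idy in range(dim_y):
    for idx in range(dim_x):
        m = numpy.iinfo(numpy.int32).max             # INT_MAX, not 4294967296
        for idz in range(dim_z):
            flat_id = idx + x_width * idy + (x_width * y_width) * idz
            if flat_a[flat_id] < m:
                m = flat_a[flat_id]
        b[idy, idx] = m

assert (b == a.min(axis=0)).all()
```

With a (500,500) matrix, block=(32,32,1) and grid=(16,16) give x_width = y_width = 512, so a bounds guard such as `if (idx >= dim_x || idy >= dim_y) return;` (with dim_x/dim_y passed to the kernel) would also be needed, since 512 does not equal 500.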
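[Editor's note] On the quoted FFT question: padding only one input yields a circular convolution, which is why the result disagrees with scipy.ndimage.convolve1d. For a linear convolution, both transforms must be taken at length len(a) + len(b) - 1. A minimal numpy check (my own example data, not from the thread):

```python
import numpy

a = numpy.array([1.0, 2.0, 3.0, 4.0])
b = numpy.array([1.0, -1.0, 0.5])

# Linear convolution via FFT: pad BOTH inputs to len(a)+len(b)-1;
# otherwise the inverse FFT returns a circular convolution.
n = len(a) + len(b) - 1
A = numpy.fft.fft(a, n)            # second argument zero-pads to length n
B = numpy.fft.fft(b, n)
conv = numpy.fft.ifft(A * B).real

assert numpy.allclose(conv, numpy.convolve(a, b))
```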
