Oh thanks, so the main problem is that, by not passing the array dimensions
to the kernel, I read outside the array. Now it's clear.
For numpy, only yesterday I discovered that for a 3D array the indexing is
(z, y, x), and this drove me mad :) I had thought that, for a two-dimensional
array, it was (x, y) as in C.
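As a small sanity check of that (z, y, x) ordering (a minimal sketch, not part of the original thread; the dimensions are taken from the script below):

```python
import numpy

dim_x, dim_y, dim_z = 50, 100, 10

# numpy takes the shape as (z, y, x): the last axis varies fastest in memory,
# which matches the kernel's flat_id = x + dim_x*y + (dim_x*dim_y)*z indexing
a = numpy.zeros((dim_z, dim_y, dim_x), dtype=numpy.int32)
print(a.shape)  # (10, 100, 50)

# writing through a C-order flat view at the kernel's flat index
# lands on a[z, y, x], i.e. the element at x=3, y=7, z=2 is a[2, 7, 3]
flat = a.reshape(-1)
flat[3 + dim_x * 7 + (dim_x * dim_y) * 2] = 42
print(a[2, 7, 3])  # 42
```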
Thanks for the correction! In fact, among my many trials, I also tried setting
the block to (1,1,1) and the grid to (50,50) for dim_x = dim_y = 50, but it
did not work, probably because of the swapped i and j in the CPU test code.
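For reference, the ceiling-division grid sizing used in the fixed script can be sketched like this (my own minimal example, not from the thread):

```python
def grid_dims(dim_x, dim_y, block_size):
    # integer ceiling division: launch enough blocks to cover every element;
    # the kernel's bounds check (idx >= dim_x || idy >= dim_y) discards the excess threads
    return ((dim_x - 1) // block_size + 1, (dim_y - 1) // block_size + 1)

print(grid_dims(33, 33, 32))  # (2, 2) -- each 33-wide axis needs two 32-wide blocks
print(grid_dims(50, 50, 32))  # (2, 2)
print(grid_dims(32, 32, 32))  # (1, 1) -- an exact fit needs no extra block
```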
Thanks!
> Date: Thu, 19 Jul 2012 16:59:33 +1000
> Subject: Re: [PyCUDA] Thread Problem
> From: [email protected]
> To: [email protected]
> CC: [email protected]
> 
> Hi Andrea,
> 
> On Thu, Jul 19, 2012 at 4:37 PM, Andrea Cesari <[email protected]> 
> wrote:
> > yes.. for example if I do:
> > dim_x=33
> > dim_y=33
> > then change block and grid to this: (32,32,1) and (2,1)
> > because I do (33*33 = 1089 threads, so grid = 1089/1024 = 1.063 --> 2)
> 
> When you do this, you read values outside of your array inside the
> kernel: since the kernel is not aware of the actual size of your array,
> it tries all the (x, y) pairs from (0, 0) to (32*2, 32). My advice about
> using blockDim and gridDim to get the array size applied only to the
> case where the array size equals the grid size; in the general case,
> you have to pass the array dimensions to the kernel. Please see the
> fixed script below. I also parametrized the block size (my video card
> does not have CC 2.0) and fixed some errors with the order of
> dimensions. Be careful: in numpy it is (z, y, x), and in a CUDA
> block/grid it is (x, y, z).
> 
> 
> import pycuda.driver as cuda
> import pycuda.autoinit
> from pycuda.compiler import SourceModule
> import numpy
> import time
> from pycuda.gpuarray import to_gpu
> dim_x=50
> dim_y=100
> dim_z=10
> a = numpy.random.randn(dim_z,dim_y,dim_x)
> a = a.astype(numpy.int32)
> b=numpy.zeros((dim_y,dim_x),dtype=numpy.int32)
> a_gpu=to_gpu(a)
> b_gpu=to_gpu(b)
> mod = SourceModule("""
> __global__ void findmin(int *a,int *b,int dim_x, int dim_y, int dim_z)
> {
>   int idx = threadIdx.x + blockIdx.x * blockDim.x; //OK
>   int idy = threadIdx.y + blockIdx.y * blockDim.y; //OK
>   if (idx >= dim_x || idy >= dim_y)
>     return;
>   int flat_id1 = idx + dim_x * idy;
>   int min = 100;
>   for (int idz = 0; idz < dim_z; idz++)
>   {
>     int flat_id = idx + dim_x * idy + (dim_x * dim_y) * idz;  //OK
>     if (a[flat_id] < min)
>     {
>       min = a[flat_id];
>       b[flat_id1] = min;
>     }
>   }
> 
> }
>   """)
> block_size = 16
> func = mod.get_function("findmin")
> func(a_gpu, b_gpu, numpy.int32(dim_x), numpy.int32(dim_y), numpy.int32(dim_z),
>   block=(block_size,block_size,1),
>   grid=((dim_x - 1) / block_size + 1,(dim_y - 1) / block_size + 1))
> 
> print a_gpu.get()
> print "b :\n"
> b=b_gpu.get()
> print b
> minimo=100
> b1=numpy.zeros((dim_y,dim_x),dtype=numpy.int32)
> for i in range(0,dim_x):
>     for j in range(0,dim_y):
>         minimo=min(a[:,j,i])
>         b1[j,i]=minimo
> 
> print "Difference between GPU and CPU:\n"
> print b1-b
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda