Oh thanks, so the main problem is that by not passing the array dimensions into the kernel, I read outside the array. Now it's clear. For numpy, only yesterday I discovered that for a 3D array the indexing is (z, y, x), and this made me mad :) But I thought that for a bidimensional array it was (x, y) as in C. Thanks for the correction! In fact, among the many trials, I also tried setting (1,1,1) and (50,50) for dim_x=dim_y=50. But this did not work, probably because of the inversion of i and j in the CPU test code. Thanks!

> Date: Thu, 19 Jul 2012 16:59:33 +1000
> Subject: Re: [PyCUDA] Thread Problem
> From: [email protected]
> To: [email protected]
> CC: [email protected]
>
> Hi Andrea,
>
> On Thu, Jul 19, 2012 at 4:37 PM, Andrea Cesari <[email protected]> wrote:
> > yes..for example if i do:
> > dim_x=33
> > dim_y=33
> > then change grid and block to this: (32,32,1) and (2,1)
> > because i do (33*33=1089 threads, so grid = 1089/1024 = 1.063 --> 2)
>
> When you do this, you read values outside of your array inside the
> kernel, since it's not aware of the actual size of your array; it
> tries all the x,y pairs from (0,0) to (32*2,32). My advice about using
> blockDim and gridDim to get the array size applied only to the case
> when it is equal to the grid size. In the general case, you have to
> pass the array dimensions into the kernel. Please see the fixed script
> below. I also parametrized the block size (my video card does not have
> CC 2.0) and fixed some errors with the order of dimensions: be careful,
> in numpy it is (z, y, x), and in a CUDA block/grid it is (x, y, z).
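[To make the ordering point above concrete, here is a small plain-numpy sketch (no GPU needed) of the (z, y, x) shape convention and of the ceil-division used below to size the grid; the dimension values are just the ones from the script.]

```python
import numpy

# numpy shape is (z, y, x): the last axis varies fastest in memory
a = numpy.zeros((10, 100, 50))   # dim_z=10, dim_y=100, dim_x=50
print(a.shape)                   # (10, 100, 50)

# CUDA block/grid tuples are (x, y, z), the opposite order.
# Ceil division gives the number of blocks needed to cover the array:
dim_x, dim_y, block_size = 50, 100, 16
grid = ((dim_x - 1) // block_size + 1, (dim_y - 1) // block_size + 1)
print(grid)                      # (4, 7)
```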
>
> import pycuda.driver as cuda
> import pycuda.autoinit
> from pycuda.compiler import SourceModule
> import numpy
> import time
> from pycuda.gpuarray import to_gpu
>
> dim_x=50
> dim_y=100
> dim_z=10
> a = numpy.random.randn(dim_z,dim_y,dim_x)
> a = a.astype(numpy.int32)
> b=numpy.zeros((dim_y,dim_x),dtype=numpy.int32)
> a_gpu=to_gpu(a)
> b_gpu=to_gpu(b)
>
> mod = SourceModule("""
> __global__ void findmin(int *a, int *b, int dim_x, int dim_y, int dim_z)
> {
>     int idx = threadIdx.x + blockIdx.x * blockDim.x; //OK
>     int idy = threadIdx.y + blockIdx.y * blockDim.y; //OK
>     if (idx >= dim_x || idy >= dim_y)
>         return;
>     int flat_id1 = idx + dim_x * idy;
>     int min = 100;
>     for (int idz = 0; idz < dim_z; idz++)
>     {
>         int flat_id = idx + dim_x * idy + (dim_x * dim_y) * idz; //OK
>         if (a[flat_id] < min)
>         {
>             min = a[flat_id];
>             b[flat_id1] = min;
>         }
>     }
> }
> """)
>
> block_size = 16
> func = mod.get_function("findmin")
> func(a_gpu, b_gpu, numpy.int32(dim_x), numpy.int32(dim_y), numpy.int32(dim_z),
>      block=(block_size, block_size, 1),
>      grid=((dim_x - 1) / block_size + 1, (dim_y - 1) / block_size + 1))
>
> print a_gpu.get()
> print "b :\n"
> b = b_gpu.get()
> print b
>
> minimo = 100
> b1 = numpy.zeros((dim_y,dim_x), dtype=numpy.int32)
> for i in range(0, dim_x):
>     for j in range(0, dim_y):
>         minimo = min(a[:,j,i])
>         b1[j,i] = minimo
>
> print "Difference between CPU:\n"
> print b1 - b
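[For reference, the CPU check at the end of the quoted script can also be written without explicit loops; assuming the same shapes as in the script, numpy's min along axis 0 (the z axis) computes the same per-(y, x) minimum in one call. A minimal sketch:]

```python
import numpy

dim_x, dim_y, dim_z = 50, 100, 10
a = numpy.random.randn(dim_z, dim_y, dim_x).astype(numpy.int32)

# minimum over the z axis (axis 0) for every (y, x) position,
# equivalent to the double loop over i and j in the script above
b1 = a.min(axis=0)
print(b1.shape)  # (100, 50), i.e. (dim_y, dim_x)
```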
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
