Re: [PyCUDA] Thread Problem

Bogdan Opanchuk Thu, 19 Jul 2012 00:00:21 -0700

Hi Andrea,

On Thu, Jul 19, 2012 at 4:37 PM, Andrea Cesari <[email protected]> wrote:
> yes..for example if i do:
> dim_x=33
> dim_y=33
> then change grid and block to this: (32,32,1) and (2,1)
> because i do ( 33*33=1089 threads, so grid= 1089/1024=1,063--> 2)


When you do this, the kernel reads values outside of your array. It is
not aware of the actual size of the array, so with block (32,32,1) and
grid (2,1) it tries every (x, y) pair from (0,0) to (63,31). My advice
about using blockDim and gridDim to get the array size applies only
when the array size equals the size of the launched thread grid
(blockDim * gridDim). In the general case, you have to pass the array
dimensions to the kernel and return early from the threads that fall
outside them. Please see the fixed script below. I also parametrized
the block size (my video card does not have CC 2.0, so it cannot run
1024 threads per block) and fixed some errors with the order of
dimensions. Be careful: in numpy it is (z, y, x), and in CUDA
block/grid it is (x, y, z).
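
The launch arithmetic above can be checked on the CPU, without a GPU.
This is only a rough sketch: grid_dims and launched_range are
hypothetical helper names made up for illustration, not PyCUDA
functions.

```python
def grid_dims(dim_x, dim_y, block_size):
    """Smallest grid of block_size x block_size blocks covering dim_x x dim_y."""
    # (n - 1) // b + 1 is integer ceiling division for n >= 1
    return ((dim_x - 1) // block_size + 1,
            (dim_y - 1) // block_size + 1)


def launched_range(block, grid):
    """How many distinct idx and idy values a launch produces."""
    return (block[0] * grid[0], block[1] * grid[1])


# The launch from the quoted message, for a 33x33 array:
# idx runs over 0..63 (31 columns too many), idy only over 0..31.
print(launched_range((32, 32, 1), (2, 1)))   # (64, 32)

# A launch that covers the whole 33x33 array in both directions.
grid = grid_dims(33, 33, 32)
print(grid)                                  # (2, 2)
print(launched_range((32, 32, 1), grid))     # (64, 64)
```

With the covering grid, 64*64 = 4096 threads are launched for
33*33 = 1089 array elements, which is why the kernel needs the
dim_x/dim_y check before it touches memory.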


import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy
import time
from pycuda.gpuarray import to_gpu
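dim_x = 50
dim_y = 100
dim_z = 10
# numpy shapes are ordered (z, y, x): x is the fastest-varying axis
a = numpy.random.randn(dim_z, dim_y, dim_x)
a = a.astype(numpy.int32)
# one output value per (y, x) position
b = numpy.zeros((dim_y, dim_x), dtype=numpy.int32)
a_gpu = to_gpu(a)
b_gpu = to_gpu(b)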
mod = SourceModule("""
__global__ void findmin(int *a,int *b,int dim_x, int dim_y, int dim_z)
{
  int idx = threadIdx.x + blockIdx.x * blockDim.x; //OK
  int idy = threadIdx.y + blockIdx.y * blockDim.y; //OK
  if (idx >= dim_x || idy >= dim_y)
    return;
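  int flat_id1 = idx + dim_x * idy;  // index into the 2D output b
  int min_val = a[flat_id1];         // start from the z = 0 element
  for (int idz = 1; idz < dim_z; idz++)
  {
    int flat_id = flat_id1 + (dim_x * dim_y) * idz;  // index into the 3D input a
    if (a[flat_id] < min_val)
      min_val = a[flat_id];
  }
  b[flat_id1] = min_val;  // write once, after the minimum is known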

}
  """)
block_size = 16
func = mod.get_function("findmin")
func(a_gpu, b_gpu, numpy.int32(dim_x), numpy.int32(dim_y), numpy.int32(dim_z),
  block=(block_size,block_size,1),
  grid=((dim_x - 1) // block_size + 1, (dim_y - 1) // block_size + 1))

print(a_gpu.get())
print("b :\n")
b = b_gpu.get()
print(b)

# CPU reference: minimum along the z axis (axis 0 in numpy order)
b1 = a.min(axis=0)

print("Difference between CPU and GPU results:\n")
print(b1 - b)
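
To check the index arithmetic itself without a GPU, here is a
pure-Python sketch that walks a flat list the same way the kernel
walks the array. flat_index and findmin_reference are hypothetical
names for this illustration, and the 2x2x2 data is made up.

```python
def flat_index(idx, idy, idz, dim_x, dim_y):
    """Offset of element (z, y, x) in a C-ordered array of shape (dim_z, dim_y, dim_x)."""
    return idx + dim_x * idy + (dim_x * dim_y) * idz


def findmin_reference(a_flat, dim_x, dim_y, dim_z):
    """Minimum along z for every (y, x), returned as a flat list of dim_y * dim_x values."""
    b = [0] * (dim_x * dim_y)
    for idy in range(dim_y):
        for idx in range(dim_x):
            b[idx + dim_x * idy] = min(
                a_flat[flat_index(idx, idy, idz, dim_x, dim_y)]
                for idz in range(dim_z)
            )
    return b


# Two z-layers of a 2x2 array, flattened in (z, y, x) order:
#   z = 0: [[5, 1], [7, 3]]    z = 1: [[2, 4], [6, 0]]
a_flat = [5, 1, 7, 3, 2, 4, 6, 0]
print(findmin_reference(a_flat, 2, 2, 2))  # [2, 1, 6, 0]
```

The x index moves fastest and z slowest, which matches the numpy
shape (dim_z, dim_y, dim_x) used in the script.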

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
