Lev,

Thanks for the explanation...that definitely helps. But how does the indexing work for a 2D case?
z1 = numpy.zeros((1024)).astype(numpy.float32)
kernel1(drv.Out(z1), block=(16,16,1), grid=(2,2))

int idx = ??
int idy = ??

Thanks!
Mike

________________________________
From: Lev Givon <l...@columbia.edu>
To: Mike Tischler <mikethesoils...@yahoo.com>
Cc: pycuda@tiker.net
Sent: Thu, March 24, 2011 6:13:54 PM
Subject: Re: [PyCUDA] index multiple blocks and grids

Received from Mike Tischler on Thu, Mar 24, 2011 at 03:41:30PM EDT:
> Hi,
> I'm new to CUDA and PyCUDA, and am having a problem indexing multiple
> grids.
>
> I'm using an older CUDA-enabled card (Quadro FX 1700) before I begin
> writing for a larger GPU. I've been trying to understand the
> relationship between threads, blocks, and grids in the context of my
> individual card. To do so, I've set up a simple script.

(snip)

> However, what if I have an array that's 1024 in length? If I understand
> the documentation correctly, block=(16,16,1) is the max value (256
> threads) allowed for my hardware, which means I have to increase the
> number of grids. If I change the parameters of my script to:
>
> z1 = numpy.zeros((1024)).astype(numpy.float32)
> kernel1(drv.Out(z1), block=(16,16,1), grid=(2,2))
>
> How do I correctly index the array locations in my kernel function
> given multiple grids (z1[???]=???)? There is a gridDim property, but
> no gridIdx property, like with threads and blocks.
>
> Thanks!
> Mike

threadIdx identifies the thread in a single block. To access a 1D array
of 1024 elements assuming a maximum of 256 threads per block, you can
combine the values in threadIdx and blockIdx, e.g.,

    int idx = blockIdx.x*blockDim.x + threadIdx.x;

and launch the kernel with a thread block with dimensions (256, 1, 1)
and a grid with dimensions (4, 1). See Chapter 2 of the CUDA Programming
Guide for more info.

                                        L.G.
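For the 2D launch in the question (block=(16,16,1), grid=(2,2)), one common way to flatten the thread coordinates into a single 1D array index is the expression below. This is a sketch, not something stated in the thread: the index math is emulated in plain Python so the arithmetic can be checked without a GPU; in the actual CUDA kernel the same expression would be written with threadIdx, blockIdx, blockDim, and gridDim, as shown in the docstring.

```python
block_dim = (16, 16)   # blockDim.x, blockDim.y -> 256 threads per block
grid_dim = (2, 2)      # gridDim.x, gridDim.y   -> 4 blocks

threads_per_block = block_dim[0] * block_dim[1]

def flat_index(bx, by, tx, ty):
    """Global 1D index of thread (tx, ty) within block (bx, by).

    Equivalent CUDA C expression inside the kernel:
      int idx = (blockIdx.y*gridDim.x + blockIdx.x)
                  * (blockDim.x*blockDim.y)
                + threadIdx.y*blockDim.x + threadIdx.x;
    """
    block_id = by * grid_dim[0] + bx           # linearize the 2x2 grid
    thread_in_block = ty * block_dim[0] + tx   # linearize the 16x16 block
    return block_id * threads_per_block + thread_in_block

# Every thread in the launch should map to a distinct element of the
# 1024-element array, with no gaps and no collisions.
indices = sorted(
    flat_index(bx, by, tx, ty)
    for by in range(grid_dim[1]) for bx in range(grid_dim[0])
    for ty in range(block_dim[1]) for tx in range(block_dim[0])
)
assert indices == list(range(1024))
```

With this mapping there is no separate idy at all: the 2D block and grid are collapsed into one linear index, so the kernel body is simply z1[idx] = ... (a separate idy only makes sense when the data itself is 2D).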
_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda