On Mon, 2010-12-13 at 06:16 -0800, reckoner wrote:
> Hi,
> 
> I am getting started with PyCUDA and I have attached a short example 
> that tries to compute the cumulative sum of an array ( see attached ). 
> While it seems to work okay, I may be using the index incorrectly, 
> or may be using memory inefficiently because I don't understand strides. 
> Also, the example uses 512 threads, which I understand from 
> dump_properties.py is the most I can use. How would I change this code 
> to use fewer threads? Would I just change the block=( 512, 1,1) to 
> something like block=(64,1,1) if I wanted to use 64 threads?

Yes. To get the best performance you need to think in terms of the
thread hierarchy: thread, warp, block, grid.
A block does not always have to be full of threads. For example, if your
kernel uses many registers or a lot of shared memory, you may want to
limit the number of threads per block to get faster code (or even to get
code that can run at all on a particular machine).
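
A minimal sketch of choosing the launch dimensions on the host side.
The helper name launch_config and its default of 64 threads are
hypothetical; the resulting tuples are what you would pass as the
block= and grid= keyword arguments when calling the kernel.

```python
def launch_config(n_items, threads_per_block=64):
    """Return (block, grid) tuples for a 1-D launch covering n_items."""
    if n_items <= 0 or threads_per_block <= 0:
        raise ValueError("n_items and threads_per_block must be positive")
    # Ceiling division: enough blocks so that every item gets a thread.
    n_blocks = (n_items + threads_per_block - 1) // threads_per_block
    return (threads_per_block, 1, 1), (n_blocks, 1)


block, grid = launch_config(512, threads_per_block=64)
print(block, grid)  # (64, 1, 1) (8, 1)
```

If n_items is not a multiple of threads_per_block, the last block has
spare threads, so the kernel should check its index against the array
length before reading or writing.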


> Finally, is there a way to accomplish the same thing without having to 
> use SourceModule? I would like to avoid writing my own CUDA code and 
> would prefer to use gpuarray or Elementwise, if possible.
> 

For a reduction (getting one value out of an entire array),
look at the ReductionKernel class in the pycuda.reduction module.

To get just the sum, you can call the sum function from the
pycuda.gpuarray module. That module defines a few other ready-made
reductions as well.

As for your code: you are computing the sum independently for each row.
Instead of a one-dimensional block, you could try using the X and Y
dimensions; it might (or might not) make your code simpler.
You are also using the full expression to calculate the index:
const int i = blockDim.x*blockIdx.x + threadIdx.x;
while you launch only one block, so blockIdx.x is always 0 and i is
simply threadIdx.x. The extra terms make your code harder to analyse.
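
The remark about the index expression can be checked with a small
CPU-side emulation. This is only an illustration in plain Python, not
PyCUDA code; the function name global_indices is hypothetical.

```python
def global_indices(block_dim_x, grid_dim_x):
    """Emulate blockDim.x*blockIdx.x + threadIdx.x for every thread."""
    return [block_dim_x * block_idx + thread_idx
            for block_idx in range(grid_dim_x)
            for thread_idx in range(block_dim_x)]


# One block: blockIdx.x is always 0, so the index is just threadIdx.x.
single_block = global_indices(block_dim_x=8, grid_dim_x=1)
# Two blocks of four threads cover the same eight elements.
two_blocks = global_indices(block_dim_x=4, grid_dim_x=2)
print(single_block == two_blocks == list(range(8)))  # True
```

The full expression only matters once the grid has more than one block.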

As you are computing not one sum but many of them, you cannot simply
use a single ReductionKernel call. You might, however, try using
ReductionKernel to compute one sum at a time, with GPUArray slices
selecting each row.
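
A rough sketch of that idea, assuming the data is a 2-D float32 array
stored row by row. The function names row_bounds and row_sums_on_gpu
are hypothetical; ReductionKernel, gpuarray.to_gpu and gpuarray.sum are
the PyCUDA pieces mentioned above. The GPU part needs a working CUDA
device, so it is kept inside a function.

```python
def row_bounds(n_rows, n_cols):
    """Start/end offsets of each row inside a flattened row-major array."""
    return [(r * n_cols, (r + 1) * n_cols) for r in range(n_rows)]


def row_sums_on_gpu(host_matrix):
    """Sum every row of a 2-D array on the GPU, one reduction per row."""
    # Imports are local so that row_bounds() stays usable without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.gpuarray as gpuarray
    from pycuda.reduction import ReductionKernel

    host_matrix = np.ascontiguousarray(host_matrix, dtype=np.float32)
    n_rows, n_cols = host_matrix.shape
    flat = gpuarray.to_gpu(host_matrix.ravel())

    # Custom reduction: map each element to itself, then combine with "+".
    sum_kernel = ReductionKernel(np.float32, neutral="0",
                                 reduce_expr="a+b", map_expr="x[i]",
                                 arguments="float *x")

    # For a plain sum, gpuarray.sum(flat[start:end]) would work as well.
    return [float(sum_kernel(flat[start:end]).get())
            for start, end in row_bounds(n_rows, n_cols)]
```

Each row costs one kernel launch and one transfer back to the host, so
this is likely to pay off only for a modest number of long rows.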

Hope my explanations helped a little bit.

Best regards.

-- 
Tomasz Rybak <[email protected]> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda