On Mon, 2010-12-13 at 06:16 -0800, reckoner wrote:
> Hi,
>
> I am getting started with PyCUDA and I have attached a short example
> that tries to compute the cumulative sum of an array (see attached).
> While it seems to work okay, I may be using the index incorrectly,
> or may be using memory inefficiently because I don't understand strides.
> Also, the example uses 512 threads, which I understand from
> dump_properties.py is the most I can use. How would I change this code
> to use fewer threads? Would I just change the block=(512,1,1) to
> something like block=(64,1,1) if I wanted to use 64 threads?

Yes. To get the best performance you need to think in terms of the thread
hierarchy: thread, warp, block, grid. A block does not always have to be
full of threads. For example, if your code uses many registers or a lot of
shared memory, you will want to limit the number of threads per block to
get faster code (or even to get code that can run at all on a particular
machine).

> Finally, is there a way to accomplish the same thing without having to
> use SourceModule? I would like to avoid writing my own CUDA code and
> would prefer to use gpuarray or Elementwise, if possible.

For reduction (getting one value out of an entire array), look at the
ReductionKernel class in the pycuda.reduction module. If you only need a
sum, you can call the ready-made sum for GPUArray (pycuda.gpuarray.sum);
several reduction kernels of this kind are already defined for GPUArray.

As for your code: you are computing the sum independently for each row.
Instead of a one-dimensional block, you could try using the X and Y
dimensions. It might (or might not) make your code simpler.

You are also using the full expression to calculate the index:

    const int i = blockDim.x*blockIdx.x + threadIdx.x;

while launching only one block. With a single block, blockIdx.x is always
0, so the expression reduces to threadIdx.x, and the extra term makes your
code harder to analyse.

Since you are computing many sums rather than one, you cannot simply use
ReductionKernel. You might, however, try using ReductionKernel to compute
a single sum and apply it to GPUArray slices.

I hope my explanations helped a little.

Best regards.

-- 
Tomasz Rybak <[email protected]> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
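
P.S. To make the point about block size and the index expression concrete, here is a minimal CPU-only sketch in plain Python (no GPU or PyCUDA required). It only emulates the arithmetic of blockDim.x*blockIdx.x + threadIdx.x. The helper name global_indices and the array length of 512 are my own assumptions for illustration, and I am assuming one thread per item, which I cannot confirm without looking at the attachment again.

```python
# CPU-only emulation of CUDA's 1-D thread indexing (illustrative sketch).
# global_indices is a hypothetical helper, not part of PyCUDA.

def global_indices(n_items, block_size):
    """Return (grid_size, indices), where indices holds the value of
    blockDim.x * blockIdx.x + threadIdx.x for every in-range thread."""
    # Number of blocks needed to cover n_items (ceiling division).
    grid_size = (n_items + block_size - 1) // block_size
    indices = []
    for block_idx in range(grid_size):        # plays the role of blockIdx.x
        for thread_idx in range(block_size):  # plays the role of threadIdx.x
            i = block_size * block_idx + thread_idx
            if i < n_items:                   # bounds check a kernel would need
                indices.append(i)
    return grid_size, indices

n = 512  # assumed item count, matching the 512 threads in the question
grid_512, idx_512 = global_indices(n, 512)
grid_64, idx_64 = global_indices(n, 64)
print(grid_512, grid_64)
```

With 512 threads per block a single block covers everything, so blockIdx.x is always 0 and the index is just threadIdx.x. With 64 threads per block the same 512 items need 8 blocks, so under the one-thread-per-item assumption the kernel launch would presumably need grid=(8,1) in addition to block=(64,1,1); otherwise only the first 64 items would be processed.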
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
