Dear Marie,

On Sun, 13 Mar 2011 16:04:06 -0700 (PDT), elafrit <afrit.mar...@gmail.com> wrote:
> I wonder if I can improve the PyCUDA code by editing the maximum number
> of threads in gpuarray.py?
The only way to find out is to try. If you do find a way to improve the
speed, please do let the list know. I imagine that a better approach
might be to try and introduce some instruction-level parallelism (or at
least create some wiggle room for the instruction scheduler in ptxas).
That, unfortunately, is sort of difficult.

> And I can't understand what's really happening when I use the methods
> of gpuarray to multiply a matrix by a scalar. Is the scalar sent to
> the GPU for each element of the matrix, or is it sent only the first
> time? And is it sent as a scalar or as a gpuarray?

CPU scalars are sent as kernel parameters, which is a fairly efficient
way of broadcasting them to all thread blocks.

HTH,
Andreas
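
P.S. For concreteness, here is a rough, untested sketch of both points;
the kernel, the ELEMENTS_PER_THREAD constant and the block size are my
own illustrative choices, not anything taken from gpuarray's internals.
Each thread handles a few elements so that ptxas sees independent
instructions it can interleave, and the scalar "a" travels once as an
ordinary kernel parameter, just as it does when you write 2.0 * x_gpu.

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

ELEMENTS_PER_THREAD = 4  # illustrative choice; worth benchmarking

mod = SourceModule("""
__global__ void scale(float a, float *x, float *y, int n)
{
    /* each thread scales %(epp)d elements; the unrolled iterations are
       independent, which gives the instruction scheduler room to
       overlap loads and multiplies */
    int base = blockIdx.x * blockDim.x * %(epp)d + threadIdx.x;
    #pragma unroll
    for (int k = 0; k < %(epp)d; ++k)
    {
        int i = base + k * blockDim.x;   /* keeps accesses coalesced */
        if (i < n)
            y[i] = a * x[i];
    }
}
""" % {"epp": ELEMENTS_PER_THREAD})
scale = mod.get_function("scale")

n = 1 << 20
x_gpu = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))
y_gpu = gpuarray.empty_like(x_gpu)

block = 256
grid = (n + block*ELEMENTS_PER_THREAD - 1) // (block*ELEMENTS_PER_THREAD)

# the scalar is passed once, as a kernel parameter -- not per element
scale(np.float32(2.0), x_gpu.gpudata, y_gpu.gpudata, np.int32(n),
      block=(block, 1, 1), grid=(grid, 1))

assert np.allclose(y_gpu.get(), (2.0 * x_gpu).get())

Whether the unrolling actually buys anything depends on the card and on
how memory-bound the operation already is, so please treat this purely
as a starting point for experiments.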