Dear Marie,

On Sun, 13 Mar 2011 16:04:06 -0700 (PDT), elafrit <afrit.mar...@gmail.com> 
wrote:
> I wonder if I can improve the PyCUDA code by editing the maximum number
> of threads in gpuarray.py?

The only way to find out is to try. If you do find a way to improve the
speed, please do let the list know.

I imagine that a better approach might be to introduce some
instruction-level parallelism (or at least to create some wiggle room
for the instruction scheduler in ptxas). That, unfortunately, is
somewhat difficult.
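To make that concrete, here is a rough sketch of the kind of hand-written
kernel I mean; none of this is PyCUDA-internal code, and the kernel, the
unroll factor and all the names are made up for illustration. Each thread
handles a few independent elements, so the scheduler has independent loads
and multiplies it can overlap with each other.

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

ILP_FACTOR = 4  # hypothetical unroll factor; must match the 4 hardcoded below

mod = SourceModule("""
__global__ void scale_ilp(float *y, const float *x, float a, int n)
{
    // Each thread processes 4 strided elements; the loads and multiplies
    // are independent of one another, which gives the instruction
    // scheduler room to overlap memory latency with arithmetic.
    int i = blockIdx.x * blockDim.x * 4 + threadIdx.x;
    int stride = blockDim.x;
    #pragma unroll
    for (int k = 0; k < 4; ++k) {
        int idx = i + k * stride;
        if (idx < n)
            y[idx] = a * x[idx];
    }
}
""")
scale_ilp = mod.get_function("scale_ilp")

n = 1 << 20
x = np.random.randn(n).astype(np.float32)
y = np.empty_like(x)

block = 256
grid = (n + block * ILP_FACTOR - 1) // (block * ILP_FACTOR)
scale_ilp(drv.Out(y), drv.In(x), np.float32(2.5), np.int32(n),
          block=(block, 1, 1), grid=(grid, 1))

Whether this actually beats the generated elementwise kernel depends on the
hardware and on how memory-bound the operation already is, so benchmark it.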

> And I can't understand what's really happening when I use the methods of
> gpuarray to multiply a matrix by a scalar. Is the scalar sent to the GPU
> for each element of the matrix, or is it sent only the first time? And is
> it sent as a scalar or as a gpuarray?

CPU scalars are sent as kernel parameters, once per kernel launch rather
than once per element, which is a fairly efficient way of broadcasting
them to all thread blocks.
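For illustration, here is roughly what that means for an expression like
3.0 * a_gpu, re-done by hand with ElementwiseKernel so the scalar argument
is visible. The kernel body and names are my own sketch, not the code
gpuarray actually generates:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

a_gpu = gpuarray.to_gpu(np.random.randn(4, 4).astype(np.float32))

# Built-in operator: launches one elementwise kernel with the scalar 3.0
# passed as an ordinary kernel parameter.
b_gpu = 3.0 * a_gpu

# Hand-rolled equivalent, making the scalar kernel parameter explicit.
scale = ElementwiseKernel(
    "float a, float *x, float *y",
    "y[i] = a * x[i]",
    "scale")
c_gpu = gpuarray.empty_like(a_gpu)
scale(np.float32(3.0), a_gpu, c_gpu)

print(np.allclose(b_gpu.get(), c_gpu.get()))

So the scalar stays a scalar: it crosses to the GPU in the kernel's
argument list, not as a gpuarray and not once per element.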

HTH,
Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda
