On Tue, 15 Feb 2011 21:36:27 +0100, Tomasz Rybak <bogom...@post.pl> wrote:
> I disagree here. IMO it makes no sense to use more blocks than there
> are SMs, as it introduces the overhead of switching between blocks. In my
> code there is no switching between blocks: each SM gets a block,
> executes the kernel generating random numbers, and finishes. After your
> change, an SM gets a block, executes it, gets another block, ..., finishes.
> 
> Each thread already generates multiple random numbers in the loop.
> After your change it just loops fewer times than in my code.
> 
> Time for generating 100 000 000 floats on GF104:
> using 3*SMs: 0.0315589904785
> using 1*SMs: 0.0291240215302
> Those times are repeatable: for 3x I get 0.031, for 1x I get 0.029.
> 
> So please revert to the previous state (just apply the attached patch).
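The block-count trade-off being argued can be modelled in plain Python. This is a minimal sketch, assuming the kernel uses a grid-stride loop (each thread starts at its global index and advances by the total thread count); the mail only says each thread "generates multiple random numbers in the loop", so the indexing scheme, the function names, and the 8-SM / 1024-thread figures are all hypothetical.

```python
def thread_indices(global_thread_id, total_threads, n):
    # Assumed grid-stride loop: a thread starts at its global index and
    # advances by the total number of threads in the grid until it passes n.
    return range(global_thread_id, n, total_threads)

def max_loop_iterations(n, blocks, threads_per_block):
    # Largest number of loop iterations any single thread performs
    # (integer ceiling of n / total_threads).
    total_threads = blocks * threads_per_block
    return (n + total_threads - 1) // total_threads

def covers_each_index_once(n, blocks, threads_per_block):
    # Sanity check: every output slot 0..n-1 is written by exactly one thread.
    total_threads = blocks * threads_per_block
    hits = [0] * n
    for t in range(total_threads):
        for i in thread_indices(t, total_threads, n):
            hits[i] += 1
    return all(h == 1 for h in hits)

# Hypothetical device: 8 SMs, 1024 threads per block.
N = 100_000_000
SMS = 8
THREADS = 1024
for factor in (1, 3):
    iters = max_loop_iterations(N, factor * SMS, THREADS)
    print(f"{factor} x SMs blocks -> at most {iters} iterations per thread")
```

With these made-up figures the 3x grid cuts the per-thread loop from 12208 to 4070 iterations, but the total number of values generated is the same; only the split across threads and blocks changes.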

Can't argue with that.
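To put the quoted GF104 timings in proportion, here is the arithmetic as a short Python snippet. The mail does not state the unit; seconds is an assumption, and it only affects the throughput figure, not the ratio.

```python
# Timings quoted above for generating 100 000 000 floats on GF104.
# Unit not stated in the mail; seconds is assumed here.
n_values = 100_000_000
t_3x_sms = 0.0315589904785  # grid of 3 * (number of SMs) blocks
t_1x_sms = 0.0291240215302  # grid of 1 * (number of SMs) blocks

slowdown = t_3x_sms / t_1x_sms          # > 1 means the 3x grid is slower
overhead_pct = (slowdown - 1.0) * 100.0
throughput_1x = n_values / t_1x_sms     # values per second, if t is in seconds

print(f"3x grid is {overhead_pct:.1f}% slower")
print(f"1x grid throughput: {throughput_1x:.3g} values/s")
```

On these numbers the 3x grid comes out roughly 8% slower than one block per SM.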

> OK, but do not punish Fermi for the shortcomings of Tesla; I added a test
> and use half the threads only on Tesla. Fermi should still use the maximum
> number of threads.
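The attached patch is not reproduced here, so the exact test is unknown. A minimal sketch, assuming the check is on compute capability (1.x for Tesla-generation GPUs, 2.x for Fermi) and using a hypothetical helper name, might look like this:

```python
def choose_threads_per_block(compute_capability, max_threads_per_block):
    # Hypothetical helper, not the code from the patch.
    # Tesla-generation devices report compute capability 1.x; per the mail,
    # they get half the maximum threads per block. Anything else
    # (Fermi is 2.x) keeps the full maximum.
    major, _minor = compute_capability
    if major < 2:
        return max_threads_per_block // 2
    return max_threads_per_block

# Example inputs: a 1.3 device with 512 threads/block, a 2.1 device with 1024.
print(choose_threads_per_block((1, 3), 512))
print(choose_threads_per_block((2, 1), 1024))
```

On real hardware the inputs could be read through PyCUDA, e.g. `dev.compute_capability()` and `dev.get_attribute(pycuda.driver.device_attribute.MAX_THREADS_PER_BLOCK)` on a `pycuda.driver.Device`.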

Also fair.

I've applied your patch.

Andreas

_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda