Thomas Wiecki <[email protected]> writes:

> Hi,
>
> the curandom Generator class initializes generators_per_block number of
> generators. This is the relevant code:
>
>     @property
>     @memoize_method
>     def generators_per_block(self):
>         return min(kernel.max_threads_per_block
>                 for kernel in self._kernels())
>
>
> On my machine the kernels have the following max_threads_per_block (for
> XORWOW):
>
> In [30]: [i.max_threads_per_block for i in g._kernels()]
> Out[30]: [512, 512, 512, 512, 384, 384, 384, 384, 384]
>
> The first four are for the normal and uniform generators. The last ones are
> for skip_aheads.
>
> Isn't this suboptimal? If I was only using the generators without
> skip-ahead it seems I could safely run 512 threads per block if those were
> initialized.

Right, but the performance difference between 384 and 512 threads per
block is likely minimal. In fact, 384 might yet be the better block size.

IOW: Large blocks != great performance. In fact, the opposite is often the case.
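
A rough back-of-the-envelope sketch of why 384 need not lose to 512. The
per-multiprocessor limits below (1536 resident threads, 8 resident blocks)
are assumptions for illustration only; real limits depend on the device,
and register and shared-memory usage are ignored here. The helper
resident_threads and the no_skip_ahead variant are hypothetical names, not
part of PyCUDA.

```python
# Assumed per-multiprocessor limits (illustrative only); real values depend
# on the device and on register/shared-memory usage, which is ignored here.
MAX_THREADS_PER_SM = 1536
MAX_BLOCKS_PER_SM = 8


def resident_threads(block_size):
    """Threads resident on one multiprocessor for a given block size,
    counting only the thread and block limits assumed above."""
    blocks = min(MAX_THREADS_PER_SM // block_size, MAX_BLOCKS_PER_SM)
    return blocks * block_size


# Values reported in the quoted message for the XORWOW kernels.
kernel_limits = [512, 512, 512, 512, 384, 384, 384, 384, 384]

# What generators_per_block evaluates to: the minimum over all kernels.
all_kernels = min(kernel_limits)
# Hypothetical variant that ignores the skip-ahead kernels (the last five).
no_skip_ahead = min(kernel_limits[:4])

for size in (all_kernels, no_skip_ahead, 1024):
    print(size, resident_threads(size))
```

Under these assumed limits, blocks of 384 and 512 threads both keep 1536
threads resident (four blocks and three blocks respectively), while a
1024-thread block leaves a third of the thread slots unused.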

Andreas

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
