Thomas Wiecki <[email protected]> writes:
> Hi,
>
> the curandom Generator class initializes generators_per_block number of
> generators. This is the relevant code:
>
>     @property
>     @memoize_method
>     def generators_per_block(self):
>         return min(kernel.max_threads_per_block
>                 for kernel in self._kernels())
>
> On my machine the kernels have the following max_threads_per_block (for
> XORWOW):
>
> In [30]: [i.max_threads_per_block for i in g._kernels()]
> Out[30]: [512, 512, 512, 512, 384, 384, 384, 384, 384]
>
> The first four are for the normal and uniform generators. The last ones are
> for skip_aheads.
>
> Isn't this suboptimal? If I was only using the generators without
> skip-ahead it seems I could safely run 512 threads per block if those were
> initialized.
Right, but the performance difference is likely minimal. In fact, 384 might
yet be the better block size. IOW: large blocks != great performance. The
opposite is often the case.

Andreas
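To illustrate why a smaller block can win, here is a rough sketch of an occupancy estimate in plain Python. It is not PyCUDA code and not how curandom picks its block size. The per-multiprocessor limits (32768 registers, 1536 threads, 8 blocks) and the 24 registers per thread are assumed numbers for a hypothetical device and kernel, and the model ignores shared memory and register allocation granularity.

```python
def resident_threads(block_size, regs_per_thread,
                     regs_per_sm=32768,        # assumed register file size
                     max_threads_per_sm=1536,  # assumed thread limit per multiprocessor
                     max_blocks_per_sm=8):     # assumed block limit per multiprocessor
    """Threads resident on one multiprocessor under a simplified model.

    Only whole blocks can be scheduled, so the block count is the
    tightest of the thread, register, and block limits.
    """
    by_threads = max_threads_per_sm // block_size
    by_regs = regs_per_sm // (regs_per_thread * block_size)
    blocks = min(by_threads, by_regs, max_blocks_per_sm)
    return blocks * block_size


REGS_PER_THREAD = 24  # hypothetical register usage of the kernel

# Fraction of the (assumed) 1536-thread limit that each block size fills.
occupancy = {bs: resident_threads(bs, REGS_PER_THREAD) / 1536
             for bs in (384, 512)}

for bs, occ in occupancy.items():
    print("block size %d: %d resident threads, occupancy %.2f"
          % (bs, resident_threads(bs, REGS_PER_THREAD), occ))
```

Under these assumed limits, three blocks of 384 threads fit on a multiprocessor but only two blocks of 512, so the smaller block keeps more threads resident. Real numbers depend on the device and on the register usage the compiler reports for each kernel.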
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
