The problem seems to be different from the one explained in the link....

On Tue, Jan 18, 2011 at 9:46 AM, Martin Laprise <[email protected]> wrote:
> Hi, I just made some experiments with the CURAND wrappers. They seem to work
> very nicely except for a little detail that I can't figure out. The
> initialization of the generator and the actual random number generation seem
> very fast, but for whatever reason, PyCUDA takes a long time to "recover"
> after the number generation. This pause is significantly longer than the
> actual computation, and the delay increases with N. Here is an example:
>
> import numpy as np
> import pycuda.autoinit
> import pycuda.gpuarray
> from pycuda.curandom import PseudoRandomNumberGenerator, QuasiRandomNumberGenerator
> import cProfile
> import time as clock
>
>
> def curand_prof():
>     N = 100000000
>     t1 = clock.time()
>     # GPU
>     rr = PseudoRandomNumberGenerator(0, np.random.random(128).astype(np.int32))
>     data = pycuda.gpuarray.zeros([N], np.float32)
>     rr.fill_normal_float(data.gpudata, N)
>     t2 = clock.time()
>     print "Bench 1: " + str(t2-t1) + " sec"
>
>
> if __name__ == "__main__":
>     t1 = clock.time()
>     curand_prof()
>     t2 = clock.time()
>     print "Bench 2: " + str(t2-t1) + " sec"
>
>
> Here is the actual output with a GTX 260 GPU:
> Bench 1: 0.0117599964142 sec
> Bench 2: 4.40562295914 sec
>
> In the example the pause has no consequence, but if I want to use the
> random matrix in another kernel, it's quite a delay. I've done some
> research, and my guess is that the problem is linked to this already
> reported problem here:
>
> http://forums.nvidia.com/index.php?showtopic=185740
>
> Does anyone know how we can implement the solution in the wrapper?
>
> Martin
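One possible explanation worth checking (a guess, not a confirmed diagnosis of this report): CUDA kernel launches are asynchronous, so fill_normal_float() can return as soon as the kernel is queued. Bench 1 would then only measure launch overhead, and the real kernel time surfaces later, e.g. when the context is torn down at interpreter exit. A minimal sketch to test this hypothesis, using the PseudoRandomNumberGenerator API exactly as posted above (and assuming a CUDA-capable GPU is present), adds an explicit synchronize() so the time is attributed where it is actually spent:

```python
import time

def timed_fill(n):
    """Time the CURAND fill twice: once as launched, once after an
    explicit synchronize. Imports are deferred so this module can be
    loaded on machines without a GPU; the function itself needs one.
    """
    import numpy as np
    import pycuda.autoinit
    import pycuda.gpuarray
    from pycuda.curandom import PseudoRandomNumberGenerator

    # Same setup as in the original post.
    rr = PseudoRandomNumberGenerator(0, np.random.random(128).astype(np.int32))
    data = pycuda.gpuarray.zeros([n], np.float32)

    t1 = time.time()
    rr.fill_normal_float(data.gpudata, n)
    launch_only = time.time() - t1    # returns once the kernel is queued

    t2 = time.time()
    pycuda.autoinit.context.synchronize()  # block until the kernel finishes
    kernel_wait = time.time() - t2

    return launch_only, kernel_wait
```

If kernel_wait dwarfs launch_only, the "pause" is simply the kernel still running, and any follow-up kernel launched on the same stream would queue behind it rather than stall the host.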
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
