I'd just like to add that yet another option is to use a manager/proxy object from multiprocessing. In that case numpy.random.random is called in the manager's server process rather than in the workers, so every worker draws from the same generator. I have not used this and I am not sure how efficient it is, since every call goes through inter-process communication, but the possibility is there.

Sturla Molden

=== test.py ===
from test_helper import task, RandomManager
from multiprocessing import Pool

# Start the manager's server process; the generator lives there.
rm = RandomManager()
rm.start()
random = rm.Random()  # proxy that can be passed to the workers

p = Pool(4)

jobs = list()
for i in range(4):
    jobs.append(p.apply_async(task, (4, random)))

print([j.get() for j in jobs])

p.close()
p.join()
rm.shutdown()

=== test_helper.py ===
import numpy as np
from multiprocessing.managers import BaseManager

class RandomClass(object):
    def random(self, x):
        return np.random.random(x)

class RandomManager(BaseManager):
    pass

# multiprocessing.managers has no CreatorMethod; register() exposes
# RandomClass through the manager as rm.Random().
RandomManager.register('Random', RandomClass)

def task(x, random):
    # 'random' is a proxy, so this call runs in the manager process.
    return random.random(x)

On 12/11/2008 4:20 PM, Gael Varoquaux wrote:
> Hi there,
>
> I have been using the multiprocessing module a lot to do statistical tests
> such as Monte Carlo or resampling, and I have just discovered something
> that makes me wonder if I haven't been accumulating false results. Given
> two files:
>
> === test.py ===
> from test_helper import task
> from multiprocessing import Pool
>
> p = Pool(4)
>
> jobs = list()
> for i in range(4):
>     jobs.append(p.apply_async(task, (4, )))
>
> print([j.get() for j in jobs])
>
> p.close()
> p.join()
>
> === test_helper.py ===
> import numpy as np
>
> def task(x):
>     return np.random.random(x)
>
> =======
>
> If I run test.py, I get:
>
> [array([ 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([
> 0.35773964, 0.63945684, 0.50855196, 0.08631373]), array([ 0.35773964,
> 0.63945684, 0.50855196, 0.08631373]), array([ 0.65357725, 0.35649382,
> 0.02203999, 0.7591353 ])]
>
> In other words, the 4 processes give me the same exact results.
>
> Now I understand why this is the case: the different instances of the
> random number generator were created by forking from the same process,
> so they are exactly the very same object. This is however a fairly bad
> trap. I guess other people will fall into it.
>
> The take home message is:
> **call 'numpy.random.seed()' when you are using multiprocessing**
>
> I wonder if we can find a way to make this more user friendly? Would it be
> easy, in the C code, to check if the PID has changed, and if so reseed
> the random number generator? I can open up a ticket for this if people
> think this is desirable (I think so).
>
> On a side note, there are a score of functions in numpy.random with
> __module__ set to None. It makes it inconvenient to use them with
> multiprocessing (for instance it forced the creation of the 'test_helper'
> file here).
>
> Gaël
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
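
For comparison, the take-home message quoted above (reseed in every worker) can be sketched with a Pool initializer. This is only a minimal sketch: the helper names reseed, task and draw are made up for illustration, and it assumes a POSIX platform where the "fork" start method is available, since that is the situation in which workers inherit a copy of the parent's generator state.

```python
import multiprocessing as mp

import numpy as np


def reseed():
    # Hypothetical helper: runs once in each worker right after it starts.
    # seed() with no argument takes fresh entropy from the OS, which
    # discards the generator state copied from the parent by fork.
    np.random.seed()


def task(n):
    # Same job as in the quoted test_helper.py.
    return np.random.random(n)


def draw(n_jobs=4, n=4):
    # Assumption: "fork" is requested explicitly to reproduce the setting
    # discussed in this thread; it is not available on Windows.
    ctx = mp.get_context("fork")
    pool = ctx.Pool(n_jobs, initializer=reseed)
    try:
        return pool.map(task, [n] * n_jobs)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for row in draw():
        print(row)
```

Without the initializer, forked workers start from identical copies of the parent's state and can return identical arrays, as in the output quoted above.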