Re: [Numpy-discussion] SFMT (faster mersenne twister)
Pierre-Andre Noel noel.pierre.an...@gmail.com wrote: Why not do something like the C++11 random? In random, a generator is the engine producing randomness, and a distribution decides what is the type of outputs that you want. This is what randomkit is doing internally, which is why it is so easy to plug in a different generator. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
Julian Taylor jtaylor.deb...@googlemail.com wrote: But as already mentioned by Robert, we know what we can do, what is missing is someone writting the code. This is actually a part of NumPy I know in detail, so I will be able to contribute. Robert Kern's last post about objects like np.random.SFMT() working similar to RandomState should be doable and not break any backwards compatibility. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 8 Sep 2014 14:43, Robert Kern robert.k...@gmail.com wrote: On Mon, Sep 8, 2014 at 6:05 PM, Pierre-Andre Noel noel.pierre.an...@gmail.com wrote: I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). Why not do something like the C++11 random? In random, a generator is the engine producing randomness, and a distribution decides what is the type of outputs that you want. Here is the example on http://www.cplusplus.com/reference/random/ . std::default_random_engine generator; std::uniform_int_distributionint distribution(1,6); int dice_roll = distribution(generator); // generates number in the range 1..6 For convenience, you can bind the generator with the distribution (still from the web page above). auto dice = std::bind(distribution, generator); int wisdom = dice()+dice()+dice(); Here is how I propose to adapt this scheme to numpy. First, there would be a global generator defaulting to the current implementation of Mersene Twister. Calls to numpy's RandomState, seed, get_state and set_state would affect this global generator. All numpy functions associated to random number generation (i.e., everything listed on http://docs.scipy.org/doc/numpy/reference/routines.random.html except for RandomState, seed, get_state and set_state) would accept the kwarg generator, which defaults to the global generator (by default the current Mersene Twister). Now there could be other generator objects: the new Mersene Twister, some lightweight-but-worse generator, or some cryptographically-safe random generator. Each such generator would have RandomState, seed, get_state and set_state methods (except perhaps the criptographically-safe ones). When calling a numpy function with generator=my_generator, that function uses this generator instead the global one. Moreover, there would be be a function, say select_default_random_engine(generator), which changes the global generator to a user-specified one. I think the Python standard library's example is more instructive. We have new classes for each new core uniform generator. They will share a common superclass to share the implementation of the non-uniform distributions. numpy.random.RandomState will continue to be the current Mersenne Twister implementation, and so will the underlying global RandomState for all of the convenience functions in numpy.random. If you want the SFMT variant, you instantiate numpy.random.SFMT() and call its methods directly. There's also another reason why generator decisions should be part of the RandomState object itself: we may want to change the distribution methods themselves over time (e.g., people have been complaining for a while that we use a suboptimal method for generating gaussian deviates), but changing these creates similar backcompat problems. So we need a way to say please give me samples using the old gaussian implementation or whatever. -n ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 09/09/14 20:08, Nathaniel Smith wrote: There's also another reason why generator decisions should be part of the RandomState object itself: we may want to change the distribution methods themselves over time (e.g., people have been complaining for a while that we use a suboptimal method for generating gaussian deviates), but changing these creates similar backcompat problems. Which one should we rather use? Ziggurat? Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Sun, Sep 7, 2014 at 4:23 PM, Sturla Molden sturla.mol...@gmail.com wrote: Benjamin Root ben.r...@ou.edu wrote: In addition to issues with reproducibility, think of all of the unit tests that would break! That is a reproducibility problem :) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion Related aside about reproducibility of random numbers: IMO: scipy.stats.distributions.rvs does not guarantee yet the values of the random numbers except for those that are directly produced by numpy. In contrast to numpy.random, scipy's distributions don't have unit tests for the specific values of the rvs, and the rvs code for specific distribution could still be improved in some cases, I guess Josef ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). Why not do something like the C++11 random? In random, a generator is the engine producing randomness, and a distribution decides what is the type of outputs that you want. Here is the example on http://www.cplusplus.com/reference/random/ . std::default_random_engine generator; std::uniform_int_distributionint distribution(1,6); int dice_roll = distribution(generator); // generates number in the range 1..6 For convenience, you can bind the generator with the distribution (still from the web page above). auto dice = std::bind(distribution, generator); int wisdom = dice()+dice()+dice(); Here is how I propose to adapt this scheme to numpy. First, there would be a global generator defaulting to the current implementation of Mersene Twister. Calls to numpy's RandomState, seed, get_state and set_state would affect this global generator. All numpy functions associated to random number generation (i.e., everything listed on http://docs.scipy.org/doc/numpy/reference/routines.random.html except for RandomState, seed, get_state and set_state) would accept the kwarg generator, which defaults to the global generator (by default the current Mersene Twister). Now there could be other generator objects: the new Mersene Twister, some lightweight-but-worse generator, or some cryptographically-safe random generator. Each such generator would have RandomState, seed, get_state and set_state methods (except perhaps the criptographically-safe ones). When calling a numpy function with generator=my_generator, that function uses this generator instead the global one. Moreover, there would be be a function, say select_default_random_engine(generator), which changes the global generator to a user-specified one. This is also why parallel random number generators and parallel stochastic algorithms are so hard to program, because the operating systems' scheduler can easily break the reproducibility. What I propose would also simplify this: each thread can use its own independently-seeded generator. Timing is no longer a problem: as long as which-thread-does-what is not affected by the scheduler, the execution remains deterministic. On 09/07/2014 12:51 PM, Sturla Molden wrote: James A. Bednar jbed...@inf.ed.ac.uk wrote: Please don't ever, ever break the sequence of numpy's random numbers! Please! We have put a lot of effort into being able to reproduce our published work exactly, Jup, it cannot be understated how important this is for reproducibility of published research. Thus from a scientific standpoint it is important that random numbers are not random. Some might think that it's just important that they are as random as possible, but reproducibility is just as essential to stochastic simulations. This is also why parallel random number generators and parallel stochastic algorithms are so hard to program, because the operating systems' scheduler can easily break the reproducibility. I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). A particular candidate I think we should consider is the DCMT, which is exceptionally good for parallel algorithms (the DCMT code is now BSD licensed, it used to be LGPL). Because of the way randomkit it written, it is very easy to plug-in different generators. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 08.09.2014 19:05, Pierre-Andre Noel wrote: I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). ... Here is how I propose to adapt this scheme to numpy. First, there would be a global generator defaulting to the current implementation of Mersene Twister. Calls to numpy's RandomState, seed, get_state and set_state would affect this global generator. All numpy functions associated to random number generation (i.e., everything listed on http://docs.scipy.org/doc/numpy/reference/routines.random.html except for RandomState, seed, get_state and set_state) would accept the kwarg generator, which defaults to the global generator (by default the current Mersene Twister). I don't think every function would need a generator argument, for real world applications it should be sufficient to have the state object carry which generator is used and maybe a switch to change the global one. But as already mentioned by Robert, we know what we can do, what is missing is someone writting the code. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Mon, Sep 8, 2014 at 6:05 PM, Pierre-Andre Noel noel.pierre.an...@gmail.com wrote: I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). Why not do something like the C++11 random? In random, a generator is the engine producing randomness, and a distribution decides what is the type of outputs that you want. Here is the example on http://www.cplusplus.com/reference/random/ . std::default_random_engine generator; std::uniform_int_distributionint distribution(1,6); int dice_roll = distribution(generator); // generates number in the range 1..6 For convenience, you can bind the generator with the distribution (still from the web page above). auto dice = std::bind(distribution, generator); int wisdom = dice()+dice()+dice(); Here is how I propose to adapt this scheme to numpy. First, there would be a global generator defaulting to the current implementation of Mersene Twister. Calls to numpy's RandomState, seed, get_state and set_state would affect this global generator. All numpy functions associated to random number generation (i.e., everything listed on http://docs.scipy.org/doc/numpy/reference/routines.random.html except for RandomState, seed, get_state and set_state) would accept the kwarg generator, which defaults to the global generator (by default the current Mersene Twister). Now there could be other generator objects: the new Mersene Twister, some lightweight-but-worse generator, or some cryptographically-safe random generator. Each such generator would have RandomState, seed, get_state and set_state methods (except perhaps the criptographically-safe ones). When calling a numpy function with generator=my_generator, that function uses this generator instead the global one. Moreover, there would be be a function, say select_default_random_engine(generator), which changes the global generator to a user-specified one. I think the Python standard library's example is more instructive. We have new classes for each new core uniform generator. They will share a common superclass to share the implementation of the non-uniform distributions. numpy.random.RandomState will continue to be the current Mersenne Twister implementation, and so will the underlying global RandomState for all of the convenience functions in numpy.random. If you want the SFMT variant, you instantiate numpy.random.SFMT() and call its methods directly. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
In addition to issues with reproducibility, think of all of the unit tests that would break! On Sun, Sep 7, 2014 at 3:51 PM, Sturla Molden sturla.mol...@gmail.com wrote: James A. Bednar jbed...@inf.ed.ac.uk wrote: Please don't ever, ever break the sequence of numpy's random numbers! Please! We have put a lot of effort into being able to reproduce our published work exactly, Jup, it cannot be understated how important this is for reproducibility of published research. Thus from a scientific standpoint it is important that random numbers are not random. Some might think that it's just important that they are as random as possible, but reproducibility is just as essential to stochastic simulations. This is also why parallel random number generators and parallel stochastic algorithms are so hard to program, because the operating systems' scheduler can easily break the reproducibility. I think we could add new generators to NumPy though, perhaps with a keyword to control the algorithm (defaulting to the current Mersenne Twister). A particular candidate I think we should consider is the DCMT, which is exceptionally good for parallel algorithms (the DCMT code is now BSD licensed, it used to be LGPL). Because of the way randomkit it written, it is very easy to plug-in different generators. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
Benjamin Root ben.r...@ou.edu wrote: In addition to issues with reproducibility, think of all of the unit tests that would break! That is a reproducibility problem :) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
| Date: Fri, 05 Sep 2014 13:19:57 -0400 | From: Neal Becker ndbeck...@gmail.com | | I think it's somewhat debatable whether generating a different | sequence of random numbers counts as breaking backward | compatibility. Please don't ever, ever break the sequence of numpy's random numbers! Please! We have put a lot of effort into being able to reproduce our published work exactly, and all of that would be in vain if the sequence changes. See e.g.: http://journal.frontiersin.org/Journal/10.3389/fninf.2013.00044/full We'd be very happy to see additional number generators appear *alongside* the existing ones, though, particularly if there are faster or otherwise better ones! Jim Bednar -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
Robert Kern wrote: On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker ndbeck...@gmail.com wrote: http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? If it is faster (and at least as good), maybe we'd like to adopt it to replace that used for mtrand -- -- Those who don't understand recursion are doomed to repeat it ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker ndbeck...@gmail.com wrote: http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? If it is faster (and at least as good), maybe we'd like to adopt it to replace that used for mtrand It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
Robert Kern wrote: On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker ndbeck...@gmail.com wrote: http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? If it is faster (and at least as good), maybe we'd like to adopt it to replace that used for mtrand It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I think it's somewhat debatable whether generating a different sequence of random numbers counts as breaking backward compatibility. -- -- Those who don't understand recursion are doomed to repeat it ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Fri, Sep 5, 2014 at 6:19 PM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Fri, Sep 5, 2014 at 12:05 PM, Neal Becker ndbeck...@gmail.com wrote: Robert Kern wrote: On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker ndbeck...@gmail.com wrote: http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? If it is faster (and at least as good), maybe we'd like to adopt it to replace that used for mtrand It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I think it's somewhat debatable whether generating a different sequence of random numbers counts as breaking backward compatibility. It's not a matter of debate over the semantics of the term backwards compatibility. This is a policy that we have explicitly chosen for numpy.random (distinct from our backwards compatibility policy for the rest of numpy) because it was requested of us. -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 09/05/2014 01:19 PM, Neal Becker wrote: You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I think it's somewhat debatable whether generating a different sequence of random numbers counts as breaking backward compatibility. For regression tests it's essential to be able to generate the same sequence of values from a pseudo-random number generator. Phil ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 9/5/2014 1:19 PM, Neal Becker wrote: I think it's somewhat debatable whether generating a different sequence of random numbers counts as breaking backward compatibility. Please: it does. Alan Isaac ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On 05/09/14 19:19, Neal Becker wrote: It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I thought he meant that the standard MT and SFMT have global states, whereas NumPy's randomkit MT does not. Because of this, NumPy allows you to use multiple RandomState instances. With a vanilla MT or SFMT there would be just one. Sturla ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Fri, Sep 5, 2014 at 9:36 PM, Sturla Molden sturla.mol...@gmail.com wrote: On 05/09/14 19:19, Neal Becker wrote: It's a variant of the standard MT rather than just an implementation of it, so we can't just drop it in. You will need to build the infrastructure to support multiple PRNGs first (or rather, build the infrastructure to reuse the non-uniform distribution code with multiple core PRNGs). You mean it's not backward compatible because it won't generate exactly the same sequence of output for a given seed, and therefore we wouldn't want to make that change? I thought he meant that the standard MT and SFMT have global states, whereas NumPy's randomkit MT does not. Because of this, NumPy allows you to use multiple RandomState instances. With a vanilla MT or SFMT there would be just one. No, that is not what I meant. If the SFMT can be made to output the same bitstream for the same seed, we can use it (modifying it if necessary to avoid global state if necessary), but it does not look so to me. I welcome corrections on that count (in PR form, preferably!). -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
Robert Kern robert.k...@gmail.com wrote: No, that is not what I meant. If the SFMT can be made to output the same bitstream for the same seed, we can use it (modifying it if necessary to avoid global state if necessary), but it does not look so to me. I welcome corrections on that count (in PR form, preferably!). Saito's SFMT master thesis says SFMT as a different equidistribution, so it will not give the same bitstream. But it also mentions a SIMD version of the standard MT. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] SFMT (faster mersenne twister)
http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html -- -- Those who don't understand recursion are doomed to repeat it ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] SFMT (faster mersenne twister)
On Thu, Sep 4, 2014 at 12:32 PM, Neal Becker ndbeck...@gmail.com wrote: http://www.math.sci.hiroshima-u.ac.jp/~%20m-mat/MT/SFMT/index.html What would you like to say about it? -- Robert Kern ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion