On Fri, Jan 19, 2018 at 6:55 AM, Robert Kern <robert.k...@gmail.com> wrote:
[...]
> There seems to be a lot of pent-up motivation to improve on the random
> number generation, in particular the distributions, that has been blocked by
> our policy. I think we've lost a few potential first-time contributors that
> have run up against this wall. We have been pondering ways to allow for
> adding new core PRNGs and improve the distribution methods while maintaining
> stream-compatibility for existing code. Kevin Sheppard, in particular, has
> been working hard to implement new core PRNGs with a common API.
>
> https://github.com/bashtage/ng-numpy-randomstate
>
> Kevin has also been working to implement the several proposals that have
> been made to select different versions of distribution implementations. In
> particular, one idea is to pass something to the RandomState constructor to
> select a specific version of distributions (or switch out the core PRNG).
> Note that to satisfy the policy, the simplest method of seeding a
> RandomState will always give you the oldest version: what we have now.
>
> Kevin has recently come to the conclusion that it's not technically feasible
> to add the version-selection at all if we keep the stream-compatibility
> policy.
>
> https://github.com/numpy/numpy/pull/10124#issuecomment-350876221
>
> I would argue that our current policy isn't providing the value that it
> claims to.

I agree that relaxing our policy would be better than the status quo.
Before making any decisions, though, I'd like to make sure we
understand the alternatives and their trade-offs. Specifically, I
think the main alternative would be the following approach to
versioning:

1) make RandomState's state be a tuple (underlying RNG algorithm,
underlying RNG state, distribution version)

2) zero-argument initialization/seeding, like RandomState() or
rstate.seed(), sets the state to: (our recommended RNG algorithm,
os.urandom(...), version=LATEST_VERSION)

3) for backcompat, single-argument seeding like RandomState(123) or
rstate.seed(123), sets the state to: (mersenne twister,
expand_mt_seed(123), version=0)

4) also allow seeding to explicitly control all the parameters, like
RandomState(PCG_XSL_RR(123), version=12) or whatever

5) the distribution functions are implemented like:

    def normal(self, *args, **kwargs):
        if self.version < 3:
            return self._normal_box_muller(*args, **kwargs)
        elif self.version < 8:
            return self._normal_ziggurat_v1(*args, **kwargs)
        else:  # version >= 8
            return self._normal_ziggurat_v2(*args, **kwargs)

Advantages: fully backwards compatible; preserves the compatibility
guarantee (such as it is); users who use the default seeding
automatically get the highest speed and quality.

Disadvantages: users who specify seeds explicitly get old/slow
distributions (but of course that's the point of compatibility); we
have to keep the old distribution code around forever (but this is not
too hard; it just sits in some function and we never touch it).

Kevin, is this the version that you think is non-viable? Is the above
a good description of the advantages/disadvantages?

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
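
For concreteness, here is a minimal sketch of how the seeding rules in steps 1-4 could pick the (algorithm, state, version) tuple. It is only an illustration under stated assumptions, not NumPy code: GeneratorState, make_state, normal_impl_name, the algorithm name strings, the value of LATEST_VERSION, and the body of expand_mt_seed are all hypothetical placeholders.

```python
import os
from dataclasses import dataclass

# Hypothetical constants, chosen only for illustration.
LATEST_VERSION = 12
DEFAULT_ALGORITHM = "recommended-rng"   # stands in for "our recommended RNG algorithm"
LEGACY_ALGORITHM = "mersenne-twister"


@dataclass(frozen=True)
class GeneratorState:
    """Step 1: the state is a tuple of (algorithm, RNG state, distribution version)."""
    algorithm: str
    rng_state: bytes
    version: int


def expand_mt_seed(seed: int) -> bytes:
    # Placeholder only: the real Mersenne Twister seed expansion is not reproduced here.
    return seed.to_bytes(8, "little")


def make_state(seed=None, algorithm=None, version=None) -> GeneratorState:
    if seed is None and algorithm is None and version is None:
        # Step 2: zero-argument seeding gets fresh entropy and the latest distributions.
        return GeneratorState(DEFAULT_ALGORITHM, os.urandom(16), LATEST_VERSION)
    if seed is not None and algorithm is None and version is None:
        # Step 3: a bare seed keeps the legacy stream, i.e. version 0.
        return GeneratorState(LEGACY_ALGORITHM, expand_mt_seed(seed), 0)
    if seed is None or algorithm is None or version is None:
        raise ValueError("explicit seeding needs seed, algorithm and version")
    # Step 4: the caller controls every parameter explicitly.
    return GeneratorState(algorithm, seed.to_bytes(8, "little"), version)


def normal_impl_name(version: int) -> str:
    # Mirrors the version thresholds of the dispatch in step 5.
    if version < 3:
        return "box_muller"
    elif version < 8:
        return "ziggurat_v1"
    return "ziggurat_v2"
```

With these placeholders, make_state(123) always maps to the version-0 implementations, while make_state() picks up whatever LATEST_VERSION is at the time.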