On Mon, Jun 4, 2018 at 3:18 PM, Robert Kern <robert.k...@gmail.com> wrote:
> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers <ralf.gomm...@gmail.com>
> wrote:
>
>> It may be worth having a look at test suites for scipy, statsmodels,
>> scikit-learn, etc. and estimate how much work this NEP causes those
>> projects. If the devs of those packages are forced to do large scale
>> migrations from RandomState to StableRandom, then why not instead keep
>> RandomState and just add a new API next to it?
>>
>
> The problem is that we can't really have an ecosystem with two different
> general purpose systems.
>

Can't = prefer not to. But yes, that's true. That's not what I was saying
though. We want one generic one, and one meant for unit testing only. You
can achieve that in two ways:

1. Change the current np.random API into the new generic one, and add a
   new StableRandom for unit tests.
2. Add a new generic API, and document the current np.random API as being
   meant for unit tests only; for other usage, <new API> should be
   preferred.

(2) has a couple of pros:
- you're not forcing almost every library and end user out there to
  migrate their unit tests.
- more design freedom for the new generic API. The current one is clearly
  sub-optimal; in a new one you wouldn't have to expose all the global
  state/functions that np.random exposes now. You could even restrict it
  to a single class and put that in the main numpy namespace.

Ralf


> To properly use pseudorandom numbers, I need to instantiate a PRNG and
> thread it through all of the code in my program: both the parts that I
> write and the third party libraries that I don't write.
>
> Generating test data for unit tests is separable, though. That's why I
> propose having a StableRandom built on the new architecture. Its purpose
> would be well-documented, and in my proposal is limited in features such
> that it will be less likely to be abused outside of that purpose. If you
> make it fully-featured, it is more likely to be abused by building library
> code around it.
> But even if it is so abused, because it is built on the new
> architecture, at least I can thread the same core PRNG state through the
> StableRandom distributions from the abusing library and use the better
> distributions class elsewhere (randomgen names it "Generator"). Just
> keeping RandomState around can't work like that because it doesn't have a
> replaceable core PRNG.
>
> But that does suggest another alternative that we should explore:
>
> The new architecture separates the core uniform PRNG from the wide variety
> of non-uniform probability distributions. That is, the core PRNG state is
> encapsulated in a discrete object that can be shared between instances of
> different distribution-providing classes. numpy.random should provide two
> such distribution-providing classes. The main one (let us call it
> ``Generator``, as it is called in the prototype) will follow the new
> policy: distribution methods can break the stream in feature releases.
> There will also be a secondary distributions class (let us call it
> ``LegacyGenerator``) which contains distribution methods exactly as they
> exist in the current ``RandomState`` implementation. When one combines
> ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the
> exact same stream as ``RandomState`` for all distribution methods. The
> ``LegacyGenerator`` methods will be forever frozen.
> ``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with
> the MT19937 core PRNG, and whatever tricks needed to make
> ``isinstance(prng, RandomState)`` and unpickling work should be done. This
> way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be
> deprecated, becoming progressively noisier over a number of release cycles,
> in favor of explicitly instantiating ``LegacyGenerator``.
>
> ``LegacyGenerator`` CAN be used during this deprecation period in library
> and application code until libraries and applications can migrate to the
> new ``Generator``.
> Libraries and applications SHOULD migrate but MUST NOT
> be forced to. ``LegacyGenerator`` CAN be used to generate test data for
> unit tests where cross-release stability of the streams is important. Test
> writers SHOULD consider ways to mitigate their reliance on such stability
> and SHOULD limit their usage to distribution methods that have fewer
> cross-platform stability risks.
>
> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
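
The shared-core arrangement described in the quoted alternative can be
sketched in plain Python. This is only a toy under stated assumptions, not
numpy code: ``CorePRNG`` is a hypothetical name, the two distribution
classes merely borrow the names ``Generator`` and ``LegacyGenerator`` from
the proposal, and the standard library's ``random.Random`` (a Mersenne
Twister) stands in for the MT19937 core. The two ``exponential`` algorithms
are made up for illustration; the only point is that they differ.

```python
import math
import random


class CorePRNG:
    """Hypothetical core uniform PRNG: owns the state, nothing else."""

    def __init__(self, seed):
        # Assumption: random.Random stands in for an MT19937 core.
        self._mt = random.Random(seed)

    def uniform(self):
        """Return the next float in [0, 1) and advance the shared state."""
        return self._mt.random()


class LegacyGenerator:
    """Hypothetical frozen distributions class: algorithms never change."""

    def __init__(self, core):
        self.core = core

    def exponential(self, scale=1.0):
        # Inversion on 1 - u; frozen so that old streams stay reproducible.
        return -scale * math.log(1.0 - self.core.uniform())


class Generator:
    """Hypothetical main distributions class: algorithms may change."""

    def __init__(self, core):
        self.core = core

    def exponential(self, scale=1.0):
        # A different (illustrative) algorithm: inversion on u itself,
        # redrawing in the unlikely case that u is exactly zero.
        u = self.core.uniform()
        while u == 0.0:
            u = self.core.uniform()
        return -scale * math.log(u)


# One core PRNG threaded through both distribution classes.
core = CorePRNG(12345)
legacy = LegacyGenerator(core)
modern = Generator(core)
shared_draws = [legacy.exponential(), modern.exponential(), legacy.exponential()]

# A fresh core with the same seed replays the frozen legacy stream.
first_legacy_draw = LegacyGenerator(CorePRNG(12345)).exponential()
```

Because both classes draw from the same ``core`` object, a library built
around the frozen class and code using the newer class consume a single
stream of uniforms, which is the property that keeping the current
``RandomState`` unchanged cannot offer.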