Re: [Numpy-discussion] A roadmap for NumPy - longer term planning
Hi,

Do you plan to consider trying to add PEP 574 / pickle5 support? There's an implementation ready (and a PyPI backport) that you can play with. https://www.python.org/dev/peps/pep-0574/

PEP 574 implicitly targets Numpy arrays as one of its primary producers, since Numpy arrays are how large scientific or numerical data often end up represented, and zero-copy is often desired by users.

PEP 574 could certainly be useful even without Numpy arrays supporting it, but less so. So I would welcome any feedback on that front (and, given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd ideally like to have that feedback sometime in the forthcoming months ;-)).

Best regards

Antoine.

On Thu, 31 May 2018 16:50:02 -0700 Matti Picus wrote:
> At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
>
> I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
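[Editor's note: as a rough illustration of the zero-copy pickling Antoine describes, the sketch below uses the protocol-5 out-of-band buffer API from PEP 574. It assumes a Python where protocol 5 is available (3.8+, or the `pickle5` backport) and a NumPy version that implements the protocol; at the time of this thread this was still prospective.]

```python
import pickle
import numpy as np

arr = np.arange(1_000_000, dtype=np.float64)

# With buffer_callback, large buffers are handed to the callback as
# PickleBuffer objects instead of being serialized in-band, so the
# array's data need not be copied into the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# The consumer supplies the same buffers back at load time; for a
# contiguous array this can avoid copying the data entirely.
restored = pickle.loads(payload, buffers=buffers)

assert np.array_equal(arr, restored)
```

Without `buffer_callback`, protocol 5 falls back to in-band serialization, so the same code path degrades gracefully for small payloads.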
Re: [Numpy-discussion] NEP: Random Number Generator Policy
I’m not sure if this is within the scope of the NEP or an implementation detail, but I think a new PRNG should use platform-independent integer types rather than depending on the platform’s choice of 64-bit data model. This should be enough to ensure that any integer distribution that only uses integers internally produces identical results across uarch/OS.
Re: [Numpy-discussion] NEP: Random Number Generator Policy
On Mon, Jun 4, 2018 at 2:22 AM, Robert Kern wrote:
> On Sun, Jun 3, 2018 at 10:27 PM wrote:
>> On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer wrote:
>>> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote:
>>>> It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it?
>>>
>>> Tests that explicitly create RandomState objects would not be difficult to migrate. The goal of "StableState" is that it could be used directly in cases where RandomState is currently used in tests, so I would guess that "RandomState" could be almost mechanically replaced by "StableState".
>>>
>>> The challenging cases are calls to np.random.seed(). If no replacement API is planned, then these would need to be manually converted to use StableState instead. This is probably not too onerous (and is a good cleanup to do anyway) but it would be a bit of work.
>>
>> I agree with this. Statsmodels uses mostly np.random.seed. That cleanup is planned, but postponed so far as not high priority. We will have to do it eventually.
>>
>> The main work will come when StableState doesn't include specific distributions: Poisson, NegativeBinomial, Gamma, ... and distributions that we don't even use yet, like Beta.
>
> I would posit that it is probably very rare that one uses the full breadth of distributions in unit tests. You may be the only one. :-)

Given that I'm one of the maintainers for statistics in Python, I wouldn't be surprised if I use more than almost all others. However, statsmodels doesn't use a very large set; there are other packages that use Pareto and extreme value distributions, or circular distributions like vonmises, which are not yet in statsmodels.
I have no idea whether MCMC packages still rely on numpy.random. But the main "user" of numpy's random is scipy.stats, which might be using almost all of the distributions. I don't have a current overview of how much the scipy.stats unit tests rely on having stable streams for the available distributions.

>> I don't want to migrate random number generation for the distributions abandoned by numpy Stable to statsmodels.
>
> What if we followed Kevin's suggestion and forked off RandomState into its own forever-frozen package sooner rather than later? Its intended use would be for people with legacy packages that cannot upgrade (other than changing some imports) and for unit tests that require precise streams for a full breadth of distributions. We would still leave it in numpy.random for a deprecation period, but maybe we would be noisy about it sooner and remove it sooner than my NEP planned for.
>
> Would that work? I'd be happy to maintain that forked-RandomState for you.

It would not be nice to have to add another dependency, but that would work for statsmodels. I'm not sure whether the scipy.stats maintainers are fine with it. Given that scipy already uses RandomState instead of the global instance, the actual change, if the distributions are available, would be to swap a StableState for a RandomState in the unit tests, AFAIK.

Josef

> I would probably still encourage most people to continue to use StableRandom for most unit testing.
>
> --
> Robert Kern
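[Editor's note: the migration being discussed can be sketched concretely. `StableState`/`StableRandom` are only proposed names and do not exist in numpy, so the sketch below shows today's equivalent: replacing the implicit global `np.random.seed` pattern with an explicitly instantiated `RandomState`, which is the mechanical part of the change; under the NEP only the class name at the construction site would differ.]

```python
import numpy as np

# Before: the global-state pattern that tests commonly use today.
np.random.seed(12345)
legacy_draw = np.random.poisson(lam=3.0, size=5)

# After: an explicit, locally seeded generator object.  Under the NEP
# the class constructed here would become the proposed StableState /
# StableRandom (hypothetical names); the draw calls themselves are
# unchanged.
prng = np.random.RandomState(12345)
explicit_draw = prng.poisson(lam=3.0, size=5)

# Today both produce the same stream, since np.random.seed simply seeds
# the hidden global RandomState instance.
assert np.array_equal(legacy_draw, explicit_draw)
```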
Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API
Should there be discussion of typing (PEP 484) or abstract base classes in this NEP? Are there any requirements on the result returned by __array_function__?

On Mon, Jun 4, 2018, 2:20 AM Stephan Hoyer wrote:
> On Sun, Jun 3, 2018 at 9:54 PM Hameer Abbasi wrote:
>> Mixed return values of NotImplementedButCoercible and NotImplemented would still result in TypeError, and there would be no second chances for overloads.
>>
>> I would like to differ with you here: it can be quite useful to have second chances for overloads. Think ``np.func(list, custom_array)``: if second rounds did not exist, custom_array would need to have a list of coercible types (which is not nice IMO).
>
> Even if we did this, we would still want to preserve the equivalence between:
> 1. Returning NotImplementedButCoercible from __array_ufunc__ or __array_function__, and
> 2. Not implementing __array_ufunc__ or __array_function__ at all.
>
> Changing __array_ufunc__ to do multiple rounds of checks could indeed be useful in some cases, and you're right that it would not change existing behavior (in these cases we currently raise TypeError). But I'd rather leave that for a separate discussion, because it's orthogonal to our proposal here for __array_function__.
>
> (Personally, I don't think it would be worth the additional complexity.)
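[Editor's note: for readers unfamiliar with the protocol under discussion, here is a minimal duck array using the `(func, types, args, kwargs)` signature from the NEP. This is a toy sketch, not code from the proposal; running it requires a NumPy recent enough to dispatch `__array_function__` (1.17+), which postdates this thread.]

```python
import numpy as np

class WrappedArray:
    """Toy duck array implementing the proposed NEP 18 protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Return NotImplemented for operand types we don't recognize,
        # so NumPy can keep looking (or ultimately raise TypeError).
        if not all(issubclass(t, (WrappedArray, np.ndarray)) for t in types):
            return NotImplemented
        # Unwrap our arguments, defer to NumPy's implementation on the
        # plain ndarrays, and re-wrap the result.
        unwrapped = tuple(a.data if isinstance(a, WrappedArray) else a
                          for a in args)
        return WrappedArray(func(*unwrapped, **kwargs))

x = WrappedArray([1.0, 2.0, 3.0])
total = np.sum(x)  # dispatches to WrappedArray.__array_function__
assert isinstance(total, WrappedArray)
assert float(total.data) == 6.0
```

Note that, as the thread asks, nothing in the protocol itself constrains the type of `total` — the override is free to return whatever it likes, which is exactly the typing question raised above.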
Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API
Hi Stephan,

> Another potential consideration in favor of NotImplementedButCoercible is for subclassing: we could use it to write the default implementations of ndarray.__array_ufunc__ and ndarray.__array_function__, e.g.,
>
> class ndarray:
>     def __array_ufunc__(self, *args, **kwargs):
>         return NotImplementedButCoercible
>     def __array_function__(self, *args, **kwargs):
>         return NotImplementedButCoercible
>
> I think (not 100% sure yet) this would result in exactly equivalent behavior to what ndarray.__array_ufunc__ currently does:
> http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies

As written, this would not work for ndarray subclasses, because the subclass will generically change itself before calling super. At least for Quantity: say I add two quantities; the quantities will both be converted to arrays (with one scaled so that the units match) and then the super call is done with those modified arrays. This expects that the super call will actually return a result (which it now can, because all inputs are arrays).

But I think it would work to return `NotImplementedButCoercible` in the case that perhaps you had in mind in the first place, in which any of the *other* arguments had a `__array_ufunc__` implementation and `ndarray` thus does not know what to do. For those cases, `ndarray` currently returns a straight `NotImplemented`. Though I am still a bit worried: this gets back to `Quantity.__array_ufunc__`, but what does it do with it? It cannot just pass it on, since then it is effectively claiming, incorrectly, that the *quantity* is coercible, which it is not. I guess at this point it would have to change it to `NotImplemented`.
Looking at my current implementation, I see that if we made this change to `ndarray.__array_ufunc__`, the implementation would mostly raise an exception as it tried to view `NotImplementedButCoercible` as a quantity, except for comparisons, where the output is not viewed at all (being boolean and thus unit-less) and passed straight down. That said, we've said the __array_ufunc__ implementation is experimental, so I think such small annoyances are OK.

Overall, it is an intriguing idea, and I think it should be mentioned at least in the NEP. It would be good, though, to have a few more examples of how it would work in practice.

All the best,

Marten
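[Editor's note: Marten's concern can be made concrete with a toy sketch in plain Python. `NotImplementedButCoercible` does not exist in NumPy; it is modeled here as a module-level sentinel, and the classes are stand-ins rather than real ndarray/Quantity machinery. The point illustrated: a Quantity-like subclass that receives the sentinel from super() must downgrade it to `NotImplemented`, since forwarding it would incorrectly claim the quantity itself is coercible.]

```python
# Stand-in for the proposed sentinel (not a real NumPy object).
NotImplementedButCoercible = object()

class FakeNDArray:
    """Toy stand-in for ndarray's proposed default behavior."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Proposed default: signal "I don't know, but you may coerce me".
        return NotImplementedButCoercible

class FakeQuantity(FakeNDArray):
    """Quantity-like subclass: converts its inputs, calls super, and must
    not forward the sentinel, because a quantity with units is *not*
    safely coercible to a plain array."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        result = super().__array_ufunc__(ufunc, method, *inputs, **kwargs)
        if result is NotImplementedButCoercible:
            # Downgrade: claiming coercibility here would be wrong.
            return NotImplemented
        return result

q = FakeQuantity()
assert q.__array_ufunc__(None, "__call__", q, 1.0) is NotImplemented
```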
Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API
I agree that second rounds of overloads have to be left to the implementers of `__array_function__` - obviously, though, we should be sure that these rounds are rarely necessary... The link posted by Stephan [1] has some decent discussion for `__array_ufunc__` about when an override should re-call the function rather than try to do something itself.

-- Marten

[1] http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies
Re: [Numpy-discussion] A roadmap for NumPy - longer term planning
PEP 574 isn't on the roadmap (yet!), but I think we would clearly welcome it. Like all NumPy improvements, it would need to be implemented by an interested party.

On Mon, Jun 4, 2018 at 1:52 AM Antoine Pitrou wrote:
> Hi,
>
> Do you plan to consider trying to add PEP 574 / pickle5 support? There's an implementation ready (and a PyPI backport) that you can play with. https://www.python.org/dev/peps/pep-0574/
>
> PEP 574 implicitly targets Numpy arrays as one of its primary producers, since Numpy arrays are how large scientific or numerical data often end up represented, and zero-copy is often desired by users.
>
> PEP 574 could certainly be useful even without Numpy arrays supporting it, but less so. So I would welcome any feedback on that front (and, given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd ideally like to have that feedback sometime in the forthcoming months ;-)).
>
> Best regards
>
> Antoine.
>
> On Thu, 31 May 2018 16:50:02 -0700 Matti Picus wrote:
> > At the recent NumPy sprint at BIDS (thanks to those who made the trip) we spent some time brainstorming about a roadmap for NumPy, in the spirit of similar work that was done for Jupyter. The idea is that a document with wide community acceptance can guide the work of the full-time developer(s), and be a source of ideas for expanding development efforts.
> >
> > I put the document up at https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss it at a BOF session during SciPy in the middle of July in Austin.
> >
> > Eventually it could become a NEP or formalized in another way.
> >
> > Matti
Re: [Numpy-discussion] NEP: Random Number Generator Policy
On Mon, Jun 4, 2018 at 2:55 AM Kevin Sheppard wrote:
> I’m not sure if this is within the scope of the NEP or an implementation detail, but I think a new PRNG should use platform-independent integer types rather than depending on the platform’s choice of 64-bit data model. This should be enough to ensure that any integer distribution that only uses integers internally produces identical results across uarch/OS.

Probably an implementation detail (possibly one that ought to be worked out in its own NEP). I know that I would like it if the new system had all of the same distribution methods as RandomState currently does, such that we can drop in the new generator objects in places where RandomState is currently used, and everything would still work (just with a different stream). Might want to add a statement to that effect in this NEP. I think it's likely "good enough" if the integer distributions now return uint64 arrays instead of uint32 arrays on Windows.

-- Robert Kern
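[Editor's note: the data-model issue Kevin raises is visible directly in NumPy. At the time of this thread, `np.int_` followed the platform's C long (32-bit on 64-bit Windows, 64-bit on most 64-bit Unixes), while the fixed-width aliases are identical everywhere; a PRNG written against `uint64`/`uint32` sidesteps the difference. A small sketch (it asserts only on the fixed-width types, since the width of `np.int_` is platform- and version-dependent):]

```python
import numpy as np

# np.int_ historically mapped to the platform's C long, whose width
# differed between 64-bit Windows and 64-bit Unix.  Print, don't assert.
print("platform int_:", np.dtype(np.int_).itemsize * 8, "bits")

# Fixed-width types are the same on every platform, which is what a
# reproducible integer-only distribution implementation needs.
assert np.dtype(np.uint64).itemsize == 8
assert np.dtype(np.uint32).itemsize == 4

# Requesting an explicit dtype keeps integer draws platform-independent
# at the API level.
rs = np.random.RandomState(0)
draws = rs.randint(0, 2**63, size=4, dtype=np.uint64)
assert draws.dtype == np.uint64
```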
Re: [Numpy-discussion] NEP: Random Number Generator Policy
On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers wrote:
> It may be worth having a look at test suites for scipy, statsmodels, scikit-learn, etc. and estimate how much work this NEP causes those projects. If the devs of those packages are forced to do large scale migrations from RandomState to StableState, then why not instead keep RandomState and just add a new API next to it?

The problem is that we can't really have an ecosystem with two different general-purpose systems. To properly use pseudorandom numbers, I need to instantiate a PRNG and thread it through all of the code in my program: both the parts that I write and the third-party libraries that I don't write.

Generating test data for unit tests is separable, though. That's why I propose having a StableRandom built on the new architecture. Its purpose would be well-documented, and in my proposal it is limited in features such that it will be less likely to be abused outside of that purpose. If you make it fully-featured, it is more likely to be abused by building library code around it. But even if it is so abused, because it is built on the new architecture, at least I can thread the same core PRNG state through the StableRandom distributions from the abusing library and use the better distributions class elsewhere (randomgen names it "Generator"). Just keeping RandomState around can't work like that because it doesn't have a replaceable core PRNG.

But that does suggest another alternative that we should explore: the new architecture separates the core uniform PRNG from the wide variety of non-uniform probability distributions. That is, the core PRNG state is encapsulated in a discrete object that can be shared between instances of different distribution-providing classes. numpy.random should provide two such distribution-providing classes.
The main one (let us call it ``Generator``, as it is called in the prototype) will follow the new policy: distribution methods can break the stream in feature releases. There will also be a secondary distributions class (let us call it ``LegacyGenerator``) which contains distribution methods exactly as they exist in the current ``RandomState`` implementation. When one combines ``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the exact same stream as ``RandomState`` for all distribution methods. The ``LegacyGenerator`` methods will be forever frozen.

``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with the MT19937 core PRNG, and whatever tricks are needed to make ``isinstance(prng, RandomState)`` and unpickling work should be done. This way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be deprecated, becoming progressively noisier over a number of release cycles, in favor of explicitly instantiating ``LegacyGenerator``.

``LegacyGenerator`` CAN be used during this deprecation period in library and application code until libraries and applications can migrate to the new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT be forced to. ``LegacyGenerator`` CAN be used to generate test data for unit tests where cross-release stability of the streams is important. Test writers SHOULD consider ways to mitigate their reliance on such stability and SHOULD limit their usage to distribution methods that have fewer cross-platform stability risks.

-- Robert Kern
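[Editor's note: the separation Robert describes — one shared core uniform PRNG object, multiple distribution-providing classes drawing from it — can be illustrated with a toy sketch. The class names echo the proposal, but everything below is a stand-in (a tiny LCG, not MT19937, and not the randomgen implementation).]

```python
class ToyCorePRNG:
    """Stand-in for a core uniform PRNG: a 64-bit LCG, purely illustrative."""
    MASK = 0xFFFFFFFFFFFFFFFF

    def __init__(self, seed):
        self.state = seed & self.MASK

    def next_uint64(self):
        # Constants from Knuth's MMIX LCG.
        self.state = (6364136223846793005 * self.state
                      + 1442695040888963407) & self.MASK
        return self.state

    def random(self):
        return self.next_uint64() / 2**64

class ToyGenerator:
    """Distribution-providing class; its methods may change in the future."""
    def __init__(self, core):
        self.core = core

    def uniform(self, low, high):
        return low + (high - low) * self.core.random()

class ToyLegacyGenerator:
    """Forever-frozen distribution-providing class; shares the core state."""
    def __init__(self, core):
        self.core = core

    def random_sample(self):
        return self.core.random()

# Both distribution classes thread the *same* core PRNG state, which is
# exactly what RandomState's design cannot do.
core = ToyCorePRNG(12345)
gen = ToyGenerator(core)
legacy = ToyLegacyGenerator(core)

a = gen.uniform(0.0, 1.0)   # advances the shared state
b = legacy.random_sample()  # continues the very same stream
assert gen.core is legacy.core
```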