Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-04 Thread Antoine Pitrou


Hi,

Do you plan to consider trying to add PEP 574 / pickle5 support? There's
an implementation ready (and a PyPI backport) that you can play with.
https://www.python.org/dev/peps/pep-0574/

PEP 574 implicitly targets Numpy arrays as one of its primary producers,
since Numpy arrays are how large scientific or numerical data often ends
up represented and where zero-copy is often desired by users.
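
A minimal sketch of the out-of-band mechanism the PEP describes, assuming
a Python whose pickle module supports protocol 5 (the standard library from
3.8 on). ZeroCopyByteArray is an illustrative stand-in for an array type,
not NumPy code; it follows the pattern given in the pickle documentation.

```python
import pickle
from pickle import PickleBuffer


class ZeroCopyByteArray(bytearray):
    """bytearray subclass that hands its memory to pickle out-of-band."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            # Protocol 5: expose the underlying buffer without copying it.
            return type(self)._reconstruct, (PickleBuffer(self),), None
        # Older protocols: fall back to an ordinary in-band copy.
        return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            original = m.obj  # the object that owns the buffer
            if type(original) is cls:
                return original  # same object came back: nothing was copied
            return cls(original)


payload = ZeroCopyByteArray(b"large numerical data")

# In-band: the bytes are copied into the pickle stream.
copied = pickle.loads(pickle.dumps(payload, protocol=5))

# Out-of-band: the consumer collects the buffers separately from the stream.
buffers = []
stream = pickle.dumps(payload, protocol=5, buffer_callback=buffers.append)
shared = pickle.loads(stream, buffers=buffers)
```

In the out-of-band case the consumer decides how the buffers travel (shared
memory, a socket, and so on); here they are simply handed back unchanged.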

PEP 574 could certainly be useful even without Numpy arrays supporting
it, but less so.  So I would welcome any feedback on that front (and,
given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd
ideally like to have that feedback sometime in the coming months ;-)).

Best regards

Antoine.


On Thu, 31 May 2018 16:50:02 -0700
Matti Picus  wrote:
> At the recent NumPy sprint at BIDS (thanks to those who made the trip)
> we spent some time brainstorming about a roadmap for NumPy, in the
> spirit of similar work that was done for Jupyter. The idea is that a
> document with wide community acceptance can guide the work of the
> full-time developer(s), and be a source of ideas for expanding
> development efforts.
>
> I put the document up at
> https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
> it at a BOF session during SciPy in the middle of July in Austin.
>
> Eventually it could become a NEP or formalized in another way.
>
> Matti
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-04 Thread Kevin Sheppard
I’m not sure if this is within the scope of the NEP or an implementation 
detail, but I think a new PRNG should use platform independent integer types 
rather than depending on the platform’s choice of 64-bit data model.  This 
should be enough to ensure that any integer distribution that only uses 
integers internally should produce identical results across uarch/OS.
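
A rough sketch of the point, assuming nothing about the eventual NumPy
implementation: C's long changes width with the platform's 64-bit data
model, while a fixed 64-bit type does not. The generator below (SplitMix64,
chosen only because it is short) and the helper bounded_integers are
illustrative; every operation is integer arithmetic reduced mod 2**64, so
the results do not depend on the platform.

```python
import ctypes

# C long follows the data model: 8 bytes on LP64 (Linux, macOS),
# 4 bytes on LLP64 (64-bit Windows). A fixed-width type never changes.
long_size = ctypes.sizeof(ctypes.c_long)
uint64_size = ctypes.sizeof(ctypes.c_uint64)

MASK64 = (1 << 64) - 1


def splitmix64(state):
    """Advance a 64-bit state; return (new_state, output), all mod 2**64."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def bounded_integers(seed, high, count):
    """Draw `count` integers in [0, high) by rejection, integer ops only."""
    # Largest multiple of `high` not exceeding 2**64; rejecting draws at or
    # above it removes modulo bias.
    limit = (1 << 64) - ((1 << 64) % high)
    state, out = seed & MASK64, []
    while len(out) < count:
        state, z = splitmix64(state)
        if z < limit:
            out.append(z % high)
    return out


draws = bounded_integers(seed=1, high=10, count=5)
```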




Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-04 Thread josef . pktd
On Mon, Jun 4, 2018 at 2:22 AM, Robert Kern  wrote:

> On Sun, Jun 3, 2018 at 10:27 PM  wrote:
>
>>
>>
>> On Mon, Jun 4, 2018 at 12:53 AM, Stephan Hoyer  wrote:
>>
>>> On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers 
>>> wrote:
>>>
>>>> It may be worth having a look at test suites for scipy, statsmodels,
>>>> scikit-learn, etc. and estimate how much work this NEP causes those
>>>> projects. If the devs of those packages are forced to do large scale
>>>> migrations from RandomState to StableState, then why not instead keep
>>>> RandomState and just add a new API next to it?
>>>>
>>>
>>> Tests that explicitly create RandomState objects would not be difficult
>>> to migrate. The goal of "StableState" is that it could be used directly in
>>> cases where RandomState is currently used in tests, so I would guess that
>>> "RandomState" could be almost mechanically replaced by "StableState".
>>>
>>> The challenging cases are calls to np.random.seed(). If no replacement
>>> API is planned, then these would need to be manually converted to use
>>> StableState instead. This is probably not too onerous (and is a good
>>> cleanup to do anyways) but it would be a bit of work.
>>>
>>
>> I agree with this. Statsmodels uses mostly np.random.seed. That cleanup
>> is planned, but postponed so far as not high priority. We will have to do
>> it eventually.
>>
>> The main work will come when StableState doesn't include specific
>> distributions (Poisson, NegativeBinomial, Gamma, ...) and distributions
>> that we don't even use yet, like Beta.
>>
>
> I would posit that it is probably very rare that one uses the full breadth
> of distributions in unit tests. You may be the only one. :-)
>

Given that I'm one of the maintainers for Statistics in Python, I wouldn't
be surprised if I use more distributions than almost anyone else.
However, statsmodels doesn't use a very large set, there are other packages
that use Pareto and Extreme Value distributions or circular distributions
like vonmises which are not yet in statsmodels. I have no idea about
whether MCMC packages still rely on numpy.random.

But the main "user" of numpy's random is scipy.stats which might be using
almost all of the distributions. I don't have a current overview about how
much scipy.stats unit tests rely on having stable streams for the available
distributions.



>
>
>> I don't want to migrate random number generation for the distributions
>> abandoned by numpy Stable to statsmodels.
>>
>
> What if we followed Kevin's suggestion and forked off RandomState into its
> own forever-frozen package sooner rather than later? Its intended use
> would be for people with legacy packages that cannot upgrade (other than
> changing some imports) and for unit tests that require precise streams for
> a full breadth of distributions. We would still leave it in numpy.random
> for a deprecation period, but maybe we would be noisy about it sooner and
> remove it sooner than my NEP planned for.
>
> Would that work? I'd be happy to maintain that forked-RandomState for you.
>

It would not be nice to have to add another dependency, but that would work
for statsmodels.

I'm not sure whether scipy.stats maintainers are fine with it. Given that
scipy already uses RandomState instead of the global instance, the actual
change if distributions are available would be to swap a StableState for a
RandomState in the unit tests, AFAIK.
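
As a small illustration of the cleanup discussed above (moving a test from
the global np.random.seed() to an explicit RandomState instance), the
sketch below uses only the existing numpy.random API. The two function
names are made up for the example; swapping in a StableState would then be
a change to the single constructor call, if such a class provided the
needed distribution.

```python
import numpy as np


def draw_with_global_seed():
    # Style still common in test suites: reseed the hidden global instance.
    np.random.seed(12345)
    return np.random.normal(size=5)


def draw_with_local_state():
    # Explicit instance: no global state is touched, and the constructor
    # is the only line a later migration would need to change.
    prng = np.random.RandomState(12345)
    return prng.normal(size=5)


global_draws = draw_with_global_seed()
local_draws = draw_with_local_state()
```

Both functions produce the same values, because the module-level functions
are methods of a global RandomState seeded the same way.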

Josef



>
> I would probably still encourage most people to continue to use
> StableRandom for most unit testing.
>
> --
> Robert Kern
>


Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-04 Thread Matthew Harrigan
Should there be discussion of typing (pep-484) or abstract base classes in
this nep?  Are there any requirements on the result returned by
__array_function__?

On Mon, Jun 4, 2018, 2:20 AM Stephan Hoyer  wrote:

>
> On Sun, Jun 3, 2018 at 9:54 PM Hameer Abbasi 
> wrote:
>
>> Mixed return values of NotImplementedButCoercible and NotImplemented
>> would still result in TypeError, and there would be no second chances for
>> overloads.
>>
>>
>> I would like to differ with you here: It can be quite useful to have
>> second chances for overloads. Think ``np.func(list, custom_array)``: If
>> second rounds did not exist, custom_array would need to have a list of
>> coercible types (which is not nice IMO).
>>
>
> Even if we did this, we would still want to preserve the equivalence
> between:
> 1. Returning NotImplementedButCoercible from __array_ufunc__ or
> __array_function__, and
> 2. Not implementing __array_ufunc__ or __array_function__ at all.
>
> Changing __array_ufunc__ to do multiple rounds of checks could indeed be
> useful in some cases, and you're right that it would not change existing
> behavior (in these cases we currently raise TypeError). But I'd rather
> leave that for a separate discussion, because it's orthogonal to our
> proposal here for __array_function__.
>
> (Personally, I don't think it would be worth the additional complexity.)


Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-04 Thread Marten van Kerkwijk
Hi Stephan,

> Another potential consideration in favor of NotImplementedButCoercible is
> for subclassing: we could use it to write the default implementations of
> ndarray.__array_ufunc__ and ndarray.__array_function__, e.g.,
>
> class ndarray:
>     def __array_ufunc__(self, *args, **kwargs):
>         return NotImplementedButCoercible
>     def __array_function__(self, *args, **kwargs):
>         return NotImplementedButCoercible
>
> I think (not 100% sure yet) this would result in exactly equivalent
> behavior to what ndarray.__array_ufunc__ currently does:
> http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#
> subclass-hierarchies
>

As written, this would not work for ndarray subclasses, because the subclass will
generically change itself before calling super. At least for Quantity, say
if I add two quantities, the quantities will both be converted to arrays
(with one scaled so that the units match) and then the super call is done
with those modified arrays. This expects that the super call will actually
return a result (which it now can because all inputs are arrays).
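
The pattern described above can be sketched roughly as follows. This is a
toy, not astropy's Quantity: the class, its scale attribute (a factor to a
base unit) and the restriction to np.add are all assumptions made to keep
the example short.

```python
import numpy as np


class Quantity(np.ndarray):
    """Toy array-with-units: `scale` converts stored values to a base unit."""

    def __new__(cls, values, scale=1.0):
        obj = np.asarray(values, dtype=float).view(cls)
        obj.scale = scale
        return obj

    def __array_finalize__(self, obj):
        # Called for views and templates; default to the base unit.
        self.scale = getattr(obj, "scale", 1.0)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if ufunc is not np.add or method != "__call__":
            return NotImplemented  # the sketch only covers addition
        # Step 1: the subclass turns its inputs into plain arrays whose
        # units match (here: everything is rescaled to the base unit).
        arrays = [
            np.asarray(x) * x.scale if isinstance(x, Quantity) else x
            for x in inputs
        ]
        # Step 2: the super call is expected to return an actual result,
        # which it can, because every input is now a plain ndarray.
        result = super().__array_ufunc__(ufunc, method, *arrays, **kwargs)
        if result is NotImplemented:
            return NotImplemented
        # Step 3: wrap the plain result, now expressed in the base unit.
        return Quantity(result, scale=1.0)


km = Quantity([1.0, 2.0], scale=1000.0)  # kilometres, base unit metres
m = Quantity([500.0, 500.0], scale=1.0)  # metres
total = km + m
```

If the super call in step 2 returned a sentinel instead of a result, step 3
is where such a subclass would have to notice and react.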

But I think it would work to return `NotImplementedButCoercible` in the
case that perhaps you had in mind in the first place, in which any of the
*other* arguments had a `__array_ufunc__` implementation and `ndarray` thus
does not know what to do. For those cases, `ndarray` currently returns a
straight `NotImplemented`.

Though I am still a bit worried: this gets passed back to
`Quantity.__array_ufunc__`, but what does it do with it? It cannot just
pass it on, since then it is effectively claiming, incorrectly, that the
*quantity* is coercible, which it is not. I guess at this point it would
have to change it to `NotImplemented`. Looking at my current
implementation, I see that if we made this change to
`ndarray.__array_ufunc__`, the implementation would mostly raise an
exception as it tried to view `NotImplementedButCoercible` as a quantity,
except for comparisons, where the output is not viewed at all (being
boolean and thus unit-less) and passed straight down. That said, we've said
the __array_ufunc__ implementation is experimental, so I think such small
annoyances are OK.

Overall, it is an intriguing idea, and I think it should be mentioned at
least in the NEP. It would be good, though, to have a few more examples of
how it would work in practice.
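
One possible reading of the semantics discussed in this thread, as a pure
Python sketch. Everything here is hypothetical: the dispatch helper, the
simplified __array_function__(func, args) signature and the three example
classes are illustrations, not the NEP's API. It assumes that a real result
wins, that any plain NotImplemented leads to TypeError, and that
NotImplementedButCoercible is treated exactly like having no override.

```python
class _NotImplementedButCoercibleType:
    def __repr__(self):
        return "NotImplementedButCoercible"


NotImplementedButCoercible = _NotImplementedButCoercibleType()


def dispatch(public_func, default_impl, args):
    """Single round of overload checks; there are no second chances."""
    saw_not_implemented = False
    for arg in args:
        handler = getattr(type(arg), "__array_function__", None)
        if handler is None:
            continue  # no override at all: the argument will be coerced
        result = handler(arg, public_func, args)
        if result is NotImplemented:
            saw_not_implemented = True
        elif result is not NotImplementedButCoercible:
            return result  # an override produced a real result
    if saw_not_implemented:
        raise TypeError("no implementation found for these argument types")
    return default_impl(*args)  # everything left is coercible


class Coercible:
    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, args):
        return NotImplementedButCoercible


class Refuses:
    def __array_function__(self, func, args):
        return NotImplemented


class Handles:
    def __array_function__(self, func, args):
        return "handled by Handles"


def _add_up_default(*args):
    # Stand-in for NumPy's own implementation after coercion.
    return sum(sum(getattr(a, "values", a)) for a in args)


def add_up(*args):
    return dispatch(add_up, _add_up_default, args)


coerced = add_up([1, 2], Coercible([3]))
handled = add_up(Coercible([1]), Handles())
try:
    add_up(Coercible([3]), Refuses())
    mixed_raises = False
except TypeError:
    mixed_raises = True
```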

All the best,

Marten


Re: [Numpy-discussion] NEP: Dispatch Mechanism for NumPy’s high level API

2018-06-04 Thread Marten van Kerkwijk
I agree that second rounds of overloads have to be left to the implementers
of `__array_function__` - obviously, though, we should be sure that these
rounds are rarely necessary...  The link posted by Stephan [1] has some
decent discussion for `__array_ufunc__` about when an override should
re-call the function rather than try to do something itself.

-- Marten

[1]
http://www.numpy.org/neps/nep-0013-ufunc-overrides.html#subclass-hierarchies


Re: [Numpy-discussion] A roadmap for NumPy - longer term planning

2018-06-04 Thread Stephan Hoyer
PEP-574 isn't on the roadmap (yet!), but I think we would clearly welcome
it. Like all NumPy improvements, it would need to be implemented by an
interested party.
On Mon, Jun 4, 2018 at 1:52 AM Antoine Pitrou  wrote:

>
> Hi,
>
> Do you plan to consider trying to add PEP 574 / pickle5 support? There's
> an implementation ready (and a PyPI backport) that you can play with.
> https://www.python.org/dev/peps/pep-0574/
>
> PEP 574 implicitly targets Numpy arrays as one of its primary producers,
> since Numpy arrays are how large scientific or numerical data often ends
> up represented and where zero-copy is often desired by users.
>
> PEP 574 could certainly be useful even without Numpy arrays supporting
> it, but less so.  So I would welcome any feedback on that front (and,
> given that I'd like PEP 574 to be accepted in time for Python 3.8, I'd
> ideally like to have that feedback sometime in the coming months
> ;-)).
>
> Best regards
>
> Antoine.
>
>
> On Thu, 31 May 2018 16:50:02 -0700
> Matti Picus  wrote:
> > At the recent NumPy sprint at BIDS (thanks to those who made the trip)
> > we spent some time brainstorming about a roadmap for NumPy, in the
> > spirit of similar work that was done for Jupyter. The idea is that a
> > document with wide community acceptance can guide the work of the
> > full-time developer(s), and be a source of ideas for expanding
> > development efforts.
> >
> > I put the document up at
> > https://github.com/numpy/numpy/wiki/NumPy-Roadmap, and hope to discuss
> > it at a BOF session during SciPy in the middle of July in Austin.
> >
> > Eventually it could become a NEP or formalized in another way.
> >
> > Matti


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-04 Thread Robert Kern
On Mon, Jun 4, 2018 at 2:55 AM Kevin Sheppard 
wrote:

> I’m not sure if this is within the scope of the NEP or an implementation
> detail, but I think a new PRNG should use platform independent integer
> types rather than depending on the platform’s choice of 64-bit data model.
> This should be enough to ensure that any integer distribution that only
> uses integers internally should produce identical results across uarch/OS.
>

Probably an implementation detail (possibly one that ought to be worked out
in its own NEP).

I know that I would like it if the new system had all of the same
distribution methods as RandomState currently does, such that we can drop
in the new generator objects in places where RandomState is currently used,
and everything would still work (just with a different stream). Might want
to add a statement to that effect in this NEP. I think it's likely "good
enough" if the integer distributions now return uint64 arrays instead of
uint32 arrays on Windows.

-- 
Robert Kern


Re: [Numpy-discussion] NEP: Random Number Generator Policy

2018-06-04 Thread Robert Kern
On Sun, Jun 3, 2018 at 8:22 PM Ralf Gommers  wrote:

> It may be worth having a look at test suites for scipy, statsmodels,
> scikit-learn, etc. and estimate how much work this NEP causes those
> projects. If the devs of those packages are forced to do large scale
> migrations from RandomState to StableState, then why not instead keep
> RandomState and just add a new API next to it?
>

The problem is that we can't really have an ecosystem with two different
general purpose systems. To properly use pseudorandom numbers, I need to
instantiate a PRNG and thread it through all of the code in my program:
both the parts that I write and the third party libraries that I don't
write.
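
A short sketch of what threading a PRNG through a program looks like, using
the existing RandomState only as a familiar example. The function names are
invented, and library_noise stands in for third-party code that accepts a
PRNG argument.

```python
import numpy as np


def library_noise(n, prng):
    """Stand-in for a third-party function that accepts the caller's PRNG."""
    return prng.normal(scale=0.1, size=n)


def simulate(n, prng):
    """Application code: draws from the PRNG and passes the same one on."""
    signal = prng.uniform(-1.0, 1.0, size=n)
    return signal + library_noise(n, prng)


# One PRNG is created at the top and handed down through every call, so the
# whole run is determined by a single seed and no global state is involved.
first_run = simulate(4, np.random.RandomState(2018))
second_run = simulate(4, np.random.RandomState(2018))
```

This only works if every library in the chain accepts the same kind of
object.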

Generating test data for unit tests is separable, though. That's why I
propose having a StableRandom built on the new architecture. Its purpose
would be well-documented, and in my proposal is limited in features such
that it will be less likely to be abused outside of that purpose. If you
make it fully-featured, it is more likely to be abused by building library
code around it. But even if it is so abused, because it is built on the new
architecture, at least I can thread the same core PRNG state through the
StableRandom distributions from the abusing library and use the better
distributions class elsewhere (randomgen names it "Generator"). Just
keeping RandomState around can't work like that because it doesn't have a
replaceable core PRNG.

But that does suggest another alternative that we should explore:

The new architecture separates the core uniform PRNG from the wide variety
of non-uniform probability distributions. That is, the core PRNG state is
encapsulated in a discrete object that can be shared between instances of
different distribution-providing classes. numpy.random should provide two
such distribution-providing classes. The main one (let us call it
``Generator``, as it is called in the prototype) will follow the new
policy: distribution methods can break the stream in feature releases.
There will also be a secondary distributions class (let us call it
``LegacyGenerator``) which contains distribution methods exactly as they
exist in the current ``RandomState`` implementation. When one combines
``LegacyGenerator`` with the MT19937 core PRNG, it should reproduce the
exact same stream as ``RandomState`` for all distribution methods. The
``LegacyGenerator`` methods will be forever frozen.
``numpy.random.RandomState()`` will instantiate a ``LegacyGenerator`` with
the MT19937 core PRNG, and whatever tricks needed to make
``isinstance(prng, RandomState)`` and unpickling work should be done. This
way of creating the ``LegacyGenerator`` by way of ``RandomState`` will be
deprecated, becoming progressively noisier over a number of release cycles,
in favor of explicitly instantiating ``LegacyGenerator``.
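
The separation described above can be sketched with the standard library
alone. All three classes are hypothetical illustrations, not numpy.random
code: CorePRNG wraps Python's own Mersenne Twister, and the two algorithms
for normal variates (Box-Muller versus inverse CDF) are arbitrary choices
meant only to show that the distribution classes may differ while sharing
one core state. They are not the algorithms NumPy uses.

```python
import math
import random
import statistics


class CorePRNG:
    """Core uniform source; the only object that holds PRNG state."""

    def __init__(self, seed):
        self._mt = random.Random(seed)  # CPython's Mersenne Twister

    def uniform01(self):
        return self._mt.random()  # float in [0.0, 1.0)


class LegacyGenerator:
    """Distribution methods frozen forever: same core, same stream."""

    def __init__(self, core):
        self.core = core

    def standard_normal(self):
        # Box-Muller transform, fixed for all future releases.
        u1 = self.core.uniform01()
        u2 = self.core.uniform01()
        radius = math.sqrt(-2.0 * math.log(1.0 - u1))
        return radius * math.cos(2.0 * math.pi * u2)


class Generator:
    """Distribution methods that may change between feature releases."""

    def __init__(self, core):
        self.core = core

    def standard_normal(self):
        # Inverse-CDF method; free to be replaced by something better later.
        u = self.core.uniform01()
        while u == 0.0:  # inv_cdf requires 0 < u < 1
            u = self.core.uniform01()
        return statistics.NormalDist().inv_cdf(u)


# Same core seed plus frozen methods gives a reproducible stream.
x = LegacyGenerator(CorePRNG(42)).standard_normal()
y = LegacyGenerator(CorePRNG(42)).standard_normal()

# One core state shared between both distribution classes.
core = CorePRNG(7)
legacy, modern = LegacyGenerator(core), Generator(core)
mixed = [legacy.standard_normal(), modern.standard_normal()]
```

Because both classes draw from the same core object, a library still built
on the frozen methods and an application using the newer ones can be driven
by a single PRNG state.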

``LegacyGenerator`` CAN be used during this deprecation period in library
and application code until libraries and applications can migrate to the
new ``Generator``. Libraries and applications SHOULD migrate but MUST NOT
be forced to. ``LegacyGenerator`` CAN be used to generate test data for
unit tests where cross-release stability of the streams is important. Test
writers SHOULD consider ways to mitigate their reliance on such stability
and SHOULD limit their usage to distribution methods that have fewer
cross-platform stability risks.

-- 
Robert Kern