Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Nathaniel Smith
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers  wrote:
> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith  wrote:
>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi  
>> wrote:
>> > The fact that we're having to design more and more protocols for a lot
>> > of very similar things is, to me, an indicator that we do have holistic
>> > problems that ought to be solved by a single protocol.
>>
>> But the reason we've had trouble designing these protocols is that
>> they're each different :-). If it was just a matter of copying
>> __array_ufunc__ we'd have been done in a few minutes...
>
> I don't think that argument is correct. That we now have two very similar 
> protocols is simply a matter of history and limited developer time. NEP 18 
> discusses in several places that __array_ufunc__ should be brought in line 
> with __array_function__, and that we can migrate a function from one protocol to
> the other. There's no technical reason other than backwards compat and dev 
> time why we couldn't use __array_function__ for ufuncs also.

Huh, that's interesting! Apparently we have a profoundly different
understanding of what we're doing here. To me, __array_ufunc__ and
__array_function__ are completely different. In fact I'd say
__array_ufunc__ is a good idea and __array_function__ is a bad idea,
and would definitely not be in favor of combining them together.

The key difference is that __array_ufunc__ allows for *generic*
implementations. Most duck array libraries can write a single
implementation of __array_ufunc__ that works for *all* ufuncs, even
new third-party ufuncs that the duck array library has never heard of,
because ufuncs all share the same structure of a loop wrapped around a
core operation, and they can treat the core operation as a black box.
For example:

- Dask can split up the operation across its tiled sub-arrays, and
then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the
core operation.
- bcolz can loop over the array uncompressing one block at a time,
invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out
whether 0 is an identity, and if so invoke the operation directly on
the non-zero entries; otherwise, it can loop over the array and
densify it in blocks and invoke the core operation on each. (It would
be useful to have a bit more metadata on the ufunc, so e.g.
np.subtract could declare that zero is a right-identity but not a
left-identity, but that's a simple enough extension to make at some
point.)

Result: __array_ufunc__ makes it totally possible to take a ufunc from
scipy.special or a random new one created with numba, and have it
immediately work on an xarray wrapped around dask wrapped around
bcolz, out-of-the-box. That's a clean, generic interface. [1]
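
To make that concrete, here is a minimal sketch of what such a generic
__array_ufunc__ can look like. The TiledArray class and its two-block layout
are invented for illustration; this is not any library's actual code:

import numpy as np

class TiledArray:
    """Toy duck array stored as a list of equal-length 1-D blocks."""

    def __init__(self, blocks):
        self.blocks = [np.asarray(b) for b in blocks]

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__":
            return NotImplemented  # keep the sketch simple
        # Treat the ufunc as a black box: apply it block by block,
        # unwrapping any TiledArray inputs to the matching block.
        out_blocks = []
        for i in range(len(self.blocks)):
            args = [x.blocks[i] if isinstance(x, TiledArray) else x
                    for x in inputs]
            out_blocks.append(ufunc(*args, **kwargs))
        return TiledArray(out_blocks)

a = TiledArray([[1.0, 2.0], [3.0, 4.0]])
np.add(a, 10)        # works
np.logaddexp2(a, a)  # also works, with no logaddexp2-specific code anywhere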

OTOH, __array_function__ doesn't allow this kind of simplification: if
we were using __array_function__ for ufuncs, every library would have
to special-case every individual ufunc, which leads to dramatically
more work and more potential for bugs.
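
For comparison, the __array_function__ side typically looks something like the
following sketch (again with invented names, not any real library's code):
each overridden function needs its own hand-written entry, and a brand-new
third-party function can't be supported without adding yet another one:

import numpy as np

HANDLED = {}  # NumPy function -> this library's own implementation

def implements(np_func):
    def decorator(impl):
        HANDLED[np_func] = impl
        return impl
    return decorator

class MyDuckArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

@implements(np.concatenate)
def _concatenate(arrays, axis=0):
    return MyDuckArray(np.concatenate([a.data for a in arrays], axis=axis))

@implements(np.sum)
def _sum(a, axis=None):
    return np.sum(a.data, axis=axis)

# ...and so on, one entry per function.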

To me, the whole point of interfaces is to reduce coupling. When you
have N interacting modules, it's unmaintainable if every change
requires considering every N! combination. From this perspective,
__array_function__ isn't good, but it is still somewhat constrained:
the result of each operation is still determined by the objects
involved, nothing else. In this regard, uarray is even more extreme than
__array_function__, because arbitrary operations can be arbitrarily
changed by arbitrarily distant code. It sort of feels like the
argument for uarray is: well, designing maintainable interfaces is a
lot of work, so forget it, let's just make it easy to monkeypatch
everything and call it a day.

That said, in my replies in this thread I've been trying to stay
productive and focus on narrower concrete issues. I'm pretty sure that
__array_function__ and uarray will turn out to be bad ideas and will
fail, but that's not a proven fact, it's just an informed guess. And
the road that I favor also has lots of risks and uncertainty. So I
don't have a problem with trying both as experiments and learning
more! But hopefully that explains why it's not at all obvious that
uarray solves the protocol design problems we've been talking about.

-n

[1] There are also some cases that __array_ufunc__ doesn't handle as
nicely. One obvious one is that GPU/TPU libraries still need to
special-case individual ufuncs. But that's not a limitation of
__array_ufunc__, it's a limitation of GPUs – they can't run CPU code,
so they can't use the CPU implementation of the core operations.
Another limitation is that __array_ufunc__ is weak at handling
operations that involve mixed libraries (e.g. np.add(bcolz_array,
sparse_array)) – to work well, this might require that bcolz have
special-case handling for sparse arrays, or vice-versa, so you still
potentially have some N**2 special cases, though at least here N is
the number of array libraries.

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Hameer Abbasi

On 08.09.19 09:53, Nathaniel Smith wrote:
> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers  wrote:
>
>> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith  wrote:
>>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi  wrote:
>>>
>>> > The fact that we're having to design more and more protocols for a lot
>>> > of very similar things is, to me, an indicator that we do have holistic
>>> > problems that ought to be solved by a single protocol.
>>>
>>> But the reason we've had trouble designing these protocols is that
>>> they're each different :-). If it was just a matter of copying
>>> __array_ufunc__ we'd have been done in a few minutes...
>>
>> I don't think that argument is correct. That we now have two very
>> similar protocols is simply a matter of history and limited developer
>> time. NEP 18 discusses in several places that __array_ufunc__ should
>> be brought in line with __array_function__, and that we can migrate a
>> function from one protocol to the other. There's no technical reason
>> other than backwards compat and dev time why we couldn't use
>> __array_function__ for ufuncs also.
>
> Huh, that's interesting! Apparently we have a profoundly different
> understanding of what we're doing here. To me, __array_ufunc__ and
> __array_function__ are completely different. In fact I'd say
> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> and would definitely not be in favor of combining them together.
>
> The key difference is that __array_ufunc__ allows for *generic*
> implementations. Most duck array libraries can write a single
> implementation of __array_ufunc__ that works for *all* ufuncs, even
> new third-party ufuncs that the duck array library has never heard of,
> because ufuncs all share the same structure of a loop wrapped around a
> core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and
> then for each tile it invokes the core operation.
> - xarray can do its label-based axis matching, and then invoke the
> core operation.
> - bcolz can loop over the array uncompressing one block at a time,
> invoking the core operation on each.
> - sparse arrays can check the ufunc .identity attribute to find out
> whether 0 is an identity, and if so invoke the operation directly on
> the non-zero entries; otherwise, it can loop over the array and
> densify it in blocks and invoke the core operation on each. (It would
> be useful to have a bit more metadata on the ufunc, so e.g.
> np.subtract could declare that zero is a right-identity but not a
> left-identity, but that's a simple enough extension to make at some
> point.)
>
> Result: __array_ufunc__ makes it totally possible to take a ufunc from
> scipy.special or a random new one created with numba, and have it
> immediately work on an xarray wrapped around dask wrapped around
> bcolz, out-of-the-box. That's a clean, generic interface. [1]
>
> OTOH, __array_function__ doesn't allow this kind of simplification: if
> we were using __array_function__ for ufuncs, every library would have
> to special-case every individual ufunc, which leads to dramatically
> more work and more potential for bugs.


But uarray does allow this kind of simplification. You would do the
following inside a uarray backend:

def __ua_function__(func, args, kwargs):
    with ua.skip_backend(self_backend):
        # Do code here; dispatches to everything but this backend.
        ...

This is possible today and is done in the dask backend inside unumpy, for
example.
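
To spell that out a bit more, here is a self-contained sketch of the same
pattern. It assumes the uarray backend protocol (a __ua_domain__ attribute
plus a __ua_function__ callable) and uses an invented DaskLikeBackend name;
it is illustrative, not unumpy's actual dask backend:

import uarray as ua

class DaskLikeBackend:
    """Illustrative backend that adds some outer behavior, then delegates."""
    __ua_domain__ = "numpy"

    @staticmethod
    def __ua_function__(func, args, kwargs):
        # Temporarily remove ourselves from the dispatch chain, then call
        # the same multimethod again so whichever backend is next in line
        # handles the underlying computation (e.g. per-chunk work).
        with ua.skip_backend(DaskLikeBackend):
            return func(*args, **kwargs)

# Hypothetical usage:
#     with ua.set_backend(DaskLikeBackend):
#         unumpy.sum(x)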




> To me, the whole point of interfaces is to reduce coupling. When you
> have N interacting modules, it's unmaintainable if every change
> requires considering every N! combination. From this perspective,
> __array_function__ isn't good, but it is still somewhat constrained:
> the result of each operation is still determined by the objects
> involved, nothing else. In this regard, uarray is even more extreme
> than __array_function__, because arbitrary operations can be
> arbitrarily changed by arbitrarily distant code. It sort of feels like
> the argument for uarray is: well, designing maintainable interfaces is
> a lot of work, so forget it, let's just make it easy to monkeypatch
> everything and call it a day.
>
> That said, in my replies in this thread I've been trying to stay
> productive and focus on narrower concrete issues. I'm pretty sure that
> __array_function__ and uarray will turn out to be bad ideas and will
> fail, but that's not a proven fact, it's just an informed guess. And
> the road that I favor also has lots of risks and uncertainty. So I
> don't have a problem with trying both as experiments and learning
> more! But hopefully that explains why it's not at all obvious that
> uarray solves the protocol design problems we've been talking about.
>
> -n
>
> [1] There are also some cases that __array_ufunc__ doesn't handle as
> nicely. One obvious one is that GPU/TPU libraries still need to
> special-case individual ufuncs. But that's not a limitation of
> __array_ufunc__, it's a limitation of GPUs – they can't run CPU code,
> so they can't use the CPU implementation of the core operations.
> Another limitation is that __array_ufunc__ is weak at handling
> operations that involve mixed libraries (e.g. np.add(bcolz_array,
> sparse_array)) – to work well, this might require that bcolz have
> special-case handling for sparse arrays, or vice-versa, so you still
> potentially have some N**2 special cases, though at least here N is
> the number of array libraries.

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Nathaniel Smith
On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi  wrote:
>
> On 08.09.19 09:53, Nathaniel Smith wrote:
>> OTOH, __array_function__ doesn't allow this kind of simplification: if
>> we were using __array_function__ for ufuncs, every library would have
>> to special-case every individual ufunc, which leads to dramatically
>> more work and more potential for bugs.
>
> But uarray does allow this kind of simplification. You would do the following 
> inside a uarray backend:
>
> def __ua_function__(func, args, kwargs):
>     with ua.skip_backend(self_backend):
>         # Do code here; dispatches to everything but this backend.
>         ...

You can dispatch to the underlying operation, sure, but you can't
implement a generic ufunc loop because you don't know that 'func' is
actually a bound ufunc method, or have any way to access the
underlying ufunc object. (E.g. consider the case where 'func' is
'np.add.reduce'.) The critical part of my example was that it's a new
ufunc that none of these libraries have ever heard of before.

Ufuncs have a lot of consistent structure beyond what generic Python
callables have, and the whole point of __array_ufunc__ is that
implementors can rely on that structure. You get to work at a higher
level of abstraction.
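
To make the difference concrete: an __array_ufunc__ handler receives the
ufunc object itself plus the method name, so one implementation covers
np.add(x, y), np.add.reduce(x), np.multiply.accumulate(x), and so on, even
for ufuncs it has never seen. A minimal sketch (DenseWrapper is an invented
toy class, not real library code):

import numpy as np

class DenseWrapper:
    """Toy duck array that wraps an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # `ufunc` is the actual np.ufunc object; `method` is "__call__",
        # "reduce", "accumulate", "outer", etc., so np.add.reduce and
        # friends can be handled generically here.
        unwrapped = [x.data if isinstance(x, DenseWrapper) else x
                     for x in inputs]
        result = getattr(ufunc, method)(*unwrapped, **kwargs)
        return DenseWrapper(result) if isinstance(result, np.ndarray) else result

x = DenseWrapper([1, 2, 3])
np.add(x, 1)       # method == "__call__"
np.add.reduce(x)   # method == "reduce" -- same handler, no special case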

A similar but simpler example would be the protocol we've sketched out
for concatenation: the idea would be to capture the core similarity
between 
np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any
other variants, so that implementors only have to worry about the
higher-level concept of "concatenation" rather than the raw APIs of
all those individual functions.
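
That protocol isn't pinned down in this thread, but the shape of the idea
might look something like this (the __array_concatenate__ name and signature
are hypothetical, just to illustrate the "one concept, many front-end
functions" point):

import numpy as np

class MyArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_concatenate__(self, arrays, axis):
        # One implementation of "join these along an axis" could back
        # np.concatenate, np.hstack, np.vstack, np.dstack, np.column_stack,
        # and np.row_stack alike.
        return MyArray(np.concatenate([a.data for a in arrays], axis=axis))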

-n


-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Hameer Abbasi

On 08.09.19 10:56, Nathaniel Smith wrote:

> On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi  wrote:
>
>> On 08.09.19 09:53, Nathaniel Smith wrote:
>>
>>> OTOH, __array_function__ doesn't allow this kind of simplification: if
>>> we were using __array_function__ for ufuncs, every library would have
>>> to special-case every individual ufunc, which leads to dramatically
>>> more work and more potential for bugs.
>>
>> But uarray does allow this kind of simplification. You would do the
>> following inside a uarray backend:
>>
>> def __ua_function__(func, args, kwargs):
>>     with ua.skip_backend(self_backend):
>>         # Do code here; dispatches to everything but this backend.
>>         ...
>
> You can dispatch to the underlying operation, sure, but you can't
> implement a generic ufunc loop because you don't know that 'func' is
> actually a bound ufunc method, or have any way to access the
> underlying ufunc object. (E.g. consider the case where 'func' is
> 'np.add.reduce'.) The critical part of my example was that it's a new
> ufunc that none of these libraries have ever heard of before.
>
> Ufuncs have a lot of consistent structure beyond what generic Python
> callables have, and the whole point of __array_ufunc__ is that
> implementors can rely on that structure. You get to work at a higher
> level of abstraction.
>
> A similar but simpler example would be the protocol we've sketched out
> for concatenation: the idea would be to capture the core similarity
> between
> np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any
> other variants, so that implementors only have to worry about the
> higher-level concept of "concatenation" rather than the raw APIs of
> all those individual functions.


There's a solution for that too: default implementations. Implement
concatenate, and you've got a default implementation for all of those
you mentioned.

Similarly for transpose/swapaxes/moveaxis and family.
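
A rough sketch of what such a default implementation could look like (not
unumpy's actual code; here concatenate and atleast_1d stand in for whatever
the active backend provides):

import numpy as np

def hstack_default(tup, *, concatenate=np.concatenate,
                   atleast_1d=np.atleast_1d):
    """Default hstack expressed purely in terms of concatenate.

    A backend that only overrides concatenate would get this (and vstack,
    dstack, ... written the same way) for free.
    """
    arrs = [atleast_1d(a) for a in tup]
    axis = 0 if all(a.ndim == 1 for a in arrs) else 1
    return concatenate(arrs, axis=axis)

hstack_default(([1, 2], [3, 4]))        # array([1, 2, 3, 4])
hstack_default((np.eye(2), np.eye(2)))  # shape (2, 4)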








Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Ralf Gommers
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith  wrote:

> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers 
> wrote:
> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith  wrote:
> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi 
> wrote:
> >> > The fact that we're having to design more and more protocols for a lot
> >> > of very similar things is, to me, an indicator that we do have
> holistic
> >> > problems that ought to be solved by a single protocol.
> >>
> >> But the reason we've had trouble designing these protocols is that
> >> they're each different :-). If it was just a matter of copying
> >> __array_ufunc__ we'd have been done in a few minutes...
> >
> > I don't think that argument is correct. That we now have two very
> similar protocols is simply a matter of history and limited developer time.
> NEP 18 discusses in several places that __array_ufunc__ should be brought
> in line with __array_function__, and that we can migrate a function from one
> protocol to the other. There's no technical reason other than backwards
> compat and dev time why we couldn't use __array_function__ for ufuncs also.
>
> Huh, that's interesting! Apparently we have a profoundly different
> understanding of what we're doing here.


That is interesting indeed. We should figure this out first - no point
discussing a NEP about plugging the gaps in our override system when we
don't have a common understanding of why we wanted/needed an override
system in the first place.

> To me, __array_ufunc__ and
> __array_function__ are completely different. In fact I'd say
> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
>

It's early days, but "customer feedback" certainly has been more
enthusiastic for __array_function__. Also from what I've seen so far it
works well. Example: at the SciPy sprints someone put together Xarray plus
pydata/sparse to use distributed sparse arrays for visualizing some large
genetic (I think) data sets. That was made to work in a single day, with
impressively little code.

> and would definitely not be in favor of combining them together.
>

I'm not saying we should. But __array_ufunc__ is basically a slight
specialization - knowing that the function that was called is a ufunc can
be handy but is usually irrelevant.


> The key difference is that __array_ufunc__ allows for *generic*
> implementations.


Implementations of what?

> Most duck array libraries can write a single
> implementation of __array_ufunc__ that works for *all* ufuncs, even
> new third-party ufuncs that the duck array library has never heard of,
>

I see where you're going with this. You are thinking of reusing the ufunc
implementation to do a computation. That's a minor use case (imho), and I
can't remember seeing it used.

The original use case was scipy.sparse matrices. The executive summary of
NEP 13 talks about this. It's about calling
`np.some_ufunc(other_ndarray_like)` and "handing over control" to that
object rather than the numpy function starting to execute. Also note that
NEP 13 states in the summary "This covers some of the same ground as Travis
Oliphant’s proposal to retro-fit NumPy with multi-methods" (reminds one of
uarray).

For scipy.sparse, the layout of the data doesn't make sense to numpy. All
that was desired was for the sparse matrix to know what function was
called, so it can call its own implementation of that function instead.

> because ufuncs all share the same structure of a loop wrapped around a
> core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and
> then for each tile it invokes the core operation.
>

Works for __array_function__ too. Note, *not* by explicitly reusing the
numpy function. Dask anyway has its own functions that mirror the numpy
API. Dask's __array_function__ just does the forwarding to its own
functions.

Also, a Dask array could be a collection of CuPy arrays, and CuPy
implements __array_ufunc__. So explicitly reusing the NumPy ufunc
implementation on whatever comes in would be, well, not so nice.

- xarray can do its label-based axis matching, and then invoke the
> core operation.
>

Could do this with __array_function__ too

- bcolz can loop over the array uncompressing one block at a time,
> invoking the core operation on each.
>

not sure about this one

> - sparse arrays can check the ufunc .identity attribute


this is a case where knowing if something is a ufunc helps use a property of
it. So there the more specialized nature of __array_ufunc__ helps. Seems
niche though, and could probably also be done by checking if a function is
an instance of np.ufunc via __array_function__.

> to find out
> whether 0 is an identity, and if so invoke the operation directly on
> the non-zero entries; otherwise, it can loop over the array and
> densify it in blocks and invoke the core operation on each. (It would
> be useful to have a bit more metadata on the ufunc, so e.g.
> np.subtract could declare that zero is a right-identity but not a
> left-identity, but that's a simple enough extension to make at some
> point.)

Re: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy

2019-09-08 Thread Warren Weckesser
On 9/4/19, Matthew Brett  wrote:
> Hi,
>
> Maybe worth asking over at the Pandas list?  I bet there are more
> Python / finance people over there.


OK, I sent a message to the PyData mailing list.

Warren


>
> Cheers,
>
> Matthew
>
> On Wed, Sep 4, 2019 at 7:11 PM Ilhan Polat  wrote:
>>
>> +1 on removing them from NumPy. I think there are plenty of alternatives
>> already so many that we might even consider deprecating them just like
>> SciPy misc module by pointing to alternatives.
>>
>> On Tue, Sep 3, 2019 at 6:38 PM Sebastian Berg 
>> wrote:
>>>
>>> On Tue, 2019-09-03 at 08:56 -0400, Warren Weckesser wrote:
>>> > Github issue 2880 ("Get financial functions out of main namespace",
>>>
>>> Very briefly, I am absolutely in favor of this.
>>>
>>> Keeping the functions in numpy seems more of a liability than help
>>> anyone. And this push is more likely to help users by spurring
>>> development on a good replacement, than a practically unmaintained
>>> corner of NumPy that may seem like it solves a problem, but probably
>>> does so very poorly.
>>>
>>> Moving them into a separate pip installable package seems like the best
>>> way forward until a better replacement, to which we can point users,
>>> comes up.
>>>
>>> - Sebastian
>>>
>>>
>>> > https://github.com/numpy/numpy/issues/2880) has been open since 2013.
>>> > In a recent community meeting, it was suggested that we create a NEP
>>> > to propose the removal of the financial functions from NumPy.  I have
>>> > submitted "NEP 32:  Remove the financial functions from NumPy" in a
>>> > pull request at https://github.com/numpy/numpy/pull/14399.  A copy of
>>> > the latest version of the NEP is below.
>>> >
>>> > According to the NEP process document, "Once the PR is in place, the
>>> > NEP should be announced on the mailing list for discussion (comments
>>> > on the PR itself should be restricted to minor editorial and
>>> > technical fixes)."  This email is the announcement for NEP 32.
>>> >
>>> > The NEP includes a brief summary of the history of the financial
>>> > functions, and has links to several relevant mailing list threads,
>>> > dating back to when the functions were added to NumPy in 2008.  I
>>> > recommend reviewing those threads before commenting here.
>>> >
>>> > Warren
>>> >
>>> > -
>>> >
>>> > ==================================================
>>> > NEP 32 — Remove the financial functions from NumPy
>>> > ==================================================
>>> >
>>> > :Author: Warren Weckesser 
>>> > :Status: Draft
>>> > :Type: Standards Track
>>> > :Created: 2019-08-30
>>> >
>>> >
>>> > Abstract
>>> > --------
>>> >
>>> > We propose deprecating and ultimately removing the financial
>>> > functions [1]_
>>> > from NumPy.  The functions will be moved to an independent
>>> > repository,
>>> > and provided to the community as a separate package with the name
>>> > ``numpy_financial``.
>>> >
>>> >
>>> > Motivation and scope
>>> > --------------------
>>> >
>>> > The NumPy financial functions [1]_ are the 10 functions ``fv``,
>>> > ``ipmt``,
>>> > ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and
>>> > ``rate``.
>>> > The functions provide elementary financial calculations such as
>>> > future value,
>>> > net present value, etc. These functions were added to NumPy in 2008
>>> > [2]_.
>>> >
>>> > In May, 2009, a request by Joe Harrington to add a function called
>>> > ``xirr`` to
>>> > the financial functions triggered a long thread about these functions
>>> > [3]_.
>>> > One important point that came up in that thread is that a "real"
>>> > financial
>>> > library must be able to handle real dates.  The NumPy financial
>>> > functions do
>>> > not work with actual dates or calendars.  The preference for a more
>>> > capable
>>> > library independent of NumPy was expressed several times in that
>>> > thread.
>>> >
>>> > In June, 2009, D. L. Goldsmith expressed concerns about the
>>> > correctness of the
>>> > implementations of some of the financial functions [4]_.  It was
>>> > suggested then
>>> > to move the financial functions out of NumPy to an independent
>>> > package.
>>> >
>>> > In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the
>>> > financial
>>> > functions from the top-level namespace to ``numpy.financial``.  He
>>> > also
>>> > suggested giving the functions better names.  Responses at that time
>>> > included
>>> > the suggestion to deprecate them and move them from NumPy to a
>>> > separate
>>> > package.  This issue is still open.
>>> >
>>> > Later in 2013 [6]_, it was suggested on the mailing list that these
>>> > functions
>>> > be removed from NumPy.
>>> >
>>> > The arguments for the removal of these functions from NumPy:
>>> >
>>> > * They are too specialized for NumPy.
>>> > * They are not actually useful for "real world" financial
>>> > calculations, because
>>> >   they do not handle real dates and calendars.
>>> > * The definition of "correctness" for some of these functions seems

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Nathaniel Smith
On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers  wrote:
>
>
>
> On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith  wrote:
>>
>> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers  wrote:
>> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith  wrote:
>> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi  
>> >> wrote:
>> >> > The fact that we're having to design more and more protocols for a lot
>> >> > of very similar things is, to me, an indicator that we do have holistic
>> >> > problems that ought to be solved by a single protocol.
>> >>
>> >> But the reason we've had trouble designing these protocols is that
>> >> they're each different :-). If it was just a matter of copying
>> >> __array_ufunc__ we'd have been done in a few minutes...
>> >
>> > I don't think that argument is correct. That we now have two very similar 
>> > protocols is simply a matter of history and limited developer time. NEP 18 
>> > discusses in several places that __array_ufunc__ should be brought in line 
>> > with __array_ufunc__, and that we can migrate a function from one protocol 
>> > to the other. There's no technical reason other than backwards compat and 
>> > dev time why we couldn't use __array_function__ for ufuncs also.
>>
>> Huh, that's interesting! Apparently we have a profoundly different
>> understanding of what we're doing here.
>
>
> That is interesting indeed. We should figure this out first - no point 
> discussing a NEP about plugging the gaps in our override system when we don't 
> have a common understanding of why we wanted/needed an override system in the 
> first place.
>
>> To me, __array_ufunc__ and
>> __array_function__ are completely different. In fact I'd say
>> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
>
>
> It's early days, but "customer feedback" certainly has been more enthusiastic 
> for __array_function__. Also from what I've seen so far it works well. 
> Example: at the SciPy sprints someone put together Xarray plus pydata/sparse 
> to use distributed sparse arrays for visualizing some large genetic (I think) 
> data sets. That was made to work in a single day, with impressively little 
> code.

Yeah, it's true, and __array_function__ made a bunch of stuff that
used to be impossible become possible, I'm not saying it didn't. My
prediction is that the longer we live with it, the more limits we'll
hit and the more problems we'll have with long-term maintainability. I
don't think initial enthusiasm is a good predictor of that either way.

>> The key difference is that __array_ufunc__ allows for *generic*
>> implementations.
>
> Implementations of what?

Generic in the sense that you can write __array_ufunc__ once and have
it work for all ufuncs.

>> Most duck array libraries can write a single
>> implementation of __array_ufunc__ that works for *all* ufuncs, even
>> new third-party ufuncs that the duck array library has never heard of,
>
>
> I see where you're going with this. You are thinking of reusing the ufunc 
> implementation to do a computation. That's a minor use case (imho), and I 
> can't remember seeing it used.

I mean, I just looked at dask and xarray, and they're both doing
exactly what I said, right now in shipping code. What use cases are
you targeting here if you consider dask and xarray out-of-scope? :-)

> this is a case where knowing if something is a ufunc helps use a property of 
> it. so there the more specialized nature of __array_ufunc__ helps. Seems 
> niche though, and could probably also be done by checking if a function is an 
> instance of np.ufunc via __array_function__

Sparse arrays aren't very niche... and the isinstance trick is
possible in some cases, but (a) it's relying on an undocumented
implementation detail of __array_function__; according to
__array_function__'s API contract, you could just as easily get passed
the ufunc's __call__ method instead of the object itself, and (b) it
doesn't work at all for ufunc methods like reduce, outer, accumulate.
These are both show-stoppers IMO.

> This last point, using third-party ufuncs, is the interesting one to me. They 
> have to be generated with the NumPy ufunc machinery, so the dispatch 
> mechanism is attached to them. We could do third party functions for 
> __array_function__ too, but that would require making 
> @array_function_dispatch public, which we haven't done (yet?).

With __array_function__ it's theoretically possible to do the dispatch
on third-party functions, but when someone defines a new function they
always have to go update all the duck array libraries to hard-code in
some special knowledge of their new function. So in my example, even
if we made @array_function_dispatch public, you still couldn't use
your nice new numba-created gufunc unless you first convinced dask,
xarray, and bcolz to all accept patches to support your new gufunc.
With __array_ufunc__, it works out-of-the-box.
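
As a concrete version of that example (assuming, as I believe is the case,
that numba.vectorize with explicit signatures produces a genuine np.ufunc):

import numpy as np
from numba import vectorize

@vectorize(["float64(float64, float64)"])
def my_new_op(x, y):
    # A brand-new third-party "ufunc" no array library has heard of.
    return np.log2(2.0 ** x + 2.0 ** y)

# Any duck array with a generic __array_ufunc__ (dask, xarray, sparse, ...)
# can now be passed to my_new_op directly; with __array_function__ each of
# those libraries would first need a patch registering my_new_op.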

> But what is that road, and what do you think the goal is? To me it's: 
> separate our API from our implementation.

Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

2019-09-08 Thread Nathan
On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith  wrote:

> On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers 
> wrote:
> >
> >
> >
> > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith  wrote:
> >>
> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers 
> wrote:
> >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith 
> wrote:
> >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <
> einstein.edi...@gmail.com> wrote:
> >> >> > The fact that we're having to design more and more protocols for a
> lot
> >> >> > of very similar things is, to me, an indicator that we do have
> holistic
> >> >> > problems that ought to be solved by a single protocol.
> >> >>
> >> >> But the reason we've had trouble designing these protocols is that
> >> >> they're each different :-). If it was just a matter of copying
> >> >> __array_ufunc__ we'd have been done in a few minutes...
> >> >
> >> > I don't think that argument is correct. That we now have two very
> similar protocols is simply a matter of history and limited developer time.
> NEP 18 discusses in several places that __array_ufunc__ should be brought
> in line with __array_function__, and that we can migrate a function from one
> protocol to the other. There's no technical reason other than backwards
> compat and dev time why we couldn't use __array_function__ for ufuncs also.
> >>
> >> Huh, that's interesting! Apparently we have a profoundly different
> >> understanding of what we're doing here.
> >
> >
> > That is interesting indeed. We should figure this out first - no point
> discussing a NEP about plugging the gaps in our override system when we
> don't have a common understanding of why we wanted/needed an override
> system in the first place.
> >
> >> To me, __array_ufunc__ and
> >> __array_function__ are completely different. In fact I'd say
> >> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> >
> >
> > It's early days, but "customer feedback" certainly has been more
> enthusiastic for __array_function__. Also from what I've seen so far it
> works well. Example: at the SciPy sprints someone put together Xarray plus
> pydata/sparse to use distributed sparse arrays for visualizing some large
> genetic (I think) data sets. That was made to work in a single day, with
> impressively little code.
>
> Yeah, it's true, and __array_function__ made a bunch of stuff that
> used to be impossible become possible, I'm not saying it didn't. My
> prediction is that the longer we live with it, the more limits we'll
> hit and the more problems we'll have with long-term maintainability. I
> don't think initial enthusiasm is a good predictor of that either way.
>
> >> The key difference is that __array_ufunc__ allows for *generic*
> >> implementations.
> >
> > Implementations of what?
>
> Generic in the sense that you can write __array_ufunc__ once and have
> it work for all ufuncs.
>
> >> Most duck array libraries can write a single
> >> implementation of __array_ufunc__ that works for *all* ufuncs, even
> >> new third-party ufuncs that the duck array library has never heard of,
> >
> >
> > I see where you're going with this. You are thinking of reusing the
> ufunc implementation to do a computation. That's a minor use case (imho),
> and I can't remember seeing it used.
>
> I mean, I just looked at dask and xarray, and they're both doing
> exactly what I said, right now in shipping code. What use cases are
> you targeting here if you consider dask and xarray out-of-scope? :-)
>
> > this is a case where knowing if something is a ufunc helps use a property
> of it. so there the more specialized nature of __array_ufunc__ helps. Seems
> niche though, and could probably also be done by checking if a function is
> an instance of np.ufunc via __array_function__
>
> Sparse arrays aren't very niche... and the isinstance trick is
> possible in some cases, but (a) it's relying on an undocumented
> implementation detail of __array_function__; according to
> __array_function__'s API contract, you could just as easily get passed
> the ufunc's __call__ method instead of the object itself, and (b) it
> doesn't work at all for ufunc methods like reduce, outer, accumulate.
> These are both show-stoppers IMO.
>
> > This last point, using third-party ufuncs, is the interesting one to me.
> They have to be generated with the NumPy ufunc machinery, so the dispatch
> mechanism is attached to them. We could do third party functions for
> __array_function__ too, but that would require making
> @array_function_dispatch public, which we haven't done (yet?).
>
> With __array_function__ it's theoretically possible to do the dispatch
> on third-party functions, but when someone defines a new function they
> always have to go update all the duck array libraries to hard-code in
> some special knowledge of their new function. So in my example, even
> if we made @array_function_dispatch public, you still couldn't use
> your nice new numba-created gufunc unless you first convinced dask,
> xarray, and bcolz to all accept patches to support your new gufunc.
> With __array_ufunc__, it works out-of-the-box.