Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:

> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
>> > The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
>>
>> But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
>
> I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.

Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here. To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea, and would definitely not be in favor of combining them.

The key difference is that __array_ufunc__ allows for *generic* implementations. Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of, because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box. For example:

- Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the core operation.
- bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)

Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special or a random new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out of the box. That's a clean, generic interface. [1]
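To make the "one generic implementation" idea concrete, here is a minimal sketch; LoggedArray is a hypothetical duck array invented for illustration, not taken from any of the libraries above:

    import numpy as np

    class LoggedArray:
        """Hypothetical duck array: wraps an ndarray, logs every ufunc call."""
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # One hook covers *every* ufunc (np.add, scipy.special.erf, a
            # numba-vectorized function, ...) and every ufunc method
            # ('__call__', 'reduce', 'accumulate', 'outer'), because the
            # core operation is treated as a black box.
            print(f"dispatching {ufunc.__name__}.{method}")
            unwrapped = [x.data if isinstance(x, LoggedArray) else x
                         for x in inputs]
            return LoggedArray(getattr(ufunc, method)(*unwrapped, **kwargs))

    a = LoggedArray([1.0, 2.0, 3.0])
    np.multiply(a, a)  # dispatches multiply.__call__
    np.add.reduce(a)   # dispatches add.reduce -- same single hook

A real library would put its actual behavior (tiling, label matching, block decompression) around the getattr call, but the dispatch shape is the same.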
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.

To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.

That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty. So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.

-n

[1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs – they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)) – to work well, this might require that bcolz have special-case handling for sparse arrays, or vice versa, so you still potentially have some N**2 special cases, though at least here N is the number of array libraries involved rather than the number of functions.
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On 08.09.19 09:53, Nathaniel Smith wrote:

> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
>> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
>>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
>>> > The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
>>>
>>> But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
>>
>> I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
>
> Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here. To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea, and would definitely not be in favor of combining them.
>
> The key difference is that __array_ufunc__ allows for *generic* implementations. Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of, because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box. For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.
> - xarray can do its label-based axis matching, and then invoke the core operation.
> - bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.
> - sparse arrays can check the ufunc .identity attribute to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)
>
> Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special or a random new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out of the box. That's a clean, generic interface. [1]
>
> OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.

But uarray does allow this kind of simplification. You would do the following inside a uarray backend:

    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            # Do the pre/post-processing here; the call below
            # dispatches to every backend but this one.
            return func(*args, **kwargs)

This is possible today and is done in the dask backend inside unumpy, for example.
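A rough sketch of how such a backend would be registered and used, assuming uarray's set_backend/skip_backend context managers and unumpy's multimethods; EchoBackend and its behavior are invented for illustration, and exact signatures may differ between uarray versions:

    import uarray as ua
    import unumpy as np  # unumpy mirrors the NumPy API as uarray multimethods

    class EchoBackend:
        """Illustrative backend: intercept every call, then defer onward."""
        __ua_domain__ = "numpy"

        def __ua_function__(self, func, args, kwargs):
            print("intercepted:", func)
            # Re-dispatch to the remaining backends, skipping ourselves.
            with ua.skip_backend(self):
                return func(*args, **kwargs)

    with ua.set_backend(EchoBackend()):
        np.sum([1, 2, 3])  # prints "intercepted: ...", then computes as usual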
> To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.
>
> That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty. So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.
>
> -n
>
> [1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs – they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)) – to work well, this might require that bcolz have special-case handling for sparse arrays, or vice versa, so you still potentially have some N**2 special cases, though at least here N is the number of array libraries involved rather than the number of functions.
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi wrote:

> On 08.09.19 09:53, Nathaniel Smith wrote:
>> OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
>
> But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
>
>     def __ua_function__(func, args, kwargs):
>         with ua.skip_backend(self_backend):
>             # Do the pre/post-processing here; the call below
>             # dispatches to every backend but this one.
>             return func(*args, **kwargs)

You can dispatch to the underlying operation, sure, but you can't implement a generic ufunc loop, because you don't know that 'func' is actually a bound ufunc method, and you have no way to access the underlying ufunc object. (E.g. consider the case where 'func' is 'np.add.reduce'.) The critical part of my example was that it's a new ufunc that none of these libraries have ever heard of before. Ufuncs have a lot of consistent structure beyond what generic Python callables have, and the whole point of __array_ufunc__ is that implementors can rely on that structure. You get to work at a higher level of abstraction.

A similar but simpler example would be the protocol we've sketched out for concatenation: the idea would be to capture the core similarity between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any other variants, so that implementors only have to worry about the higher-level concept of "concatenation" rather than the raw APIs of all those individual functions.

-n

--
Nathaniel J. Smith -- https://vorpus.org
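To illustrate how the stack variants all reduce to one "concatenation" concept, here is a sketch; the duck_* names are invented for the example, and the bodies simply mirror how NumPy itself defines the variants:

    import numpy as np

    def duck_concatenate(arrays, axis=0):
        # The single hook a duck array library would have to implement.
        return np.concatenate(arrays, axis=axis)

    def duck_vstack(arrays):
        # vstack is just "promote to 2-d, concatenate along axis 0".
        return duck_concatenate([np.atleast_2d(a) for a in arrays], axis=0)

    def duck_hstack(arrays):
        # hstack joins 1-d inputs along axis 0, everything else along axis 1.
        arrays = [np.atleast_1d(a) for a in arrays]
        axis = 0 if arrays[0].ndim == 1 else 1
        return duck_concatenate(arrays, axis=axis)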
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On 08.09.19 10:56, Nathaniel Smith wrote:

> On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi wrote:
>> On 08.09.19 09:53, Nathaniel Smith wrote:
>>> OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
>>
>> But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
>>
>>     def __ua_function__(func, args, kwargs):
>>         with ua.skip_backend(self_backend):
>>             # Do the pre/post-processing here; the call below
>>             # dispatches to every backend but this one.
>>             return func(*args, **kwargs)
>
> You can dispatch to the underlying operation, sure, but you can't implement a generic ufunc loop, because you don't know that 'func' is actually a bound ufunc method, and you have no way to access the underlying ufunc object. (E.g. consider the case where 'func' is 'np.add.reduce'.) The critical part of my example was that it's a new ufunc that none of these libraries have ever heard of before. Ufuncs have a lot of consistent structure beyond what generic Python callables have, and the whole point of __array_ufunc__ is that implementors can rely on that structure. You get to work at a higher level of abstraction.
>
> A similar but simpler example would be the protocol we've sketched out for concatenation: the idea would be to capture the core similarity between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any other variants, so that implementors only have to worry about the higher-level concept of "concatenation" rather than the raw APIs of all those individual functions.

There's a solution for that too: default implementations. Implement concatenate, and you've got a default implementation for all of those you mentioned (see the sketch below). Similarly for the transpose/swapaxes/moveaxis family.

> -n
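For the transpose family, such defaults could look like this; a sketch assuming single non-negative axes, mirroring how NumPy defines swapaxes and moveaxis in terms of an axis permutation:

    import numpy as np

    def default_swapaxes(a, axis1, axis2):
        # swapaxes via transpose: exchange two entries of the permutation.
        perm = list(range(a.ndim))
        perm[axis1], perm[axis2] = perm[axis2], perm[axis1]
        return a.transpose(perm)

    def default_moveaxis(a, source, destination):
        # moveaxis via transpose: pull the axis out, reinsert it elsewhere.
        perm = [ax for ax in range(a.ndim) if ax != source]
        perm.insert(destination, source)
        return a.transpose(perm)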
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote:

> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
> >> > The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
> >>
> >> But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
> >
> > I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
>
> Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.

That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.

> To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,

It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.

> and would definitely not be in favor of combining them.

I'm not saying we should. But __array_ufunc__ is basically a slight specialization - knowing that the function that was called is a ufunc can be handy, but is usually irrelevant.

> The key difference is that __array_ufunc__ allows for *generic* implementations.

Implementations of what?

> Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,

I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used. The original use case was scipy.sparse matrices. The executive summary of NEP 13 talks about this. It's about calling `np.some_ufunc(other_ndarray_like)` and "handing over control" to that object rather than the numpy function starting to execute. Also note that NEP 13 states in the summary "This covers some of the same ground as Travis Oliphant’s proposal to retro-fit NumPy with multi-methods" (reminds one of uarray). For scipy.sparse, the layout of the data doesn't make sense to numpy. All that was desired was that the sparse matrix needs to know what function was called, so it can call its own implementation of that function instead.

> because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.

Works for __array_function__ too. Note, *not* by explicitly reusing the numpy function: Dask anyway has its own functions that mirror the numpy API, and Dask's __array_function__ just does the forwarding to its own functions. Also, a Dask array could be a collection of CuPy arrays, and CuPy implements __array_ufunc__. So explicitly reusing the NumPy ufunc implementation on whatever comes in would be, well, not so nice.

> - xarray can do its label-based axis matching, and then invoke the core operation.

Could do this with __array_function__ too.

> - bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.

Not sure about this one.

> - sparse arrays can check the ufunc .identity attribute

This is a case where knowing that something is a ufunc helps use a property of it, so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__ (see the sketch below).

> to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)
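What that isinstance check could look like (a sketch; note that in today's NumPy, ufunc calls actually dispatch through __array_ufunc__ rather than __array_function__, so this only illustrates the hypothetical being discussed, with DuckArray invented for the example):

    import numpy as np

    class DuckArray:
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_function__(self, func, types, args, kwargs):
            if isinstance(func, np.ufunc):
                # Generic path: ufunc properties like .identity and .nin are
                # available here, just as they would be in __array_ufunc__.
                unwrapped = [a.data if isinstance(a, DuckArray) else a
                             for a in args]
                return DuckArray(func(*unwrapped, **kwargs))
            # Anything else still needs a per-function implementation.
            return NotImplemented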
Re: [Numpy-discussion] NEP 32: Remove the financial functions from NumPy
On 9/4/19, Matthew Brett wrote:
> Hi,
>
> Maybe worth asking over at the Pandas list? I bet there are more
> Python / finance people over there.

OK, I sent a message to the PyData mailing list.

Warren

> Cheers,
>
> Matthew
>
> On Wed, Sep 4, 2019 at 7:11 PM Ilhan Polat wrote:
>>
>> +1 on removing them from NumPy. I think there are plenty of alternatives already - so many that we might even consider deprecating them just like the SciPy misc module, by pointing to alternatives.
>>
>> On Tue, Sep 3, 2019 at 6:38 PM Sebastian Berg wrote:
>>>
>>> On Tue, 2019-09-03 at 08:56 -0400, Warren Weckesser wrote:
>>> > Github issue 2880 ("Get financial functions out of main namespace",
>>>
>>> Very briefly, I am absolutely in favor of this.
>>>
>>> Keeping the functions in numpy seems more of a liability than a help to anyone. And this push is more likely to help users by spurring development of a good replacement than a practically unmaintained corner of NumPy that may seem like it solves a problem, but probably does so very poorly.
>>>
>>> Moving them into a separate pip-installable package seems like the best way forward until a better replacement, to which we can point users, comes up.
>>>
>>> - Sebastian
>>>
>>> > https://github.com/numpy/numpy/issues/2880) has been open since 2013. In a recent community meeting, it was suggested that we create a NEP to propose the removal of the financial functions from NumPy. I have submitted "NEP 32: Remove the financial functions from NumPy" in a pull request at https://github.com/numpy/numpy/pull/14399. A copy of the latest version of the NEP is below.
>>> >
>>> > According to the NEP process document, "Once the PR is in place, the NEP should be announced on the mailing list for discussion (comments on the PR itself should be restricted to minor editorial and technical fixes)." This email is the announcement for NEP 32.
>>> >
>>> > The NEP includes a brief summary of the history of the financial functions, and has links to several relevant mailing list threads, dating back to when the functions were added to NumPy in 2008. I recommend reviewing those threads before commenting here.
>>> >
>>> > Warren
>>> >
>>> > -----
>>> >
>>> > ==================================================
>>> > NEP 32 — Remove the financial functions from NumPy
>>> > ==================================================
>>> >
>>> > :Author: Warren Weckesser
>>> > :Status: Draft
>>> > :Type: Standards Track
>>> > :Created: 2019-08-30
>>> >
>>> >
>>> > Abstract
>>> > --------
>>> >
>>> > We propose deprecating and ultimately removing the financial functions [1]_ from NumPy. The functions will be moved to an independent repository, and provided to the community as a separate package with the name ``numpy_financial``.
>>> >
>>> >
>>> > Motivation and scope
>>> > --------------------
>>> >
>>> > The NumPy financial functions [1]_ are the 10 functions ``fv``, ``ipmt``, ``irr``, ``mirr``, ``nper``, ``npv``, ``pmt``, ``ppmt``, ``pv`` and ``rate``. The functions provide elementary financial calculations such as future value, net present value, etc. These functions were added to NumPy in 2008 [2]_.
>>> >
>>> > In May 2009, a request by Joe Harrington to add a function called ``xirr`` to the financial functions triggered a long thread about these functions [3]_. One important point that came up in that thread is that a "real" financial library must be able to handle real dates.
>>> > The NumPy financial functions do not work with actual dates or calendars. The preference for a more capable library independent of NumPy was expressed several times in that thread.
>>> >
>>> > In June 2009, D. L. Goldsmith expressed concerns about the correctness of the implementations of some of the financial functions [4]_. It was suggested then to move the financial functions out of NumPy to an independent package.
>>> >
>>> > In a GitHub issue in 2013 [5]_, Nathaniel Smith suggested moving the financial functions from the top-level namespace to ``numpy.financial``. He also suggested giving the functions better names. Responses at that time included the suggestion to deprecate them and move them from NumPy to a separate package. This issue is still open.
>>> >
>>> > Later in 2013 [6]_, it was suggested on the mailing list that these functions be removed from NumPy.
>>> >
>>> > The arguments for the removal of these functions from NumPy:
>>> >
>>> > * They are too specialized for NumPy.
>>> > * They are not actually useful for "real world" financial calculations, because they do not handle real dates and calendars.
>>> > * The definition of "correctness" for some of these functions seems
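For a sense of what the proposed split means in practice, the migration for users would look roughly like this; the numbers follow the future-value example from the fv docstring, and the PyPI name numpy-financial is assumed to match the package name proposed in the NEP:

    # Today (to be deprecated): the functions live in the main namespace.
    import numpy as np
    np.fv(0.05 / 12, 10 * 12, -100, -100)   # future value of savings

    # After the split: pip install numpy-financial
    import numpy_financial as npf
    npf.fv(0.05 / 12, 10 * 12, -100, -100)  # same result, ~15692.93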
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers wrote:
>
> On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote:
>>
>> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
>> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
>> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
>> >> > The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
>> >>
>> >> But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
>> >
>> > I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
>>
>> Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
>
> That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
>
>> To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
>
> It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.

Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way.

>> The key difference is that __array_ufunc__ allows for *generic* implementations.
>
> Implementations of what?

Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.

>> Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
>
> I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.

I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)

> this is a case where knowing if something is a ufunc helps use a property of it, so there the more specialized nature of __array_ufunc__ helps.
> Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__

Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO.

> This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third-party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).

With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out of the box (see the sketch below).

> But what is that road, and what do you think the goal is? To me it's: separate our API from our i
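A concrete version of that numba scenario, as a sketch (assuming numba is installed; with an explicit signature list, numba.vectorize compiles eagerly and returns a true np.ufunc):

    import numpy as np
    from numba import vectorize

    @vectorize(["float64(float64, float64)"])
    def clipped_add(x, y):
        # A brand-new third-party ufunc no duck array library has heard of.
        return min(x + y, 1.0)

    # Because it is a real np.ufunc, any generic __array_ufunc__
    # implementation handles it immediately -- calls, .reduce, .outer, ...
    assert isinstance(clipped_add, np.ufunc)
    clipped_add(np.array([0.5, 0.9]), np.array([0.2, 0.4]))  # [0.7, 1.0]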
Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith wrote:

> On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers wrote:
> >
> > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith wrote:
> >>
> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers wrote:
> >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith wrote:
> >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi wrote:
> >> >> > The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
> >> >>
> >> >> But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
> >> >
> >> > I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
> >>
> >> Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
> >
> > That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
> >
> >> To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> >
> > It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.
>
> Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way.
>
> >> The key difference is that __array_ufunc__ allows for *generic* implementations.
> >
> > Implementations of what?
>
> Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.
>
> >> Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
> >
> > I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
>
> I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)
>
> > this is a case where knowing if something is a ufunc helps use a property of it,
> > so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__
>
> Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO.
>
> > This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third-party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).
>
> With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out of the box.