On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith <n...@pobox.com> wrote:
> On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> >
> > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith <n...@pobox.com> wrote:
> >>
> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <n...@pobox.com> wrote:
> >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edi...@gmail.com> wrote:
> >> >> > The fact that we're having to design more and more protocols for
> >> >> > a lot of very similar things is, to me, an indicator that we do
> >> >> > have holistic problems that ought to be solved by a single
> >> >> > protocol.
> >> >>
> >> >> But the reason we've had trouble designing these protocols is that
> >> >> they're each different :-). If it was just a matter of copying
> >> >> __array_ufunc__ we'd have been done in a few minutes...
> >> >
> >> > I don't think that argument is correct. That we now have two very
> >> > similar protocols is simply a matter of history and limited
> >> > developer time. NEP 18 discusses in several places that
> >> > __array_ufunc__ should be brought in line with __array_function__,
> >> > and that we can migrate a function from one protocol to the other.
> >> > There's no technical reason other than backwards compat and dev
> >> > time why we couldn't use __array_function__ for ufuncs also.
> >>
> >> Huh, that's interesting! Apparently we have a profoundly different
> >> understanding of what we're doing here.
> >
> > That is interesting indeed. We should figure this out first - no point
> > discussing a NEP about plugging the gaps in our override system when
> > we don't have a common understanding of why we wanted/needed an
> > override system in the first place.
> >
> >> To me, __array_ufunc__ and __array_function__ are completely
> >> different. In fact I'd say __array_ufunc__ is a good idea and
> >> __array_function__ is a bad idea,
> >
> > It's early days, but "customer feedback" certainly has been more
> > enthusiastic for __array_function__. Also, from what I've seen so far
> > it works well. Example: at the SciPy sprints someone put together
> > Xarray plus pydata/sparse to use distributed sparse arrays for
> > visualizing some large genetic (I think) data sets. That was made to
> > work in a single day, with impressively little code.
>
> Yeah, it's true, and __array_function__ made a bunch of stuff that used
> to be impossible become possible, I'm not saying it didn't. My
> prediction is that the longer we live with it, the more limits we'll
> hit and the more problems we'll have with long-term maintainability. I
> don't think initial enthusiasm is a good predictor of that either way.
>
> >> The key difference is that __array_ufunc__ allows for *generic*
> >> implementations.
> >
> > Implementations of what?
>
> Generic in the sense that you can write __array_ufunc__ once and have
> it work for all ufuncs.
>
> >> Most duck array libraries can write a single implementation of
> >> __array_ufunc__ that works for *all* ufuncs, even new third-party
> >> ufuncs that the duck array library has never heard of,
> >
> > I see where you're going with this. You are thinking of reusing the
> > ufunc implementation to do a computation. That's a minor use case
> > (imho), and I can't remember seeing it used.
>
> I mean, I just looked at dask and xarray, and they're both doing
> exactly what I said, right now in shipping code. What use cases are
> you targeting here if you consider dask and xarray out-of-scope? :-)
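To make the "write __array_ufunc__ once, works for all ufuncs" point
concrete, here's a minimal sketch of the pattern being described. The
DuckArray name is made up; this is just an illustration, not dask's or
xarray's actual code:

    import numpy as np

    class DuckArray:
        """Toy duck array wrapping a plain ndarray."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # A single implementation covers every ufunc, including
            # third-party ufuncs this class has never heard of, and
            # every ufunc method (__call__, reduce, accumulate, outer):
            # unwrap the inputs, delegate to the ufunc, re-wrap.
            # (Handling of out=, multiple outputs, etc. is omitted.)
            unwrapped = [x.data if isinstance(x, DuckArray) else x
                         for x in inputs]
            return DuckArray(getattr(ufunc, method)(*unwrapped, **kwargs))

    np.sin(DuckArray([0.0, 1.0]))        # works
    np.add.reduce(DuckArray([1, 2, 3]))  # also works, method == 'reduce'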
> > This is a case where knowing if something is a ufunc helps you use a
> > property of it, so there the more specialized nature of
> > __array_ufunc__ helps. Seems niche though, and it could probably also
> > be done by checking if a function is an instance of np.ufunc via
> > __array_function__.
>
> Sparse arrays aren't very niche... and the isinstance trick is
> possible in some cases, but (a) it's relying on an undocumented
> implementation detail of __array_function__; according to
> __array_function__'s API contract, you could just as easily get passed
> the ufunc's __call__ method instead of the object itself, and (b) it
> doesn't work at all for ufunc methods like reduce, outer, accumulate.
> These are both show-stoppers IMO.
>
> > This last point, using third-party ufuncs, is the interesting one to
> > me. They have to be generated with the NumPy ufunc machinery, so the
> > dispatch mechanism is attached to them. We could do third-party
> > functions for __array_function__ too, but that would require making
> > @array_function_dispatch public, which we haven't done (yet?).
>
> With __array_function__ it's theoretically possible to do the dispatch
> on third-party functions, but when someone defines a new function they
> always have to go update all the duck array libraries to hard-code in
> some special knowledge of their new function. So in my example, even
> if we made @array_function_dispatch public, you still couldn't use
> your nice new numba-created gufunc unless you first convinced dask,
> xarray, and bcolz to all accept patches to support your new gufunc.
> With __array_ufunc__, it works out-of-the-box.
>
> > But what is that road, and what do you think the goal is? To me it's:
> > separate our API from our implementation. Yours seems to be "reuse
> > our implementations" for __array_ufunc__, but I can't see how that
> > generalizes beyond ufuncs.
>
> The road is to define *abstractions* for the operations we expose
> through our API, so that duck array implementors can work against a
> contract with well-defined preconditions and postconditions, so they
> can write code that works reliably even when the surrounding
> environment changes. That's the only way to keep things maintainable
> AFAICT. If the API contract is just a vague handwave at the numpy API,
> then no one knows which details actually matter, it's impossible to
> test, implementations will inevitably end up with subtle long-standing
> bugs, and literally any change in numpy could potentially break duck
> array users; we don't know. So my motivation is that I like testing, I
> don't like bugs, and I like being able to maintain things with
> confidence :-). The principles are much more general than ufuncs;
> that's just a pertinent example.
>
> > I think this is an important point. GPUs are massively popular, and
> > will very likely just continue to grow in importance. So anything we
> > do in this space that says "well, it works, just not for GPUs" is
> > probably not going to solve our most pressing problems.
>
> I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying
> that when it comes to GPUs, there's an upper bound on how well you can
> hope to do, and __array_ufunc__ achieves that upper bound. So does
> __array_function__. So if we only care about GPUs, they're about
> equally good. But if we also care about dask and xarray and compressed
> storage and sparse storage and ... then __array_ufunc__ is strictly
> superior in those cases. So replacing __array_ufunc__ with
> __array_function__ would be a major backwards step.
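As a runnable illustration of Nathaniel's point (b) about ufunc methods
(LoggingDuck is a throwaway name, purely for demonstration):
__array_ufunc__ receives the ufunc method as an explicit argument,
which is exactly the information an __array_function__-style hook never
sees.

    import numpy as np

    class LoggingDuck:
        """Logs how ufunc dispatch reaches this object."""

        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # 'method' distinguishes np.add(...) from np.add.reduce(...),
            # np.add.accumulate(...), np.add.outer(...), and so on.
            print(f"dispatched: {ufunc.__name__}.{method}")
            unwrapped = [x.data if isinstance(x, LoggingDuck) else x
                         for x in inputs]
            return getattr(ufunc, method)(*unwrapped, **kwargs)

    np.add(LoggingDuck([1]), LoggingDuck([2]))  # dispatched: add.__call__
    np.add.reduce(LoggingDuck([1, 2, 3]))       # dispatched: add.reduce
    np.add.outer(LoggingDuck([1, 2]), [3, 4])   # dispatched: add.outer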
One case that hasn't been brought up in this thread is unit handling.
For example, unyt's __array_ufunc__ implementation explicitly handles
known ufuncs and will bail if someone tries to use a ufunc that unyt
doesn't know about. I tried to implement a completely generic solution
but ended up concluding I couldn't do that without silently generating
answers with incorrect units. I definitely agree with your analysis
that this sort of implementation is error-prone; in fact, we just had
to do a bugfix release to fix clip suddenly not working now that it's
implemented as a ufunc in NumPy 1.17.

> -n
>
> --
> Nathaniel J. Smith -- https://vorpus.org
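To sketch the shape of the unit-handling problem described above (a toy
Quantity class, purely illustrative, not unyt's actual code): unit
propagation rules are inherently per-ufunc, so the handler has to
enumerate the ufuncs it understands and return NotImplemented for the
rest.

    import numpy as np

    class Quantity:
        """Toy unit-carrying array; illustrative only."""

        def __init__(self, data, unit):
            self.data = np.asarray(data)
            self.unit = unit

        # Unit propagation has no generic rule: addition requires
        # matching units, multiplication combines them, and a ufunc
        # you've never seen could do anything at all to the units.
        _UNIT_RULES = {
            np.add: lambda u1, u2: u1 if u1 == u2 else None,
            np.multiply: lambda u1, u2: f"{u1}*{u2}",
        }

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            rule = self._UNIT_RULES.get(ufunc)
            if rule is None or method != '__call__':
                return NotImplemented  # bail on anything we don't know
            units = [x.unit if isinstance(x, Quantity) else ''
                     for x in inputs]
            out_unit = rule(*units)
            if out_unit is None:
                raise ValueError(f"unit mismatch in {ufunc.__name__}: {units}")
            arrays = [x.data if isinstance(x, Quantity) else x
                      for x in inputs]
            return Quantity(ufunc(*arrays, **kwargs), out_unit)

    np.add(Quantity([1], 'm'), Quantity([2], 'm'))  # ok, unit stays 'm'
    # np.sin(Quantity([1], 'm'))  # raises TypeError: we bailed out

The rule table is the crux: each entry has to be written by hand, which
is why a fully generic unit-aware __array_ufunc__ risks silently
producing answers with the wrong units.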