On Sat, Sep 7, 2019 at 2:18 PM sebastian <sebast...@sipsolutions.net> wrote:
> On 2019-09-07 15:33, Ralf Gommers wrote:
> > On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg
> > <sebast...@sipsolutions.net> wrote:
> >
> >> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> >>>
> >>>
> >> <snip>
> >>
> >>>> That's part of it. The concrete problems it's solving are
> >>>> threefold:
> >>>> Array creation functions can be overridden.
> >>>> Array coercion is now covered.
> >>>> "Default implementations" will allow you to re-write your NumPy
> >>>> array more easily, when such efficient implementations exist in
> >>>> terms of other NumPy functions. That will also help achieve
> >>>> similar semantics, but as I said, they're just "default"...
> >>>>
> >>>
> >>> There may be another very concrete one (that's not yet in the NEP):
> >>> allowing other libraries that consume ndarrays to use overrides. An
> >>> example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> >>> NumPy, something we don't like all that much (in particular for
> >>> mkl_fft, because it's the default in Anaconda). `__array_function__`
> >>> isn't able to help here, because it will always choose NumPy's own
> >>> implementation for ndarray input. With unumpy you can support
> >>> multiple libraries that consume ndarrays.
> >>>
> >>> Another example is einsum: if you want to use opt_einsum for all
> >>> inputs (including ndarrays), then you cannot use np.einsum. And yet
> >>> another is using bottleneck
> >>> (https://kwgoodman.github.io/bottleneck-doc/reference.html) for
> >>> nan-functions and partition. There's likely more of these.
> >>>
> >>> The point is: sometimes the array protocols are preferred (e.g.
> >>> Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> >>> better. It's also not necessarily an either or, they can be
> >>> complementary.
> >>>
> >>
> >> Let me try to move the discussion from the github issue here (this may
> >> not be the best place).
> >> (https://github.com/numpy/numpy/issues/14441
> >> which asked for easier creation functions together with
> >> `__array_function__`).
> >>
> >> I think an important note mentioned here is how users interact with
> >> unumpy, vs. __array_function__. The former is an explicit opt-in, while
> >> the latter is implicit choice based on an `array-like` abstract base
> >> class and functional type based dispatching.
> >>
> >> To quote NEP 18 on this: "The downsides are that this would require an
> >> explicit opt-in from all existing code, e.g., import numpy.api as np,
> >> and in the long term would result in the maintenance of two separate
> >> NumPy APIs. Also, many functions from numpy itself are already
> >> overloaded (but inadequately), so confusion about high vs. low level
> >> APIs in NumPy would still persist."
> >> (I do think this is a point we should not just ignore, `uarray` is a
> >> thin layer, but it has a big surface area)
> >>
> >> Now there are things where explicit opt-in is obvious. And the FFT
> >> example is one of those, there is no way to implicitly choose another
> >> backend (except by just replacing it, i.e. monkeypatching) [1]. And
> >> right now I think these are _very_ different.
> >>
> >> Now for the end-users choosing one array-like over another, seems nicer
> >> as an implicit mechanism (why should I not mix sparse, dask and numpy
> >> arrays!?). This is the promise `__array_function__` tries to make.
> >> Unless convinced otherwise, my guess is that most library authors would
> >> strive for implicit support (i.e. sklearn, skimage, scipy).
> >>
> >> Circling back to creation and coercion. In a purely Object type system,
> >> these would be classmethods, I guess, but in NumPy and the libraries
> >> above, we are lost.
> >>
> >> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
> >> * Required end-user opt-in.
> >
> >> * Seems cleaner in many ways
> >> * Requires a full copy of the API.
> >
> > bullet 1 and 3 are not required. if we decide to make it default, then
> > there's no separate namespace
>
> It does require explicit opt-in to have any benefits to the user.
>
> >
> >> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> >> create new arrays more conveniently. This would practically mean adding
> >> an `array_type=np.ndarray` argument.
> >> * _Not_ used by end-users! End users should use dask.linspace!
> >> * Adds "strange" API somewhere in numpy, and possibly a new
> >> "protocol" (additionally to coercion).[2]
> >>
> >> I still feel these solve different issues. The second one is intended
> >> to make array likes work implicitly in libraries (without end users
> >> having to do anything). While the first seems to force the end user to
> >> opt in, sometimes unnecessarily:
> >>
> >> def my_library_func(array_like):
> >>     exp = np.exp(array_like)
> >>     idx = np.arange(len(exp))
> >>     return idx, exp
> >>
> >> Would have all the information for implicit opt-in/Array-like support,
> >> but cannot do it right now.
> >
> > Can you explain this a bit more? `len(exp)` is a number, so
> > `np.arange(number)` doesn't really have any information here.
> >
>
> Right, but as a library author, I want a way to make it use the
> same type as `array_like` in this particular function, that is the
> point! The end-user already signaled they prefer say dask, due to the
> array that was actually passed in. (but this is just repeating what is
> below I think).
>

Okay, you meant conceptually:)

> >> This is what I have been wondering, if
> >> uarray/unumpy, can in some way help me make this work (even _without_
> >> the end user opting in).
> >
> > good question.
> > if that needs to work in the absence of the user doing
> > anything, it should be something like
> >
> > with unumpy.determine_backend(exp):
> >     unumpy.arange(len(exp)) # or np.arange if we make unumpy default
> >
> > to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
> >
> > Note, that `determine_backend` thing doesn't exist today.
> >
>
> Exactly, that is what I have been wondering about, there may be more
> issues around that.
> If it existed, we may be able to solve the implicit library usage by
> making libraries use unumpy (or similar). Although, at that point we
> half replace `__array_function__` maybe.
>

I don't really think so. Libraries can/will still use __array_function__
for most functionality, and just add a `with determine_backend` for the
places where __array_function__ doesn't work.

> However, the main point is that without such a functionality, NEP 30 and
> NEP 31 seem to solve slightly different issues with respect to how they
> interact with the end-user (opt in)?
>

Yes, I agree with that.

Cheers,
Ralf

>
> We may decide that we do not want to solve the library users issue of
> wanting to support implicit opt-in for array like inputs because it is
> a rabbit hole. But we may need to discuss/argue a bit more that it
> really is a deep enough rabbit hole that it is not worth the trouble.
>
> >> The reason is that simply, right now I am very
> >> clear on the need for this use case, but not sure about the need for
> >> end user opt in, since end users can just use dask.arange().
> >
> > I don't get the last part. The arange is inside a library function, so
> > a user can't just go in and change things there.
>
> A "user" here means "end user". An end user writes a script, and they
> can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`,
> or more likely just use one within one script and the other within
> another script, while both use the same sklearn functions.
> (Although using a backend switching may be nicer in some contexts)
>
> A library provider (library user of unumpy/numpy) of course cannot just
> use dask conveniently, unless they write their own
> `guess_numpy_like_module()` function first.
> >
> > Cheers,
> >
> > Ralf
> >
> >> Cheers,
> >>
> >> Sebastian
> >>
> >> [1] To be honest, I do think a lot of the "issues" around
> >> monkeypatching exists just as much with backend choosing, the main
> >> difference seems to me that a lot of that:
> >> 1. monkeypatching was not done explicitly
> >> (import mkl_fft; mkl_fft.monkeypatch_numpy())?
> >> 2. A backend system allows libraries to prefer one locally?
> >> (which I think is a big advantage)
> >>
> >> [2] There are the options of adding `linspace_like` functions somewhere
> >> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> >> or simply inventing a new "protocol" (which is not really a protocol?),
> >> and make it `ndarray.__numpy_like_creation_functions__.arange()`.
> >>
> >>> Actually, after writing this I just realized something. With 1.17.x
> >>> we have:
> >>>
> >>> ```
> >>> In [1]: import dask.array as da
> >>>
> >>> In [2]: d = da.from_array(np.linspace(0, 1))
> >>>
> >>> In [3]: np.fft.fft(d)
> >>> Out[3]: dask.array<fft, shape=(50,), dtype=complex128,
> >>> chunksize=(50,)>
> >>> ```
> >>>
> >>> In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> >>> work. We have no bug report yet because 1.17.x hasn't landed in conda
> >>> defaults yet (perhaps this is a/the reason why?), but it will be a
> >>> problem.
> >>>
> >>>> The import numpy.overridable part is meant to help garner adoption,
> >>>> and to prefer the unumpy module if it is available (which will
> >>>> continue to be developed separately). That way it isn't so tightly
> >>>> coupled to the release cycle.
> >>>> One alternative Sebastian Berg
> >>>> mentioned (and I am on board with) is just moving unumpy into the
> >>>> NumPy organisation. What we fear keeping it separate is that the
> >>>> simple act of a pip install unumpy will keep people from using it
> >>>> or trying it out.
> >>>>
> >>> Note that this is not the most critical aspect. I pushed for
> >>> vendoring as numpy.overridable because I want to not derail the
> >>> comparison with NEP 30 et al. with a "should we add a dependency"
> >>> discussion. The interesting part to decide on first is: do we need
> >>> the unumpy override mechanism? Vendoring opt-in vs. making it default
> >>> vs. adding a dependency is of secondary interest right now.
> >>>
> >>> Cheers,
> >>> Ralf
> >>>
> >>> _______________________________________________
> >>> NumPy-Discussion mailing list
> >>> NumPy-Discussion@python.org
> >>> https://mail.python.org/mailman/listinfo/numpy-discussion
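
For concreteness, here is a toy sketch of the `determine_backend` idea from the exchange above. It is pure Python and does not use uarray/unumpy at all; every name in it (`determine_backend`, `arange`, the `_BACKENDS` registry, and the list/tuple stand-ins for two array types) is hypothetical and chosen only for illustration. It tries to show how a library function could pick a backend from the array it was handed, so that a creation function with no array argument still returns the matching type, without the end user opting in.

```python
import contextlib
import contextvars

# Hypothetical toy model; none of these names are the real uarray/unumpy API.
_current_backend = contextvars.ContextVar("backend", default="numpy_like")

# Each "backend" supplies its own arange. Lists and tuples stand in for
# two different array types (think ndarray vs. a dask array).
_BACKENDS = {
    "numpy_like": {"array_type": list, "arange": lambda n: list(range(n))},
    "dask_like": {"array_type": tuple, "arange": lambda n: tuple(range(n))},
}


@contextlib.contextmanager
def determine_backend(array):
    """Select the backend whose array type matches `array` for this block."""
    for name, backend in _BACKENDS.items():
        if isinstance(array, backend["array_type"]):
            token = _current_backend.set(name)
            break
    else:
        raise TypeError(f"no backend for {type(array).__name__}")
    try:
        yield name
    finally:
        # Restore whatever backend was active before the block.
        _current_backend.reset(token)


def arange(n):
    """Creation function: it has no array argument, so it asks the context."""
    return _BACKENDS[_current_backend.get()]["arange"](n)


def my_library_func(array_like):
    # Library code: the end user never selects a backend explicitly.
    with determine_backend(array_like):
        idx = arange(len(array_like))
    return idx, array_like
```

In this sketch, passing a tuple to `my_library_func` yields a tuple index and passing a list yields a list index, while a bare `arange(n)` outside any `with` block falls back to the default backend. A real mechanism would also have to decide what happens when several array types are mixed, which this toy ignores.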