On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
> >
> > > <snip>
> > >
> > > That's part of it. The concrete problems it's solving are
> > > threefold:
> > >   * Array creation functions can be overridden.
> > >   * Array coercion is now covered.
> > >   * "Default implementations" will allow you to re-write your NumPy
> > >     array more easily, when such efficient implementations exist in
> > >     terms of other NumPy functions. That will also help achieve similar
> > >     semantics, but as I said, they're just "default"...
> > >
> >
> > There may be another very concrete one (that's not yet in the NEP):
> > allowing other libraries that consume ndarrays to use overrides. An
> > example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch
> > NumPy, something we don't like all that much (in particular for
> > mkl_fft, because it's the default in Anaconda). `__array_function__`
> > isn't able to help here, because it will always choose NumPy's own
> > implementation for ndarray input. With unumpy you can support
> > multiple libraries that consume ndarrays.
> >
> > Another example is einsum: if you want to use opt_einsum for all
> > inputs (including ndarrays), then you cannot use np.einsum. And yet
> > another is using bottleneck
> > (https://kwgoodman.github.io/bottleneck-doc/reference.html) for
> > nan-functions and partition. There's likely more of these.
> >
> > The point is: sometimes the array protocols are preferred (e.g.
> > Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works
> > better. It's also not necessarily an either or, they can be
> > complementary.
> >
>
> Let me try to move the discussion from the github issue here (this may
> not be the best place). (https://github.com/numpy/numpy/issues/14441
> which asked for easier creation functions together with
> `__array_function__`).
>
> I think an important note mentioned here is how users interact with
> unumpy, vs. __array_function__.
> The former is an explicit opt-in, while
> the latter is an implicit choice based on an `array-like` abstract base
> class and functional type based dispatching.
>
> To quote NEP 18 on this: "The downsides are that this would require an
> explicit opt-in from all existing code, e.g., import numpy.api as np,
> and in the long term would result in the maintenance of two separate
> NumPy APIs. Also, many functions from numpy itself are already
> overloaded (but inadequately), so confusion about high vs. low level
> APIs in NumPy would still persist."
> (I do think this is a point we should not just ignore, `uarray` is a
> thin layer, but it has a big surface area)
>
> Now there are things where explicit opt-in is obvious. And the FFT
> example is one of those, there is no way to implicitly choose another
> backend (except by just replacing it, i.e. monkeypatching) [1]. And
> right now I think these are _very_ different.
>
>
> Now for the end-users choosing one array-like over another, seems nicer
> as an implicit mechanism (why should I not mix sparse, dask and numpy
> arrays!?). This is the promise `__array_function__` tries to make.
> Unless convinced otherwise, my guess is that most library authors would
> strive for implicit support (e.g. sklearn, skimage, scipy).
>
> Circling back to creation and coercion. In a purely Object type system,
> these would be classmethods, I guess, but in NumPy and the libraries
> above, we are lost.
>
> Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
>   * Required end-user opt-in.
>   * Seems cleaner in many ways
>   * Requires a full copy of the API.
>

Bullets 1 and 3 are not required. If we decide to make it default, then there's no separate namespace.

> Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to
> create new arrays more conveniently. This would practically mean adding
> an `array_type=np.ndarray` argument.
>   * _Not_ used by end-users! End users should use dask.linspace!
>   * Adds "strange" API somewhere in numpy, and possibly a new
>     "protocol" (additionally to coercion).[2]
>
> I still feel these solve different issues. The second one is intended
> to make array likes work implicitly in libraries (without end users
> having to do anything). While the first seems to force the end user to
> opt in, sometimes unnecessarily:
>
>   def my_library_func(array_like):
>       exp = np.exp(array_like)
>       idx = np.arange(len(exp))
>       return idx, exp
>
> Would have all the information for implicit opt-in/Array-like support,
> but cannot do it right now.

Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.

> This is what I have been wondering, if
> uarray/unumpy, can in some way help me make this work (even _without_
> the end user opting in).

Good question. If that needs to work in the absence of the user doing anything, it should be something like

    with unumpy.determine_backend(exp):
        unumpy.arange(len(exp))   # or np.arange if we make unumpy default

to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.

Note that the `determine_backend` thing doesn't exist today.

> The reason is that simply, right now I am very
> clear on the need for this use case, but not sure about the need for
> end user opt in, since end users can just use dask.arange().
>

I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.

Cheers,
Ralf


>
> Cheers,
>
> Sebastian
>
>
> [1] To be honest, I do think a lot of the "issues" around
> monkeypatching exist just as much with backend choosing, the main
> difference seems to me that a lot of that:
>   1. monkeypatching was not done explicitly
>      (import mkl_fft; mkl_fft.monkeypatch_numpy())?
>   2. A backend system allows libraries to prefer one locally?
>      (which I think is a big advantage)
>
> [2] There are the options of adding `linspace_like` functions somewhere
> in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`,
> or simply inventing a new "protocol" (which is not really a protocol?),
> and make it `ndarray.__numpy_like_creation_functions__.arange()`.
>
>
> > Actually, after writing this I just realized something. With 1.17.x
> > we have:
> >
> > ```
> > In [1]: import dask.array as da
> >
> > In [2]: d = da.from_array(np.linspace(0, 1))
> >
> > In [3]: np.fft.fft(d)
> > Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
> > ```
> >
> > In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't
> > work. We have no bug report yet because 1.17.x hasn't landed in conda
> > defaults yet (perhaps this is a/the reason why?), but it will be a
> > problem.
> >
> > > The import numpy.overridable part is meant to help garner adoption,
> > > and to prefer the unumpy module if it is available (which will
> > > continue to be developed separately). That way it isn't so tightly
> > > coupled to the release cycle. One alternative Sebastian Berg
> > > mentioned (and I am on board with) is just moving unumpy into the
> > > NumPy organisation. What we fear in keeping it separate is that the
> > > simple act of a pip install unumpy will keep people from using it
> > > or trying it out.
> > >
> > Note that this is not the most critical aspect. I pushed for
> > vendoring as numpy.overridable because I want to not derail the
> > comparison with NEP 30 et al. with a "should we add a dependency"
> > discussion. The interesting part to decide on first is: do we need
> > the unumpy override mechanism? Vendoring opt-in vs. making it default
> > vs. adding a dependency is of secondary interest right now.
> >
> > Cheers,
> > Ralf
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
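
The `my_library_func` case discussed above can be sketched with a toy array-like. This is only an illustration, assuming a NumPy version that supports `__array_ufunc__` (1.13 or later); `MyArray` is a hypothetical wrapper, not part of NumPy, unumpy or any other library. Note that `np.exp` is a ufunc, so it dispatches through `__array_ufunc__` rather than `__array_function__`, but the point carries over: the creation call only ever sees an int.

```python
import numpy as np


class MyArray:
    """Hypothetical toy array-like that wraps an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __len__(self):
        return len(self.data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap MyArray inputs, run the ufunc on ndarrays, re-wrap the result.
        unwrapped = [x.data if isinstance(x, MyArray) else x for x in inputs]
        return MyArray(getattr(ufunc, method)(*unwrapped, **kwargs))


def my_library_func(array_like):
    exp = np.exp(array_like)    # dispatches to MyArray.__array_ufunc__
    idx = np.arange(len(exp))   # receives a plain int, so nothing to dispatch on
    return idx, exp


idx, exp = my_library_func(MyArray([0.0, 1.0, 2.0]))
print(type(idx).__name__, type(exp).__name__)  # prints: ndarray MyArray
```

Whatever mechanism is chosen (an `array_type=` argument or a backend context manager), it has to hand `exp` or its type to the creation call in some form.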