On Wed, Sep 11, 2019 at 4:18 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer <sho...@gmail.com> wrote:
>
>> On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi <einstein.edi...@gmail.com>
>> wrote:
>>
>>> On 10.09.19 05:32, Stephan Hoyer wrote:
>>>
>>> On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers <ralf.gomm...@gmail.com>
>>> wrote:
>>>
>>>> I think we've chosen to try the former - dispatch on functions so we
>>>> can reuse the NumPy API. It could work out well, or it could cause
>>>> some long-term maintenance issues; time will tell. The question now
>>>> is if and how to plug the gap that __array_function__ left. Its main
>>>> limitation is that it doesn't work for functions that don't take an
>>>> array-like input - that left out ~10-20% of functions. So now we have
>>>> a proposal for a structural solution to that last 10-20%. It seems
>>>> logical to want that gap plugged, rather than go back and say "we
>>>> shouldn't have gone for the first 80%, so let's go no further".
>>>
>>> I'm excited about solving the remaining 10-20% of use cases for
>>> flexible array dispatching,
>>>
>>> Great! I think most (but not all) of us are on the same page here.
>
> Actually, now that Peter came up with the `like=` keyword idea for array
> creation functions, I'm very interested in seeing that worked out - it
> feels like it could be a nice solution for the part of that 10-20% that
> looked pretty bad before.
>
>>> but the unumpy interface suggested here (numpy.overridable) feels like
>>> a redundant redo of __array_function__ and __array_ufunc__.
>
> A bit of context: a big part of the reason I advocated for
> numpy.overridable is that library authors can use it *only* for the
> parts not already covered by the protocols we already have. If there's
> overlap, there are several ways to deal with that, including exposing
> only part of the unumpy API surface. It does plug all the holes in one
> go (although you can then indeed argue it does too much), and there is
> no other coherent proposal/vision yet that does this. What you wrote
> below comes closest, and I'd love to see it worked out (e.g. the like=
> argument for array creation). What I don't like is ad-hoc plugging of
> one hole at a time, without visibility into how many more protocols and
> workaround functions we would need to add to the API. So hopefully we
> can come to an apples-to-apples comparison of two design alternatives.
>
> Also, we just discussed this whole thread in the community call, and
> it's clear that it's a complex matter with many different angles. It's
> very hard to get a full overview. Our conclusion in the call was that
> this will benefit from an in-person discussion. The sprint in November
> may be a really good opportunity for that.

Sounds good, I'm looking forward to the discussion at the November sprint!

> In the meantime we can of course keep working out ideas/docs. For now I
> think it's clear that we (the NEP authors) have some homework to do -
> that may take some time.
>
>>> I would much rather continue to develop specialized protocols for the
>>> remaining use cases. Summarizing those I've seen in this thread, these
>>> include:
>>> 1. Overrides for customizing array creation and coercion.
>>> 2. Overrides to implement operations for new dtypes.
>>> 3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs
>>> with MKL.
>>>
>>> (1) could mostly be solved by adding np.duckarray() and another
>>> function for duck array coercion. There is still the matter of
>>> overriding np.zeros and the like, which perhaps justifies another new
>>> protocol, but in my experience the use cases for truly creating an
>>> array from scratch are quite rare.
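>>>
>>> For concreteness, a minimal sketch of what np.duckarray() might look
>>> like (the __duckarray__ protocol name here is hypothetical, and the
>>> details would need a NEP):
>>>
>>>     import numpy as np
>>>
>>>     def duckarray(value):
>>>         # Hypothetical coercion function: pass duck arrays through
>>>         # unchanged rather than materializing them as np.ndarray.
>>>         if hasattr(type(value), '__duckarray__'):
>>>             return type(value).__duckarray__(value)
>>>         # Everything else gets ordinary coercion.
>>>         return np.asarray(value)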
>>>
>>> While they're rare for libraries like XArray, CuPy, Dask and
>>> PyData/Sparse do need these.
>>>
>>> (2) should be tackled as part of overhauling NumPy's dtype system to
>>> better support user-defined dtypes. But it should definitely be in the
>>> form of specialized protocols, e.g. ones that pass preallocated arrays
>>> into ufuncs for a new dtype. By design, new dtypes should not be able
>>> to customize the semantics of array *structure*.
>>>
>>> We already have a split in the type system, with e.g. Cython's buffers
>>> and Numba's parallel type system. This is a different issue altogether:
>>> e.g. allowing a unyt dtype to spawn a unyt array, rather than forcing a
>>> rewrite of unyt to cooperate with NumPy's new dtype system.
>>
>> I guess you're proposing that operations like np.sum(numpy_array,
>> dtype=other_dtype) could rely on other_dtype for the implementation and
>> potentially return a non-NumPy array? I'm not sure this is well
>> motivated -- it would be helpful to discuss actual use cases.
>>
>> The most commonly used NumPy functionality related to dtypes can be
>> found only in methods on np.ndarray, e.g. astype() and view(). But I
>> don't think there's any proposal to change that.
>>
>>> 4. Having default implementations that allow overrides of a large part
>>> of the API while defining only a small part. This holds for e.g.
>>> transpose/concatenate.
>>
>> I'm not sure how unumpy solves the problems we encountered when trying
>> to do this with __array_function__ -- namely, that it exposes all of
>> NumPy's internals, and requires rewriting a lot of internal NumPy code
>> to ensure it always casts inputs with asarray().
>>
>> I think it would be useful to expose default implementations of NumPy
>> operations somewhere to make it easier to implement __array_function__,
>> but it doesn't make much sense to couple this to user-facing overrides.
>> These could be exposed as a separate package or numpy module (e.g.
>> numpy.default_implementations) that uses np.duckarray(), which library
>> authors can make use of by calling them inside their __array_function__
>> methods.
>>
>>> 5. Generation of random numbers (overriding RandomState). CuPy has its
>>> own implementation, which would be nice to override.
>>
>> I'm not sure that NumPy's random state objects make sense for duck
>> arrays. Because these are stateful objects, they are pretty coupled to
>> NumPy's implementation -- you cannot store any additional state on
>> RandomState objects that might be needed for a new implementation. At a
>> bare minimum, you will lose the reproducibility of random seeds, though
>> this may be less of a concern with the new random API.
>>
>>> I also share Nathaniel's concern that the overrides in unumpy are too
>>> powerful, by allowing control from arbitrary function arguments and
>>> even *non-local* control (i.e. global variables) from context
>>> managers. This level of flexibility can make code very hard to debug,
>>> especially in larger codebases.
>>>
>>> Backend switching needs global context, in any case. There isn't a
>>> good way around that other than the class dunder methods outlined in
>>> another thread, which would require rewrites of large amounts of code.
>>
>> Do we really need to support robust backend switching in NumPy? I'm not
>> strongly opposed, but what use cases does it actually solve to be able
>> to override np.fft.fft rather than using a new function?
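>>
>> To make the concern concrete: backend switching in the unumpy style
>> ultimately means dispatching on ambient global state rather than on
>> the arguments. A simplified sketch of the mechanism (not unumpy's
>> actual implementation, which is built on uarray):
>>
>>     import contextlib
>>     import threading
>>
>>     import numpy as np
>>
>>     _state = threading.local()  # global (per-thread) backend state
>>
>>     @contextlib.contextmanager
>>     def set_backend(backend):
>>         # Install `backend` for the duration of the `with` block.
>>         previous = getattr(_state, 'backend', None)
>>         _state.backend = backend
>>         try:
>>             yield
>>         finally:
>>             _state.backend = previous
>>
>>     def fft(a):
>>         # Dispatch on ambient state, not on the type of `a` -- code
>>         # far away from the `with` block changes what happens here.
>>         backend = getattr(_state, 'backend', None)
>>         if backend is not None:
>>             return backend.fft(a)
>>         return np.fft.fft(a)
>>
>> That non-locality is exactly the debuggability worry: reading a call
>> site tells you nothing about which implementation will run.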
> I don't know, but that feels like an odd question. We wanted an FFT
> backend system. Applying __array_function__ to numpy.fft happened
> without a real discussion, but as a backend system I don't think it
> would have met the criteria. Something that works for CuPy, Dask and
> Xarray, but not for pyFFTW or mkl_fft, is only half a solution.

I agree, __array_function__ is not a backend system.
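To spell out why: __array_function__ dispatches on the *types* of the
array arguments, so overriding np.fft.fft requires introducing a new
array type. A minimal sketch with a toy class (which just delegates back
to NumPy for the actual transform):

    import numpy as np

    class MyDuckArray:
        def __init__(self, data):
            self.data = np.asarray(data)

        def __array_function__(self, func, types, args, kwargs):
            if func is np.fft.fft:
                # A real duck array would compute its own FFT here.
                return MyDuckArray(np.fft.fft(self.data))
            return NotImplemented

    x = MyDuckArray([1.0, 2.0, 3.0])
    y = np.fft.fft(x)  # dispatches to MyDuckArray.__array_function__

Projects like mkl_fft and pyFFTW accelerate FFTs on plain ndarrays, and a
plain ndarray never triggers this dispatch - there is no new type to hook
into, which is why they're left out.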
> Cheers,
> Ralf

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion