On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi <einstein.edi...@gmail.com> wrote:
> On 10.09.19 05:32, Stephan Hoyer wrote:
> > On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> >> I think we've chosen to try the former - dispatch on functions so we
> >> can reuse the NumPy API. It could work out well, it could give some
> >> long-term maintenance issues, time will tell. The question is now if
> >> and how to plug the gap that __array_function__ left. Its main
> >> limitation is "doesn't work for functions that don't have an
> >> array-like input" - that left out ~10-20% of functions. So now we
> >> have a proposal for a structural solution to that last 10-20%. It
> >> seems logical to want that gap plugged, rather than go back and say
> >> "we shouldn't have gone for the first 80%, so let's go no further".
> >
> > I'm excited about solving the remaining 10-20% of use cases for
> > flexible array dispatching, but the unumpy interface suggested here
> > (numpy.overridable) feels like a redundant redo of __array_function__
> > and __array_ufunc__.
> >
> > I would much rather continue to develop specialized protocols for the
> > remaining use cases. Summarizing those I've seen in this thread,
> > these include:
> > 1. Overrides for customizing array creation and coercion.
> > 2. Overrides to implement operations for new dtypes.
> > 3. Overriding implementations of NumPy functions, e.g., FFT and
> > ufuncs with MKL.
> >
> > (1) could mostly be solved by adding np.duckarray() and another
> > function for duck array coercion. There is still the matter of
> > overriding np.zeros and the like, which perhaps justifies another new
> > protocol, but in my experience the use cases for truly creating an
> > array from scratch are quite rare.
>
> While they're rare for libraries like XArray, CuPy, Dask and
> PyData/Sparse need these.

> > (2) should be tackled as part of overhauling NumPy's dtype system to
> > better support user-defined dtypes. But it should definitely be in
> > the form of specialized protocols, e.g., which pass preallocated
> > arrays into ufuncs for a new dtype.
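As a concrete illustration of the kind of function-level dispatch under
discussion, here is a toy sketch of the __array_function__ mechanism --
not NumPy's actual implementation; the names dispatchable,
duck_concatenate and MyDuckArray are all illustrative inventions:

```python
# Toy sketch of __array_function__-style dispatch: an API function checks
# its arguments for an override hook before falling back to a default
# implementation. Illustrative only, not NumPy's real machinery.

def dispatchable(default_impl):
    """Wrap a default implementation with hook-based dispatch."""
    def api_func(*arrays):
        for a in arrays:
            hook = getattr(type(a), "__array_function__", None)
            if hook is not None:
                # Hand control to the first argument defining the hook.
                return hook(a, api_func, arrays, {})
        return default_impl(*arrays)
    api_func.default_impl = default_impl
    return api_func

@dispatchable
def duck_concatenate(*arrays):
    # Default implementation; plain lists stand in for ndarrays here.
    out = []
    for a in arrays:
        out.extend(a)
    return out

class MyDuckArray:
    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, args, kwargs):
        # Unwrap arguments, run the default implementation, re-wrap.
        raw = [a.data if isinstance(a, MyDuckArray) else list(a)
               for a in args]
        return MyDuckArray(func.default_impl(*raw))

result = duck_concatenate(MyDuckArray([1, 2]), [3, 4])
print(result.data)  # [1, 2, 3, 4]
```

Note the limitation being discussed: this only works when some argument
is array-like, which is exactly why functions like np.zeros fall outside
the protocol.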
> > By design, new dtypes should not be able to customize the semantics
> > of array *structure*.
>
> We already have a split in the type system with e.g. Cython's buffers
> and Numba's parallel type system. This is a different issue altogether,
> e.g. allowing a unyt dtype to spawn a unyt array, rather than forcing a
> re-write of unyt to cooperate with NumPy's new dtype system.

I guess you're proposing that operations like np.sum(numpy_array,
dtype=other_dtype) could rely on other_dtype for the implementation and
potentially return a non-NumPy array? I'm not sure this is well
motivated -- it would be helpful to discuss actual use cases.

The most commonly used NumPy functionality related to dtypes can be
found only in methods on np.ndarray, e.g., astype() and view(). But I
don't think there's any proposal to change that.

> 4. Having default implementations that allow overrides of a large part
> of the API while defining only a small part. This holds for e.g.
> transpose/concatenate.

I'm not sure how unumpy solves the problems we encountered when trying
to do this with __array_function__ -- namely the way that it exposes all
of NumPy's internals, or requires rewriting a lot of internal NumPy code
to ensure it always casts inputs with asarray().

I think it would be useful to expose default implementations of NumPy
operations somewhere to make it easier to implement __array_function__,
but it doesn't make much sense to couple this to user-facing overrides.
These could be exposed as a separate package or numpy module (e.g.,
numpy.default_implementations) that uses np.duckarray(), which library
authors could make use of by calling it inside their __array_function__
methods.

> 5. Generation of random numbers (overriding RandomState). CuPy has its
> own implementation which would be nice to override.

I'm not sure that NumPy's random state objects make sense for duck
arrays.
Because these are stateful objects, they are tightly coupled to NumPy's
implementation -- you cannot store any additional state on RandomState
objects that might be needed for a new implementation. At a bare
minimum, you will lose the reproducibility of random seeds, though this
may be less of a concern with the new random API.

> > I also share Nathaniel's concern that the overrides in unumpy are
> > too powerful, by allowing for control from arbitrary function
> > arguments and even *non-local* control (i.e., global variables) from
> > context managers. This level of flexibility can make code very hard
> > to debug, especially in larger codebases.
>
> Backend switching needs global context, in any case. There isn't a
> good way around that other than the class dunder methods outlined in
> another thread, which would require rewrites of large amounts of code.

Do we really need to support robust backend switching in NumPy? I'm not
strongly opposed, but what use cases does it actually solve to be able
to override np.fft.fft rather than using a new function? At some point,
if you want maximum performance you won't be writing code using NumPy
proper anyway. At best you'll be using a system with duck-array support,
like CuPy.
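For reference, the context-manager-based backend switching being debated
can be sketched as follows. This is an illustrative toy, not the actual
unumpy/uarray API (set_backend, fft and MKLBackend here are invented
stand-ins); it also shows why the global state makes call sites hard to
read -- nothing at the fft() call reveals which backend will run:

```python
# Toy sketch of context-manager-based backend switching: a thread-local
# stack of backends that API functions consult before falling back to a
# default implementation.
import contextlib
import threading

_state = threading.local()

def _backends():
    # Lazily create the per-thread backend stack.
    if not hasattr(_state, "stack"):
        _state.stack = []
    return _state.stack

@contextlib.contextmanager
def set_backend(backend):
    # Push the backend for the duration of the with-block (non-local
    # control: code inside the block is affected without saying so).
    _backends().append(backend)
    try:
        yield
    finally:
        _backends().pop()

def fft(x):
    # Dispatch to the innermost active backend, else the default.
    for backend in reversed(_backends()):
        impl = getattr(backend, "fft", None)
        if impl is not None:
            return impl(x)
    return ("default_fft", x)

class MKLBackend:
    @staticmethod
    def fft(x):
        return ("mkl_fft", x)

with set_backend(MKLBackend):
    out = fft([1.0, 2.0])  # dispatches to MKLBackend.fft
```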
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion