On Mon, May 29, 2017 at 1:51 PM, Charles R Harris
<charlesr.har...@gmail.com> wrote:
>
> On Mon, May 29, 2017 at 12:32 PM, Marten van Kerkwijk
> <m.h.vankerkw...@gmail.com> wrote:
>>
>> Hi Chuck,
>>
>> Like Sebastian, I wonder a little about what level you are talking
>> about. Presumably, it is the actual implementation of the ufunc? I.e.,
>> this is not about the upper logic that decides which `__array_ufunc__`
>> to call, etc.
>>
>> If so, I agree with you that it would seem to make most sense to move
>> the implementation to `multiarray`; the current structure certainly is
>> a major hurdle to understanding how things work!
>>
>> Indeed, I guess in terms of my earlier suggestion to make much of a
>> ufunc happen in `ndarray.__array_ufunc__`, one could see the type
>> resolution and iteration happening there. If one were to expose the
>> inner loops, anyone working with buffers could then use the ufuncs by
>> defining their own __array_ufunc__.
>
> The idea of separating ufuncs from ndarray was put forward many years
> ago, maybe five or six. What I seek here is a record that we have given
> up on that ambition, so we do not need to take it into consideration in
> the future. In particular, we can feel free to couple ufuncs even more
> tightly with ndarray.
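For concreteness, here is roughly what Marten's "defining their own
__array_ufunc__" could look like for a buffer-backed object. This is
only a sketch: the BufferArray wrapper is hypothetical, and it ignores
details like out= arguments and type checking:

    import numpy as np

    class BufferArray:
        # Hypothetical wrapper: owns a memoryview and defers every ufunc
        # call to numpy by converting to ndarray (zero-copy) first.
        def __init__(self, buf):
            self.buf = memoryview(buf)

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # np.asarray on a memoryview wraps the memory without copying.
            arrays = [np.asarray(x.buf) if isinstance(x, BufferArray) else x
                      for x in inputs]
            result = getattr(ufunc, method)(*arrays, **kwargs)
            return BufferArray(result)

    a = BufferArray(np.arange(5.0))
    b = np.multiply(a, 2.0)     # numpy dispatches to a.__array_ufunc__
    print(np.asarray(b.buf))    # the doubled values: [ 0.  2.  4.  6.  8.]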
I think we do want to separate ufuncs from ndarray semantically: it
should be possible to use ufuncs on sparse arrays, dask arrays, etc.
But I don't think that altering ufuncs to work directly on
buffer/memoryview objects, or shipping them as a separate package from
the rest of numpy, is a useful step towards this goal.

Right now, handling buffers/memoryviews is easy: one can trivially
convert between them and ndarray without making any copies (the first
P.S. snippet below demonstrates this). I don't know of any interesting
problems that are blocked because ufuncs work on ndarrays instead of
buffer/memoryview objects. The interesting problems are where there's a
fundamentally different storage strategy involved, like sparse/dask/...
arrays. And similarly, I don't see what problems are solved by
splitting ufuncs out for building or distribution.

OTOH, trying to accomplish either of these things definitely has a
cost: churn, complexity, doubled release-management workload, etc. Even
the current split between the multiarray and umath modules causes
problems all the time. They're mostly boring problems, like little
utility functions that are needed in both places but awkward to share,
or the complicated machinery needed to let the two modules interact
properly (set_numeric_ops and all that; the second P.S. snippet shows
that wiring); none of this seems like stuff that's adding any value.

Plus, there's a major problem: buffers/memoryviews have no way to
represent all the dtypes we currently support (e.g. datetime64; see the
last snippet below) and no way to add new ones. The only way to fix
this would be to write a PEP, shepherd patches through python-dev, wait
for the next major Python release, and then drop support for all older
Python releases. None of this is going to happen soon; we should
probably plan on the assumption that it will never happen. So I don't
see how ufuncs-on-buffers could work at all.

So my vote is for merging the multiarray and umath code bases together,
and then taking advantage of the resulting flexibility to refactor the
internals to provide cleanly separated interfaces at the API level.

-n

--
Nathaniel J. Smith -- https://vorpus.org
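P.S. Quick demonstrations of a couple of the claims above, as sketches
rather than polished code. First, the zero-copy round trip between
buffers and ndarray:

    import numpy as np

    buf = bytearray(8)                        # anything exporting the buffer protocol
    arr = np.frombuffer(buf, dtype=np.uint8)  # wraps the same memory, no copy
    arr[0] = 255
    print(buf[0])                             # 255: the bytearray sees the write

    mv = memoryview(np.arange(3.0))           # ndarray -> memoryview, also no copy
    print(mv.format, mv.shape)                # d (3,)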
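P.P.S. set_numeric_ops is the hook through which umath's ufuncs get
installed as ndarray's arithmetic operators at import time. A toy swap
(on numpy as of this writing; be sure to restore the table afterwards!)
makes the coupling visible:

    import numpy as np

    def chatty_add(a, b):
        print("ndarray.__add__ went through the numeric-ops table")
        return np.add(a, b)

    old = np.set_numeric_ops(add=chatty_add)  # returns the previous table
    print(np.arange(3) + 1)                   # triggers chatty_add, then [1 2 3]
    np.set_numeric_ops(**old)                 # restore the original wiring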
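P.P.P.S. The dtype limitation is easy to see: PEP 3118 format strings
have no code for datetime64, so exporting such an array through the
buffer protocol fails outright:

    import numpy as np

    a = np.array(['2017-05-29'], dtype='datetime64[D]')
    try:
        memoryview(a)         # numpy has no PEP 3118 format code for 'M8'
    except ValueError as err:
        print(err)            # something like: cannot include dtype 'M' in a buffer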