On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith <n...@pobox.com> wrote:
>> The reason this is challenging is that there's a lot of code written
>> in Cython/C/C++ that calls np.asarray,
>
> Cython code only perhaps? It would surprise me if there's a lot of C/C++ code
> that explicitly calls into our Python rather than C API.

I think there's also code written as Python-wrappers-around-C-code where
the Python layer handles the error-checking/coercion, and the C code
trusts it to have done so.

>> Now if I understand right, your proposal would be to make it so any
>> code in any package could arbitrarily change the behavior of
>> np.asarray for all inputs, e.g. I could just decide that
>> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
>> object.
>
> No, definitely not! It's all opt-in, by explicitly importing from
> `numpy.overridable` or `unumpy`. No behavior of anything in the existing
> numpy namespaces should be affected in any way.

Ah, whoops, I definitely missed that :-). That does change things!

So one of the major decision points for any duck-array API work is
whether to modify the numpy semantics "in place", so user code
automatically gets access to the new semantics, or else to make a new
namespace that users have to switch over to manually.

The major disadvantage of doing changes "in place" is, of course, that
we have to do all this careful work to move incrementally and make
sure that we don't break things. The major (potential) advantage is
that we have a much better chance of moving the ecosystem with us.

The major advantage of making a new namespace is that it's *much*
easier to experiment, because there's no chance of breaking any
projects that didn't opt in. The major disadvantage is that numpy is
super strongly entrenched, and convincing every project to switch to
something else is incredibly difficult and costly. (I just searched
GitHub for "import numpy" and got 17.7 million hits. That's a lot of
imports to update!) Also, empirically, we've seen multiple projects
try to do this (e.g. DyND), and so far they have all failed.

It sounds like unumpy is an interesting approach that hasn't been
tried before – in particular, the promise that you can "just switch
your imports" is a much easier transition than e.g. DyND offered.
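
To make the "new namespace" option concrete, here is a toy sketch in pure Python. It is not unumpy's actual API or dispatch mechanism; every name in it (`base`, `overridable`, `set_backend`, `TupleBackend`) is hypothetical, and plain lists and tuples stand in for np.ndarray and a duck array. It only illustrates the opt-in property described above: a backend can change what the new namespace returns, while the existing namespace keeps its behavior.

```python
"""Toy model of an opt-in, overridable namespace (all names hypothetical)."""
from types import SimpleNamespace


def _default_asarray(obj):
    # Stand-in for the existing, unmodified function (think np.asarray):
    # it always returns a plain list, our stand-in for np.ndarray.
    return list(obj)


# The existing namespace; nothing below ever mutates it.
base = SimpleNamespace(asarray=_default_asarray)

_backend = None  # backend currently selected for the opt-in namespace, if any


def set_backend(backend):
    """Select the backend used by `overridable` (None restores the default)."""
    global _backend
    _backend = backend


def _overridable_asarray(obj):
    # Dispatch to the backend if one is set, otherwise fall back to base.
    if _backend is not None:
        return _backend.asarray(obj)
    return base.asarray(obj)


# The new namespace that callers must switch to explicitly.
overridable = SimpleNamespace(asarray=_overridable_asarray)


class TupleBackend:
    """A hypothetical backend whose 'arrays' are tuples, not lists."""

    @staticmethod
    def asarray(obj):
        return tuple(obj)


set_backend(TupleBackend)
legacy_result = base.asarray([1, 2, 3])           # code that did not opt in
opted_in_result = overridable.asarray([1, 2, 3])  # code that switched imports
print(type(legacy_result).__name__, type(opted_in_result).__name__)
# prints "list tuple"
```

Code that keeps calling `base.asarray` cannot be broken by the backend, which is the low-risk side of the trade-off. Code that switches to `overridable.asarray` now has to cope with getting back something that is not a list.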
Of course, that promise is somewhat undermined by the reality that all
these potential backend libraries *aren't* 100% compatible with numpy,
and can't be... it might turn out that this ends up like asanyarray,
where you can't really use it reliably because the thing that comes
out will generally support *most* of the normal ndarray semantics, but
you don't know which parts.

Is scipy planning to switch to using this everywhere, including in C
code? If not, then how do you expect projects like matplotlib to
switch, given that matplotlib likes to pass array objects into scipy
functions? Are you planning to take the opportunity to clean up some
of the obscure corners of the numpy API?

But those are general questions about unumpy, and I'm guessing no-one
knows all the answers yet... and these questions actually aren't super
relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main
thing the NEP proposes is simply to make "numpy.overridable" an
alias for "unumpy".

It's not clear to me what problem this alias is solving. If all
downstream users have to update their imports anyway, then they can
write "import unumpy as np" just as easily as they can write "import
numpy.overridable as np". I guess the main reason this is a NEP is
because the unumpy project is hoping to get an "official stamp of
approval" from numpy? But even that could be accomplished by just
putting something in the docs. And adding the alias has substantial
risks: it makes unumpy tied to the numpy release cycle and
compatibility rules, and it means that we're committing to maintaining
unumpy ~forever even if Hameer or Quansight move on to other things.
That seems like a lot to take on for such vague benefits?

On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edi...@gmail.com> wrote:
> The fact that we're having to design more and more protocols for a lot
> of very similar things is, to me, an indicator that we do have holistic
> problems that ought to be solved by a single protocol.

But the reason we've had trouble designing these protocols is that
they're each different :-). If it was just a matter of copying
__array_ufunc__ we'd have been done in a few minutes...

-n

--
Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion