On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote: > > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gomm...@gmail.com> > > wrote: > > > > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <sho...@gmail.com> > > > wrote: > > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg < > > > > sebast...@sipsolutions.net> wrote: > <snip> > > > > > > > > I don't think NumPy needs to do anything about warnings. It is > > > > straightforward for libraries that want to use use > > > > get_array_module() to issue their own warnings before calling > > > > get_array_module(), if desired. > > > > > > > > Or alternatively, if a library is about to add a new > > > > __array_module__ method, it is straightforward to issue a warning > > > > inside the new __array_module__ method before returning the NumPy > > > > functions. > > > > > > > > > > I don't think this is quite enough. Sebastian points out a fairly > > > important issue. One of the main rationales for the whole NEP, and > > > the argument in multiple places ( > > > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > > ) is that it's now opt-in while __array_function__ was opt-out. > > > This isn't really true - the problem is simply *moved*, from the > > > duck array libraries to the array-consuming libraries. The end user > > > will still see the backwards incompatible change, with no way to > > > turn it off. It will be easier with __array_module__ to warn users, > > > but this should be expanded on in the NEP. > > > > > > > Ralf, thanks for sharing your thoughts. > Sorry, this never made it back to the top of my todo list. > > > I'm not quite I understand the concerns about backwards > > incompatibility: > > 1. The intention is that implementing a __array_module__ method > > should be backwards compatible with all current uses of NumPy. This > > satisfies backwards compatibility concerns for an array-implementing > > library like JAX. > > 2. In contrast, calling get_array_module() offers no guarantees about > > backwards compatibility. This seems nearly impossible, because the > > entire point of the protocol is to make it possible to opt-in to new > > behavior. Indeed, it is nearly impossible. Except if there's a context manager or some other control mechanism exposed to the end user. Hence that should be a part of the design I think. Otherwise you're just solving something for the JAX devs, but not for the scikit-learn/scipy/etc devs who will then each have to invent their own wheel for backwards compat. So backwards compatibility isn't solved for Scikit-Learn > > switching to use get_array_module(), and after Scikit-Learn does so, > > adding __array_module__ to new types of arrays could potentially have > > backwards incompatible consequences for Scikit-Learn (unless sklearn > > uses default=None). > > > > Are you suggesting just adding something like what I'm writing here > > into the NEP? Perhaps along with advice to consider issuing warnings > > inside __array_module__ and falling back to legacy behavior when > > first implementing it on a new type? > > I think that should be sufficient, personally. We could mention that > scikit-learn will likely use a context manager to do this. > We can also think about providing a global default (which sklearn can > use as its own default if they wish so, but that is reserved to the > end-user). > +1 That would be a small amendment, and I think we could add it even after > accepting the NEP as it is. > > > > > We could also potentially make a few changes to make backwards > > compatibility even easier, by making the protocol less aggressive > > about assuming that NumPy is a safe fallback. Some non-exclusive > > options: > > a. We could switch the default value of "default" on > > get_array_module() to None, so an exception is raised if nothing > > implements __array_module__. > > I am not sure that I feel switching the default to None makes much of a > difference to be honest. Unless we use it to signal a super strict mode > similar to b. below. > I agree, that doesn't make a difference. > > b. We could includes *all* argument types in "types", not just types > > that implement __array_module__. NumPy's ndarray.__array_module__ > > could then recognize and refuse to return an implementation if there > > are other arguments that might implement __array_module__ in the > > future (e.g., anything outside the standard library?). > > That is a good point, anything that is not NumPy recognized could > simply be rejected. It does mean that you have to call > `module.asarray()` manually more often though. > For `list`, it could also make sense to just add np.ndarray to types. > > If we want to be conservative, maybe we could also just error before > calling `__array_module__`. Whenever there is something that we do not > know how to interpret force the user to clarify? > > > > > The downside of making either of these choices is that it would > > potentially make get_array_function() a bit less usable, because it > > is more likely to fail, e.g., if called on a float, or some custom > > type that should be treated as a scalar. > > Right, although we could relax it later if it seems overly annoying. > Interesting point. Not accepting sequences could be considered here. It may help a lot with robustness and typing to only accept ndarray, other objects with __array__, and scalars. > > > > > Also, I'm still not sure I agree with the tone of the discussion on > > > this topic. It's very heavily inspired by what the JAX devs are > > > telling you (the NEP still says PyTorch and scipy.sparse as well, > > > but that's not true in both cases). If you ask Dask and CuPy for > > > example, they're quite happy with __array_function__ and there > > > haven't been many complaints about backwards compat breakage. > > > > > > > I'm linking to comments you wrote in reference to PyTorch and > > scipy.sparse in the current draft of the NEP, so I certainly want to > > make sure that you agree my characterization :). > > > > Would it be fair to say: > > - JAX is reluctant to implement __array_function__ because of > > concerns about breaking existing code. JAX developers think that when > > users use NumPy functions on JAX arrays, they are explicitly choosing > > to convert from JAX to NumPy. This model is fundamentally > > incompatible __array_function__, which we chose to override the > > existing numpy namespace. > agreed > - PyTorch and scipy.sparse are not yet in position to implement > > __array_function__ (due to a lack of a direct implementation of > > NumPy's API), but these projects take backwards compatibility > > seriously. > True. I would say though that scipy.sparse will never implement either __array_function__ or array_module__ due to semantic imcompatibilities (it acts like np.matrix). So it's kind of irrelevant. And if PyTorch gets around to adding a numpy-compatible API, they're fine with __array_function__. > > > Does "take backwards compatibility seriously" sound about right to > > you? I'm very open to specific suggestions here. (TensorFlow could > > probably also be safely added to this second list.) > > This will need input from Ralf, my personal main concern is backward > compatibility in libraries: I am pretty sure sklearn would only use a > potential `np.asduckarray` when the user opted in. But in that case my > personal feeling is that the `get_array_module` solution is cleaner and > makes it easier to expand functionality slowly (for libraries). > > Two other points: > > First, I am wondering if we should add something like a `__qualname__` > to the contract. I.e. a returned module must have a well defined > `module.__name__` (that is usually already correct), so that sklearn > could do: > > module = np.get_array_module(*arrays) > if module.__name__ not in ("numpy", "sparse"): > raise TypeError("Currently only numpy and sparse are supported") > > if they wish so (that is trivial, but if you return a class acting as a > module it may be important). > > Second, we have to make progress on whether or not the "restricted" > namespace idea should have priority. My personal opinion is tending > strongly towards no. > I think it's quite important, and __array_module__ gives a chance to introduce it. However, it's not ready - so I'd say that if __array_module__ implementation is ready and there's no well-defined restricted API proposal (I expect to have that in August), then we can move ahead without it. The NumPy version should normally be older than other libraries, and if > NumPy updates the API so do the downstream implementers. > E.g. dask may have to provide multiple version of the same function > depending on the installed NumPy version, but that seems OK to me? That seems unworkable, and I don't think any libraries do this. Coupling the semantics of a single Dask function to the installed numpy version is odd. It is just as downstream libraries currently have to support multiple > NumPy versions. > We could add a contract that the first time `get_array_module` is used > to e.g. get the dask namespace and the NumPy version is too new, a > warning should be given. > I think we can't solve this until we have a well-defined API, which is the restricted API + API versioning. Until then it just remains with the current status, compatibility is implementation-defined. Cheers, Ralf > The practical thing seems to me that we ignore this for the moment (as > something we can do later on)? If there is missing API, in most cases > an AttributeError will be raised which could provide some additional > information to the user? > The only alternative seems the complete opposite?: Create a new module, > and make even NumPy only one of the implementers of that new > (restricted) module. That may be cleaner, but I fear that it is > impractical to be honest. > > > I will put this on the agenda for tomorrow, even if we discuss it only > very briefly. My feeling (and hope) is that we are nearing a point > where we can make a final decision. > > Best, > > Sebastian > > > > > > Best, > > Stephan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion