On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote: > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gomm...@gmail.com> > wrote: > > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <sho...@gmail.com> > > wrote: > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg < > > > sebast...@sipsolutions.net> wrote: <snip> > > > > > > I don't think NumPy needs to do anything about warnings. It is > > > straightforward for libraries that want to use use > > > get_array_module() to issue their own warnings before calling > > > get_array_module(), if desired. > > > > > > Or alternatively, if a library is about to add a new > > > __array_module__ method, it is straightforward to issue a warning > > > inside the new __array_module__ method before returning the NumPy > > > functions. > > > > > > > I don't think this is quite enough. Sebastian points out a fairly > > important issue. One of the main rationales for the whole NEP, and > > the argument in multiple places ( > > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users > > ) is that it's now opt-in while __array_function__ was opt-out. > > This isn't really true - the problem is simply *moved*, from the > > duck array libraries to the array-consuming libraries. The end user > > will still see the backwards incompatible change, with no way to > > turn it off. It will be easier with __array_module__ to warn users, > > but this should be expanded on in the NEP. > > > > Ralf, thanks for sharing your thoughts. > > I'm not quite I understand the concerns about backwards > incompatibility: > 1. The intention is that implementing a __array_module__ method > should be backwards compatible with all current uses of NumPy. This > satisfies backwards compatibility concerns for an array-implementing > library like JAX. > 2. In contrast, calling get_array_module() offers no guarantees about > backwards compatibility. This seems nearly impossible, because the > entire point of the protocol is to make it possible to opt-in to new > behavior. So backwards compatibility isn't solved for Scikit-Learn > switching to use get_array_module(), and after Scikit-Learn does so, > adding __array_module__ to new types of arrays could potentially have > backwards incompatible consequences for Scikit-Learn (unless sklearn > uses default=None). > > Are you suggesting just adding something like what I'm writing here > into the NEP? Perhaps along with advice to consider issuing warnings > inside __array_module__ and falling back to legacy behavior when > first implementing it on a new type?
I think that should be sufficient, personally. We could mention that scikit-learn will likely use a context manager to do this. We can also think about providing a global default (which sklearn can use as its own default if they wish so, but that is reserved to the end-user). That would be a small amendment, and I think we could add it even after accepting the NEP as it is. > > We could also potentially make a few changes to make backwards > compatibility even easier, by making the protocol less aggressive > about assuming that NumPy is a safe fallback. Some non-exclusive > options: > a. We could switch the default value of "default" on > get_array_module() to None, so an exception is raised if nothing > implements __array_module__. I am not sure that I feel switching the default to None makes much of a difference to be honest. Unless we use it to signal a super strict mode similar to b. below. > b. We could includes *all* argument types in "types", not just types > that implement __array_module__. NumPy's ndarray.__array_module__ > could then recognize and refuse to return an implementation if there > are other arguments that might implement __array_module__ in the > future (e.g., anything outside the standard library?). That is a good point, anything that is not NumPy recognized could simply be rejected. It does mean that you have to call `module.asarray()` manually more often though. For `list`, it could also make sense to just add np.ndarray to types. If we want to be conservative, maybe we could also just error before calling `__array_module__`. Whenever there is something that we do not know how to interpret force the user to clarify? > > The downside of making either of these choices is that it would > potentially make get_array_function() a bit less usable, because it > is more likely to fail, e.g., if called on a float, or some custom > type that should be treated as a scalar. Right, although we could relax it later if it seems overly annoying. > > > Also, I'm still not sure I agree with the tone of the discussion on > > this topic. It's very heavily inspired by what the JAX devs are > > telling you (the NEP still says PyTorch and scipy.sparse as well, > > but that's not true in both cases). If you ask Dask and CuPy for > > example, they're quite happy with __array_function__ and there > > haven't been many complaints about backwards compat breakage. > > > > I'm linking to comments you wrote in reference to PyTorch and > scipy.sparse in the current draft of the NEP, so I certainly want to > make sure that you agree my characterization :). > > Would it be fair to say: > - JAX is reluctant to implement __array_function__ because of > concerns about breaking existing code. JAX developers think that when > users use NumPy functions on JAX arrays, they are explicitly choosing > to convert from JAX to NumPy. This model is fundamentally > incompatible __array_function__, which we chose to override the > existing numpy namespace. > - PyTorch and scipy.sparse are not yet in position to implement > __array_function__ (due to a lack of a direct implementation of > NumPy's API), but these projects take backwards compatibility > seriously. > > Does "take backwards compatibility seriously" sound about right to > you? I'm very open to specific suggestions here. (TensorFlow could > probably also be safely added to this second list.) This will need input from Ralf, my personal main concern is backward compatibility in libraries: I am pretty sure sklearn would only use a potential `np.asduckarray` when the user opted in. But in that case my personal feeling is that the `get_array_module` solution is cleaner and makes it easier to expand functionality slowly (for libraries). Two other points: First, I am wondering if we should add something like a `__qualname__` to the contract. I.e. a returned module must have a well defined `module.__name__` (that is usually already correct), so that sklearn could do: module = np.get_array_module(*arrays) if module.__name__ not in ("numpy", "sparse"): raise TypeError("Currently only numpy and sparse are supported") if they wish so (that is trivial, but if you return a class acting as a module it may be important). Second, we have to make progress on whether or not the "restricted" namespace idea should have priority. My personal opinion is tending strongly towards no. The NumPy version should normally be older than other libraries, and if NumPy updates the API so do the downstream implementers. E.g. dask may have to provide multiple version of the same function depending on the installed NumPy version, but that seems OK to me? It is just as downstream libraries currently have to support multiple NumPy versions. We could add a contract that the first time `get_array_module` is used to e.g. get the dask namespace and the NumPy version is too new, a warning should be given. The practical thing seems to me that we ignore this for the moment (as something we can do later on)? If there is missing API, in most cases an AttributeError will be raised which could provide some additional information to the user? The only alternative seems the complete opposite?: Create a new module, and make even NumPy only one of the implementers of that new (restricted) module. That may be cleaner, but I fear that it is impractical to be honest. I will put this on the agenda for tomorrow, even if we discuss it only very briefly. My feeling (and hope) is that we are nearing a point where we can make a final decision. Best, Sebastian > > Best, > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion
signature.asc
Description: This is a digitally signed message part
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion