On Mon, Apr 27, 2020 at 12:10 AM Sebastian Berg <sebast...@sipsolutions.net> wrote:
> On Sat, 2020-04-25 at 10:52 -0700, Stephan Hoyer wrote: > > On Sat, Apr 25, 2020 at 10:40 AM Ralf Gommers <ralf.gomm...@gmail.com > > > > > wrote: > > > > > > > > On Fri, Apr 24, 2020 at 12:35 PM Eric Wieser < > > > wieser.eric+nu...@gmail.com> > > > wrote: > > > > > > > Perhaps worth mentioning that we've discussed this sort of API > > > > before, in > > > > https://github.com/numpy/numpy/pull/11897. > > > > > > > > Under that proposal, the api would be something like: > > > > > > > > * `copy=True` - always copy, like it is today > > > > * `copy=False` - copy if needed, like it is today > > > > * `copy=np.never_copy` - never copy, throw an exception if not > > > > possible > > > > > > > > > > There's a couple of issues I see with using `copy` for __array__: > > > - copy is already weird (False doesn't mean no), and a [bool, > > > some_obj_or_str] keyword isn't making that better > > > - the behavior we're talking about can do more than copying, e.g. > > > for > > > PyTorch it would modify the autograd graph by adding detach(), and > > > for > > > sparse it's not just "make a copy" (which implies doubling memory > > > use) but > > > it densifies which can massively blow up the memory. > > > - I'm -1 on adding things to the main namespace (never_copy) for > > > something > > > that can be handled differently (like a string, or a new keyword) > > > > > > tl;dr a new `force` keyword would be better > > > > > > > I agree, “copy” is not a good description of this desired coercion > > behavior. > > > > A new keyword argument like “force” would be much clearer. > > > > That seems fine and practical. But, in the end it seems to me that the > `force=` keyword, just means that some projects want to teach their > users that: > > 1. `np.asarray()` can be expensive (and may always copy) > 2. `np.asarray()` always loses type properties > > while others do not choose to teach about it. There seems very little > or even no "promise" attached to either `force=True` or `force=False`. > > > In the end, the question is whether sparse will actually want to > implement `force=True` if the main reason we add is for library use. > That's for PyData Sparse and scipy.sparse devs to answer. Maybe Hameer can answer for the former here. For SciPy that should be decided on the scipy-dev list, but my opinion would be: yes to adding __array__ that raises TypeError by default, and converts with `force=True`. > There is no difference between a visualization library or numpy. In > both cases the users memory will blow up just the same. > > As for PyTorch, is `.detach()` even a good reason? Maybe I am missing > things, but: > > >>> torch.ones(10, requires_grad=True) + np.arange(10) > # RuntimeError: Can't call numpy() on Variable that requires grad. Use > var.detach().numpy() instead. > > So arguably, there is no type-safety concern due to `.detach()`. I'm not sure what the question is here; no one mentioned type-safety. The PyTorch maintainers have already said they're fine with adding a force keyword. There > is an (obvious) general loss of type information that always occurs > with an `np.asarray` call. > But I do not see that creating any openings for bugs here, due to the > wisdom of not allowing the above operation. > In fact, it actually seems much worse for for xarray, or pandas. They > do support the above operation and will potentially mess up if the > arange was previously an xarray with a matching index, but different > order. > > > I am very much in favor of adding such things, but I still lack a bit > of clarity as to whom we would be helping? > See Juan's first email. I personally am ambivalent on this proposal, but if Juan and the Napari devs really want it, that's good enough for me. Cheers, Ralf > If end-users will actually use `np.asarray(..., force=True)` over > special methods, then great! But I am currently not sure the type- > safety argument is all that big of a point. And the performance or > memory-blowup argument remains true even for visualization libraries > (where the array is purely input and never output as such). > > > But yes, "never copy" is a somewhat different extension to `__array__` > and `np.asarray`. It guarantees high speed and in-place behaviour which > is useful for different settings. > > - Sebastian > > > > > > > Cheers, > > > Ralf > > > > > > > > > > I think the discussion stalled on the precise spelling of the > > > > third > > > > option. > > > > > > > > `__array__` was not discussed there, but it seems like adding the > > > > `copy` > > > > argument to `__array__` would be a perfectly reasonable > > > > extension. > > > > > > > > Eric > > > > > > > > On Fri, 24 Apr 2020 at 03:00, Juan Nunez-Iglesias < > > > > j...@fastmail.com> > > > > wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > One bit of expressivity we would miss is “copy if necessary, > > > > > but > > > > > > otherwise don’t bother”, but there are workarounds to this. > > > > > > > > > > > > > > > > After a side discussion with Stéfan van der Walt, we came up > > > > > with > > > > > `allow_copy=True`, which would express to the downstream > > > > > library that we > > > > > don’t mind waiting, but that zero-copy would also be ok. > > > > > > > > > > This sounds like the sort of thing that is use case driven. If > > > > > enough > > > > > projects want to use it, then I have no objections to adding > > > > > the keyword. > > > > > OTOH, we need to be careful about adding too many > > > > > interoperability tricks > > > > > as they complicate the code and makes it hard for folks to > > > > > determine the > > > > > best solution. Interoperability is a hot topic and we need to > > > > > be careful > > > > > not put too leave behind too many experiments in the NumPy > > > > > code. Do you > > > > > have any other ideas of how to achieve the same effect? > > > > > > > > > > > > > > > Personally, I don’t have any other ideas, but would be happy to > > > > > hear > > > > > some! > > > > > > > > > > My view regarding API/experiment creep is that `__array__` is > > > > > the oldest > > > > > and most basic of all the interop tricks and that this can be > > > > > safely > > > > > maintained for future generations. Currently it only takes > > > > > `dtype=` as a > > > > > keyword argument, so it is a very lean API. I think this > > > > > particular use > > > > > case is very natural and I’ve encountered the reluctance to > > > > > implicitly copy > > > > > twice, so I expect it is reasonably common. > > > > > > > > > > Regarding difficulty in determining the best solution, I would > > > > > be happy > > > > > to contribute to the dispatch basics guide together with the > > > > > new kwarg. I > > > > > agree that the protocols are getting quite numerous and I > > > > > couldn’t find a > > > > > single place that gathers all the best practices together. But, > > > > > to > > > > > reiterate my point: `__array__` is the simplest of these and I > > > > > think this > > > > > keyword is pretty safe to add. > > > > > > > > > > For ease of discussion, here are the API options discussed so > > > > > far, as > > > > > well as a few extra that I don’t like but might trigger other > > > > > ideas: > > > > > > > > > > np.asarray(my_duck_array, allow_copy=True) # default is False, > > > > > or None > > > > > -> leave it to the duck array to decide > > > > > np.asarray(my_duck_array, copy=True) # always copies, but, if > > > > > supported > > > > > by the duck array, defers to it for the copy > > > > > np.asarray(my_duck_array, copy=‘allow’) # could take values > > > > > ‘allow’, > > > > > ‘force’, ’no’, True(=‘force’), False(=’no’) > > > > > np.asarray(my_duck_array, force_copy=False, allow_copy=True) # > > > > > separate > > > > > concepts, but unclear what force_copy=True, allow_copy=False > > > > > means! > > > > > np.asarray(my_duck_array, force=True) > > > > > > > > > > Juan. > > > > > _______________________________________________ > > > > > NumPy-Discussion mailing list > > > > > NumPy-Discussion@python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion@python.org > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion@python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@python.org > https://mail.python.org/mailman/listinfo/numpy-discussion >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion