On Thu, 2020-10-29 at 23:58 -0600, Aaron Meurer wrote:
> On Thu, Oct 29, 2020 at 6:09 PM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> > On Tue, 2020-10-27 at 17:15 -0600, Aaron Meurer wrote:
> > > For ndindex (https://quansight.github.io/ndindex/), the biggest
> > > issue with the API is that to use an ndindex object to actually
> > > index an array, you have to use a[idx.raw] instead of a[idx].
> > > This is because for NumPy arrays, you cannot allow custom objects
> > > to be indices. The exception is objects that define __index__,
> > > but this only works for integer indices. If __index__ returns
> > > anything other than an integer, you get an IndexError. This is
> > > annoying because it's easy to forget to do this when working with
> > > the ndindex API, and the error message from NumPy isn't
> > > informative about what went wrong unless you know to expect it.
> > >
> > > I'd like to propose an API that would allow custom objects to
> > > define how they should be converted to a standard NumPy index,
> > > similar to __index__ but that supports all index types. I think
> > > there are two options here:
> > >
> > > - Allow __index__ to return any index type, not just integers.
> > >   This is the simplest because it reuses an existing API, and
> > >   __index__ is the best possible name for this API. However, I'm
> > >   not sure, but this may actually conflict with the text of PEP
> > >   357 (https://www.python.org/dev/peps/pep-0357/). Also, some
> > >   other APIs use __index__ to check if something is an indexable
> > >   integer, which wouldn't accept generic index. For example,
> > >   elements of a slice can be any object that defines __index__.
> >
> > Index converts to an integer (safely).
> > There is an assumption that the integer is good for indexing, but I
> > think the name shouldn't be taken to mean it is specific to indexing
> > (even if that was the main motivation).
> >
> > > - Add a new __numpy_index__ API that works like
> > >
> > >       def __numpy_index__(self):
> > >           return <tuple, integer, slice, newaxis, ellipsis,
> > >                   or integer or boolean array>
> > >
> > > In NumPy, __getitem__ and __setitem__ on ndarray would first
> > > check if the input index type is one of the known types as it
> > > currently does, then it would try __index__, and if neither of
> > > those works, it would call __numpy_index__(index) and use that.
> >
> > Do you anticipate just:
> >
> >     arr[index]
> >
> > or also:
> >
> >     arr[index1, index2]
>
> I think both should work. If the second one doesn't work it would be
> surprising.
>
> > Would you expect pandas or array-like objects to support this as
> > well?
>
> Yes, it would probably be best for array-like to also work with the
> same API.
>
> I don't know much about Pandas. It seems like it already allows a lot
> of indexing stuff. Do Series/Dataframe already have such an API?

I do not think so, but indexing in pandas often works differently. So I
was curious whether y

> > > If we only do `arr[index]` might subclassing tuple be sufficient?
>
> I guess that technically works, except now your objects have to act
> like a tuple, even if they represent something like a slice (Python
> does not allow subclassing slice). For ndindex I've tried to make a
> distinction between objects as representing indices and the built-in
> objects that happen to be used to represent those indices by default.
> So an ndindex.Tuple explicitly doesn't work like a tuple, an
> ndindex.Integer doesn't work like an int, and so on. That way there
> is a clear distinction between ndindex operations and operations on
> the built-in types.
>
> > Do you have any thought on how this might play out with a potential
> > `arr.oindex[...]`?
>
> I think oindex[idx] would call the same API on idx. I'm not sure if
> it matters that it's oindex, since that's at a higher level.

It is at a higher level, but it seemed to me that `ndindex` largely
plays at that level. For example, you have a method to implement index
chaining:

    arr[idx1][idx2] == arr[idx1.as_subindex(idx2)]

(or similar). But this will not work:

    arr.oindex[idx1].oindex[idx2] != arr.idx[idx1.as_subindex(idx2)]

Also the "result" shape, or even questions like `.isempty()`, will give
different answers when used as an `.oindex[...]`.

This is why I thought that `arr[idx1, idx2]` is possibly a very
different case from `arr[idx]`, at least for current NumPy indexing
logic (it would be better with `arr.oindex[]`).
The difference doesn't matter in your proposal, but I had the
impression that the `arr[idx1, idx2]` form might be rare/unused and
that form would not be able to carry information such as whether this
is supposed to be an "oindex".

Maybe it helps to look back at `.oindex` to explain this.
A possible solution to subclass handling if we add `arr.oindex` is to
make it so that:

    myarr.oindex[indx]

could call:

    myarr.__getitem__(indx_object)

where `indx_object` knows that this was an oindex. The main reason is
the expectation that many subclasses may implement `__getitem__`, but
probably just do:

    def __getitem__(self, indx):
        new_data = self.data[indx]
        # Do something with new_data.

Now for `ndindex` it would seem to make a lot of sense to have an
OIndex object, etc. for the same reason. Of course how we implement
`.oindex` can be pretty separate from this.

> > Adding either to NumPy is probably fairly straightforward, although
> > I would prefer either not to slow down every single indexing
> > operation for an extremely niche use-case (which is likely
> > possible) or timings showing that it is insignificant.
>
> I'm not sure it would. The current cases would all be tried first.
> The only time the new protocol would be used is when the index type
> isn't one of the currently allowed types, which currently raises
> IndexError.
>
> > What might help me is understanding `ndindex` itself better. Since
> > it seems like asking to add a protocol that may very well be used
> > by only this one project?
>
> That's fair. Maybe the more general API would make more sense then? I
> think it would need more thinking out, but it would allow a lot more
> use-cases.

A general API might make sense, but I am edgy about reversing the roles
of who performs the indexing. For one thing, that probably would break
subclassing and overriding of `__getitem__`?

Cheers,

Sebastian

> Aaron Meurer
>
> > > Note: there is a more general way that NumPy arrays could allow
> > > __getitem__ to be defined on custom objects, which I am NOT
> > > proposing.
> > > Instead of an API that returns one of the current predefined
> > > index types (tuple, integer, slice, newaxis, ellipsis, or integer
> > > or boolean array), there could instead be an API that takes the
> > > array as input and returns another array (or view) as an output.
> > > This would allow an object to define itself as an index in
> > > arbitrary ways, even if such an index would not actually be
> > > possible via traditional indexing. There are definitely some
> > > interesting ideas that could be done with this, but this idea
> > > would be much more complicated, and isn't something that I need.
> > > Unless the community feels that a more general API like this
> > > would be preferred, I would suggest deferring something like it
> > > to a later discussion.
> > >
> > > What would be the best way to go about getting something like
> > > this implemented? Is it simple enough that we can just work out
> > > the details here and on a pull request, or should I write a NEP?
> >
> > A short NEP may make sense, at least if this is supposed to be a
> > generic protocol for general array-likes, which I guess it would
> > have to be ready for.
> >
> > Cheers,
> >
> > Sebastian
> >
> > > Aaron Meurer
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
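
For reference, here is a rough sketch of the lookup order discussed in
this thread: known index types first, then `__index__`, and only then
the proposed hook. None of this is NumPy API. `__numpy_index__` is just
the name suggested above, and `resolve_index` and `SliceIndex` are
hypothetical names made up for the illustration. It is stdlib-only, so
integer and boolean arrays are left out and a plain list stands in for
the array.

```python
import operator

# Index types passed through untouched. NumPy would also accept integer
# and boolean arrays here; they are omitted to keep this stdlib-only.
_KNOWN = (int, slice, type(Ellipsis), type(None))


def resolve_index(idx):
    """Convert idx to a built-in index using the order from the thread."""
    # 1. Known index types are used as they are. Tuples are resolved
    #    element by element, so the arr[idx1, idx2] form works as well.
    if isinstance(idx, _KNOWN):
        return idx
    if isinstance(idx, tuple):
        return tuple(resolve_index(i) for i in idx)
    # 2. Objects defining __index__ become plain integers.
    try:
        return operator.index(idx)
    except TypeError:
        pass
    # 3. Only now is the hypothetical __numpy_index__ hook consulted.
    hook = getattr(type(idx), "__numpy_index__", None)
    if hook is None:
        raise IndexError(f"unsupported index type: {type(idx).__name__}")
    return resolve_index(hook(idx))


class SliceIndex:
    """Hypothetical ndindex-style object that represents a slice."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __numpy_index__(self):
        return slice(self.start, self.stop)


data = list(range(10))
idx = SliceIndex(2, 5)
print(data[resolve_index(idx)])  # [2, 3, 4]
print(resolve_index((idx, 3)))   # (slice(2, 5, None), 3)
```

Because the hook is only reached after the existing checks fail, the
currently supported index types would take the same path as today, which
is the point made above about not slowing down ordinary indexing.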
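
The `.oindex` subclass idea from this thread can be sketched in the
same spirit: `myarr.oindex[indx]` wraps the index in an object that
remembers it came from `.oindex` and then goes through the ordinary
`__getitem__`. Everything here is an assumption for illustration only:
`OIndex`, `_OIndexer`, `Grid` and `Wrapper` are hypothetical names, and
`Grid` is a tiny list-backed stand-in for a 2-D array, not NumPy.

```python
class OIndex:
    """Hypothetical marker: an index that records it came from .oindex."""

    def __init__(self, index):
        self.index = index


class _OIndexer:
    """Helper returned by .oindex; forwards to the owner's __getitem__."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, index):
        # Route through the normal __getitem__ so overrides still run.
        return self._owner[OIndex(index)]


class Grid:
    """Tiny stand-in for a 2-D array, backed by a list of rows."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    @property
    def oindex(self):
        return _OIndexer(self)

    def __getitem__(self, index):
        if isinstance(index, OIndex):
            # Outer indexing: each selected row with each selected column.
            row_ids, col_ids = index.index
            return Grid([[self.rows[r][c] for c in col_ids] for r in row_ids])
        r, c = index  # plain (row, column) integer indexing
        return self.rows[r][c]


class Wrapper:
    """Container that only forwards its index, as many subclasses do."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    @property
    def oindex(self):
        return _OIndexer(self)

    def __getitem__(self, indx):
        self.calls += 1         # "do something" around the indexing step
        return self.data[indx]  # indx may be an OIndex; passed on as is


w = Wrapper(Grid([[0, 1, 2], [10, 11, 12], [20, 21, 22]]))
print(w[1, 2])                        # 12
print(w.oindex[[0, 2], [0, 2]].rows)  # [[0, 2], [20, 22]]
print(w.calls)                        # 2
```

The wrapper's `__getitem__` never needs to know about `.oindex`: the
index object carries that information down to whatever finally performs
the indexing, which is the property the `arr[idx1, idx2]` form lacks.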