On Thu, 2020-08-20 at 16:00 -0600, Aaron Meurer wrote:
> Just to be clear, what exactly do you think should be deprecated?
> Boolean scalar indices in general, or just boolean scalars combined
> with other arrays, or something else?

My angle is that we should allow only:

* Any number of integer array indices (ideally only explicitly with
  `arr.vindex[]`, but we do not have that luxury right now.)

* A single boolean index (array or scalar is identical)

but no mix of the above (including multiple boolean indices), because I
think such mixes are at least one level more confusing than multiple
advanced indices.

I admit, I forgot that the broadcasting logic is fine in this case:

    arr = np.zeros((2, 3))
    arr[[True], np.array(3)]

where the advanced index is also a scalar index. In that case the result
is straightforward, since broadcasting does not affect `np.array(3)`.

I am happy to be wrong about that assessment, but I think your opinion
on it could likely push us towards just doing a Deprecation.

The only use case for "multiple boolean indices" that I could think of
was this:

    arr = np.diag([1, 2, 3, 4])  # 2-d square array
    indx = arr.diagonal() > 2  # mask for each row and column
    masked_diagonal = arr[indx, indx]
    print(repr(masked_diagonal))  # array([3, 4])

and my guess is that the reaction to that code is a: "Wait what?!"

That code might seem reasonable, but it only works if you have the exact
same number of `True` values in the two indices.
And if you have the exact same number but two different arrays, then I
fail to reason about the result without doing the `nonzero` step, which
I think indicates that there just is no logical concept for it.

So, I think we may be better off forcing the few power users who may
have found a use for this type of nugget to use `np.nonzero()` or find
another solution.

- Sebastian

>
> Aaron Meurer
>
> On Thu, Aug 20, 2020 at 3:56 PM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> > On Thu, 2020-08-20 at 16:50 -0500, Sebastian Berg wrote:
> > > On Thu, 2020-08-20 at 12:21 -0600, Aaron Meurer wrote:
> > > > You're right. I was confusing the broadcasting logic for boolean
> > > > arrays.
> > > >
> > > > However, I did find this example
> > > >
> > > > >>> np.arange(10).reshape((2, 5))[np.array([[0, 0, 0, 0, 0]], dtype=np.int64), False]
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in <module>
> > > > IndexError: shape mismatch: indexing arrays could not be broadcast
> > > > together with shapes (1,5) (0,)
> > > >
> > > > That certainly seems to imply there is some broadcasting being
> > > > done.
> > >
> > > Yes, it broadcasts the array after converting it with `nonzero`,
> > > i.e. it's much the same as:
> > >
> > >     indices = [[0, 0, 0, 0, 0]], *np.nonzero(False)
> > >     indices = np.broadcast_arrays(*indices)
> > >
> > > will give the same result (see also `np.ix_` which converts booleans
> > > as well for this reason, to give you outer indexing).
> > > I was half way through a mock-up/pseudo code, but thought you likely
> > > wasn't sure it was ending up clear. It sounds like things are
> > > probably falling into place for you (if they are not, let me know
> > > what might help you):
> >
> > Sorry, editing error up there. In short, I hope those steps make sense
> > to you; note that the broadcasting is basically part of a later
> > "integer only" indexing step, and the `nonzero` part is
> > pre-processing.
> >
> > > 1. Convert all boolean indices into a series of integer indices
> > >    using `np.nonzero(index)`
> > >
> > > 2. For True/False scalars, that doesn't work, because
> > >    `np.nonzero()`.
> > >
> > >    `nonzero` gave us an index array (which is good, we obviously
> > >    want one), but we need to index into `boolean_index.ndim == 0`
> > >    dimensions!
> > >    So that won't work, the approach using `nonzero` cannot
> > >    generalize here, although boolean indices generalize perfectly.
> > >
> > >    The solution to the dilemma is simple: If we have to index one
> > >    dimension, but should be indexing zero, then we simply add that
> > >    dimension to the original array (or at least pretend there was
> > >    an additional dimension).
> > >
> > > 3. Do normal indexing with the result *including broadcasting*,
> > >    we forget it was converted.
> > >
> > > The other way to solve it would be to always reshape the original
> > > array to combine all axes being indexed by a single boolean index
> > > into one axis and then index it using `np.flatnonzero`. (But that
> > > would get a different result if you try to broadcast!)
> > >
> > >
> > > In any case, I am not sure I would bother with making sense of this,
> > > except for sports!
> > > It's pretty much nonsense and I think the time understanding it is
> > > probably better spent deprecating it. The only reason I did not
> > > deprecate it before is that I tried to be minimal in the changes
> > > when I rewrote advanced indexing (and generalized boolean scalars
> > > correctly) long ago. That was likely the right start/choice at the
> > > time, since there were much bigger fish to catch, but I do not think
> > > anything is holding us back now.
> > >
> > > Cheers,
> > >
> > > Sebastian
> > >
> > >
> > > > Aaron Meurer
> > > >
> > > > On Wed, Aug 19, 2020 at 6:55 PM Sebastian Berg
> > > > <sebast...@sipsolutions.net> wrote:
> > > > > On Wed, 2020-08-19 at 18:07 -0600, Aaron Meurer wrote:
> > > > > > > > 3. If you have multiple advanced indexing you get annoying
> > > > > > > > broadcasting of all of these. That is *always* confusing
> > > > > > > > for boolean indices. 0-D should not be too special
> > > > > > > > there...
> > > > > >
> > > > > > OK, now that I am learning more about advanced indexing, this
> > > > > > statement is confusing to me. It seems that scalar boolean
> > > > > > indices do not broadcast. For example:
> > > > >
> > > > > Well, broadcasting means you broadcast the *nonzero result*
> > > > > unless I am very confused... There is a reason I dismissed it.
> > > > > We could (and arguably should) just deprecate it. And I have
> > > > > doubts anyone would even notice.
> > > > >
> > > > > > >>> np.arange(2)[False, np.array([True, False])]
> > > > > > array([], dtype=int64)
> > > > > > >>> np.arange(2)[tuple(np.broadcast_arrays(False, np.array([True, False])))]
> > > > > > Traceback (most recent call last):
> > > > > >   File "<stdin>", line 1, in <module>
> > > > > > IndexError: too many indices for array: array is
> > > > > > 1-dimensional, but 2 were indexed
> > > > > >
> > > > > > And indeed, the docs even say, as you noted, "the nonzero
> > > > > > equivalence for Boolean arrays does not hold for zero
> > > > > > dimensional boolean arrays," which I guess also applies to the
> > > > > > broadcasting.
> > > > >
> > > > > I actually think that probably also holds. Nonzero just behaves
> > > > > weirdly for 0-D arrays (because it returns a tuple).
> > > > > But since broadcasting the nonzero result is so weird, and since
> > > > > 0-D booleans require some additional logic and don't generalize
> > > > > 100% (code wise), I won't rule out there are differences.
> > > > >
> > > > > > From what I can tell, the logic is that all integer and
> > > > > > boolean arrays
> > > > >
> > > > > Did you try that? Because as I said above, IIRC broadcasting the
> > > > > boolean array without first calling `nonzero` isn't really
> > > > > what's going on. And I don't know how it could be what's going
> > > > > on, since adding dimensions to a boolean index would have much
> > > > > more implications?
> > > > >
> > > > > - Sebastian
> > > > >
> > > > >
> > > > > > (and scalar ints) are broadcast together, *except* for boolean
> > > > > > scalars. Then the first boolean scalar is replaced with
> > > > > > and(all boolean scalars) and the rest are removed from the
> > > > > > index. Then that index adds a length 1 axis if it is True and
> > > > > > 0 if it is False.
> > > > > >
> > > > > > So they don't broadcast, but rather "fake broadcast". I still
> > > > > > contend that it would be much more useful, if True were a
> > > > > > synonym for newaxis and False worked like newaxis but instead
> > > > > > added a length 0 axis. Alternately, True and False scalars
> > > > > > should behave exactly like all other boolean arrays with no
> > > > > > exceptions (i.e., work like np.nonzero(), broadcast, etc.).
> > > > > > This would be less useful, but more consistent.
> > > > > >
> > > > > > Aaron Meurer
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion@python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
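
For reference, the `nonzero` pre-processing discussed in this thread can be sketched roughly as below. This is only an illustration of the equivalence for boolean *array* indices, not NumPy's actual implementation. The names `rows`, `cols`, `mask_a` and `mask_b` are made up for the example, and it assumes a NumPy version that still accepts multiple boolean array indices.

```python
import numpy as np

arr = np.diag([1, 2, 3, 4])      # 2-d square array
mask = arr.diagonal() > 2        # [False, False, True, True]

# Pre-processing: each boolean array index becomes integer indices.
(rows,) = np.nonzero(mask)       # positions of True: [2, 3]
(cols,) = np.nonzero(mask)

# Integer-only step: the index arrays are broadcast against each other
# and then used pairwise, i.e. arr[2, 2] and arr[3, 3].
rows_b, cols_b = np.broadcast_arrays(rows, cols)
via_nonzero = arr[rows_b, cols_b]

direct = arr[mask, mask]
assert np.array_equal(direct, via_nonzero)

# Two masks with a different number of True values give index arrays
# of shapes (3,) and (2,), which cannot be broadcast together.
mask_a = np.array([True, True, True, False])
mask_b = np.array([False, False, True, True])
try:
    arr[mask_a, mask_b]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True
```

The second half shows why the pattern is fragile: whether `arr[mask_a, mask_b]` works at all depends on how many `True` values each mask happens to contain.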
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion