Re: [Numpy-discussion] What is up with raw boolean indices (like a[False])?

Sebastian Berg Thu, 20 Aug 2020 15:38:21 -0700

On Thu, 2020-08-20 at 16:00 -0600, Aaron Meurer wrote:
> Just to be clear, what exactly do you think should be deprecated?
> Boolean scalar indices in general, or just boolean scalars combined
> with other arrays, or something else?


My angle is that we should allow only:

* Any number of integer array indices (ideally only explicitly
  with `arr.vindex[]`, but we do not have that luxury right now.)

* A single boolean index (array or scalar is identical)

but no mix of the above (including multiple boolean indices).

I think such mixes are at least one level more confusing than
multiple advanced indices.
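
To make the two permitted forms concrete, here is a minimal sketch. The
array contents and variable names are illustrative only and do not come
from the thread:

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Form 1: integer array indices only, one per axis, broadcast together.
rows = np.array([0, 1])
cols = np.array([2, 0])
picked = arr[rows, cols]           # elements (0, 2) and (1, 0)

# Form 2: a single boolean index, either an array ...
mask = arr > 2
selected = arr[mask]               # 1-d array of elements where mask is True

# ... or a scalar, which adds a leading axis of length 1 (True) or 0 (False).
kept = arr[True]
dropped = arr[False]

print(picked, selected, kept.shape, dropped.shape)
```

Anything that combines these forms in one index expression is what would
fall under the proposed deprecation.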

I admit, I forgot that the broadcasting logic is fine in this case:

   arr = np.zeros((2, 3))
   arr[[True, False], np.array(2)]

where the advanced index is also a scalar index. In that case the
result is straightforward, since broadcasting does not affect
`np.array(2)`.
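
A small sketch of that case, using a mask whose length matches the first
axis and an in-bounds column (both chosen here purely for illustration):
the boolean index is converted with `np.nonzero` first, and the 0-d
integer index then simply broadcasts against the result.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
mask = np.array([True, False])          # selects row 0
col = np.array(2)                       # 0-d integer index

mixed = arr[mask, col]

# Equivalent integer-only indexing: nonzero first, then broadcast.
(row_idx,) = np.nonzero(mask)           # array([0])
row_b, col_b = np.broadcast_arrays(row_idx, col)
explicit = arr[row_b, col_b]

print(mixed, explicit)
```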


I am happy to be wrong about that assessment, but I think your opinion
on it could likely push us towards just doing a Deprecation.
The only use case for "multiple boolean indices" that I could think of
was this:

    arr = np.diag([1, 2, 3, 4])  # 2-d square array
    indx = arr.diagonal() > 2  # mask for each row and column
    masked_diagonal = arr[indx, indx]
    print(repr(masked_diagonal))
    # array([3, 4])

and my guess is that the reaction to that code is: "Wait, what?!"

That code might seem reasonable, but it only works if you have the
exact same number of `True` values in the two indices.
And if you have the exact same number but two different arrays, then I
fail to reason about the result without doing the `nonzero` step, which
I think indicates that there just is no logical concept for it.
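
The `nonzero` step can be spelled out as follows. This is only a sketch
with made-up masks; it shows that two boolean indices are paired element
by element after conversion, and that differing `True` counts fail to
broadcast.

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)

row_mask = np.array([True, False, True, False])   # nonzero -> [0, 2]
col_mask = np.array([False, True, False, True])   # nonzero -> [1, 3]

# Two boolean indices are paired up like integer arrays: (0, 1) and (2, 3).
paired = arr[row_mask, col_mask]
explicit = arr[np.nonzero(row_mask)[0], np.nonzero(col_mask)[0]]

# Differing True counts (3 vs. 2) cannot be broadcast together.
bad_rows = np.array([True, True, True, False])
try:
    arr[bad_rows, col_mask]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True

print(paired, explicit, mismatch_raises)
```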


So, I think we may be better off forcing the few power users who may
have found a use for this type of nugget to use `np.nonzero()` or find
another solution.
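
For illustration, two explicit spellings might look like this. It is a
sketch reusing the diagonal example; `np.ix_` is the existing NumPy helper
for outer indexing, and the variable names are illustrative:

```python
import numpy as np

arr = np.diag([1, 2, 3, 4])
mask = arr.diagonal() > 2               # [False, False, True, True]

# Pairwise selection, written explicitly with integer indices.
(idx,) = np.nonzero(mask)               # array([2, 3])
pairwise = arr[idx, idx]                # elements (2, 2) and (3, 3)

# Outer selection: every masked row combined with every masked column.
block = arr[np.ix_(mask, mask)]

print(pairwise)
print(block)
```

Writing out `np.nonzero()` makes the pairwise intent visible, while
`np.ix_` covers the case where the full sub-block was wanted instead.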

- Sebastian


> 
> Aaron Meurer
> 
> On Thu, Aug 20, 2020 at 3:56 PM Sebastian Berg
> <sebast...@sipsolutions.net> wrote:
> > On Thu, 2020-08-20 at 16:50 -0500, Sebastian Berg wrote:
> > > On Thu, 2020-08-20 at 12:21 -0600, Aaron Meurer wrote:
> > > > You're right. I was confusing the broadcasting logic for boolean
> > > > arrays.
> > > > 
> > > > However, I did find this example
> > > > 
> > > > >>> np.arange(10).reshape((2, 5))[np.array([[0, 0, 0, 0, 0]], dtype=np.int64), False]
> > > > Traceback (most recent call last):
> > > >   File "<stdin>", line 1, in <module>
> > > > IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,5) (0,)
> > > > 
> > > > That certainly seems to imply there is some broadcasting being
> > > > done.
> > > 
> > > Yes, it broadcasts the array after converting it with `nonzero`,
> > > i.e. it is much the same as:
> > > 
> > >    indices = [[0, 0, 0, 0, 0]], *np.nonzero(False)
> > >    indices = np.broadcast_arrays(*indices)
> > > 
> > > will give the same result (see also `np.ix_`, which converts booleans
> > > as well for this reason, to give you outer indexing).
> > > I was half way through a mock-up/pseudo code, but thought you likely
> > > wasn't sure it was ending up clear. It sounds like things are probably
> > > falling into place for you (if they are not, let me know what might
> > > help you):
> > 
> > Sorry, editing error up there. In short, I hope those steps make sense
> > to you. Note that the broadcasting is basically part of a later
> > "integer only" indexing step, and the `nonzero` part is pre-processing.
> > 
> > > 1. Convert all boolean indices into a series of integer indices
> > >    using `np.nonzero(index)`.
> > > 
> > > 2. For True/False scalars, that doesn't work: `np.nonzero()` gave us
> > >    an index array (which is good, we obviously want one), but we
> > >    need to index into `boolean_index.ndim == 0` dimensions!
> > >    So the approach using `nonzero` cannot generalize here, although
> > >    boolean indices generalize perfectly.
> > > 
> > >    The solution to the dilemma is simple: If we have to index one
> > >    dimension, but should be indexing zero, then we simply add that
> > >    dimension to the original array (or at least pretend there was
> > >    an additional dimension).
> > > 
> > > 3. Do normal indexing with the result *including broadcasting*;
> > >    we forget it was converted.
> > > 
> > > The other way to solve it would be to always reshape the original
> > > array to combine all axes being indexed by a single boolean index
> > > into one axis and then index it using `np.flatnonzero`.  (But that
> > > would get a different result if you try to broadcast!)
> > > 
> > > 
> > > In any case, I am not sure I would bother with making sense of this,
> > > except for sport!
> > > It's pretty much nonsense and I think the time spent understanding it
> > > is probably better spent deprecating it.  The only reason I did not
> > > deprecate it before is that I tried to be minimal in the changes
> > > when I rewrote advanced indexing (and generalized boolean scalars
> > > correctly) long ago.  That was likely the right start/choice at the
> > > time, since there were much bigger fish to fry, but I do not think
> > > anything is holding us back now.
> > > 
> > > Cheers,
> > > 
> > > Sebastian
> > > 
> > > 
> > > > Aaron Meurer
> > > > 
> > > > On Wed, Aug 19, 2020 at 6:55 PM Sebastian Berg
> > > > <sebast...@sipsolutions.net> wrote:
> > > > > On Wed, 2020-08-19 at 18:07 -0600, Aaron Meurer wrote:
> > > > > > > > 3. If you have multiple advanced indexing you get annoying
> > > > > > > >    broadcasting of all of these. That is *always* confusing
> > > > > > > >    for boolean indices.
> > > > > > > >    0-D should not be too special there...
> > > > > > 
> > > > > > OK, now that I am learning more about advanced indexing, this
> > > > > > statement is confusing to me. It seems that scalar boolean
> > > > > > indices do not broadcast. For example:
> > > > > 
> > > > > Well, broadcasting means you broadcast the *nonzero result*, unless
> > > > > I am very confused... There is a reason I dismissed it. We could
> > > > > (and arguably should) just deprecate it.  And I have doubts anyone
> > > > > would even notice.
> > > > > 
> > > > > > >>> np.arange(2)[False, np.array([True, False])]
> > > > > > array([], dtype=int64)
> > > > > > >>> np.arange(2)[tuple(np.broadcast_arrays(False, np.array([True, False])))]
> > > > > > Traceback (most recent call last):
> > > > > >   File "<stdin>", line 1, in <module>
> > > > > > IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
> > > > > > 
> > > > > > And indeed, the docs even say, as you noted, "the nonzero
> > > > > > equivalence for Boolean arrays does not hold for zero dimensional
> > > > > > boolean arrays," which I guess also applies to the broadcasting.
> > > > > 
> > > > > I actually think that probably also holds. Nonzero just behaves
> > > > > weirdly for 0-D arrays (because it returns a tuple).
> > > > > But since broadcasting the nonzero result is so weird, and since
> > > > > 0-D booleans require some additional logic and don't generalize
> > > > > 100% (code wise), I won't rule out there are differences.
> > > > > 
> > > > > > From what I can tell, the logic is that all integer and boolean
> > > > > > arrays
> > > > > 
> > > > > Did you try that? Because as I said above, IIRC broadcasting the
> > > > > boolean array without first calling `nonzero` isn't really what's
> > > > > going on. And I don't know how it could be what's going on, since
> > > > > adding dimensions to a boolean index would have many more
> > > > > implications?
> > > > > 
> > > > > - Sebastian
> > > > > 
> > > > > 
> > > > > > (and scalar ints) are broadcast together, *except* for boolean
> > > > > > scalars. Then the first boolean scalar is replaced with and(all
> > > > > > boolean scalars) and the rest are removed from the index. Then
> > > > > > that index adds a length 1 axis if it is True and 0 if it is
> > > > > > False.
> > > > > > 
> > > > > > So they don't broadcast, but rather "fake broadcast". I still
> > > > > > contend that it would be much more useful if True were a synonym
> > > > > > for newaxis and False worked like newaxis but instead added a
> > > > > > length 0 axis. Alternately, True and False scalars should behave
> > > > > > exactly like all other boolean arrays with no exceptions (i.e.,
> > > > > > work like np.nonzero(), broadcast, etc.). This would be less
> > > > > > useful, but more consistent.
> > > > > > 
> > > > > > Aaron Meurer
> > > > > > _______________________________________________
> > > > > > NumPy-Discussion mailing list
> > > > > > NumPy-Discussion@python.org
> > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > > > 
> > > > > 
>


