Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Charles R Harris
On Thu, Jun 4, 2015 at 6:26 PM, Charles R Harris 
wrote:

> Hi All,
>
> I have no strong feelings one way or the other on this proposed deprecation
> for numpy 1.10 and would like some feedback from interested users.
>

Umm, link is #4353.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Nathaniel Smith
So specifically the question is -- if you have an array with five items,
and a Boolean array with three items, then currently you can use the latter
to index the former:

arr = np.arange(5)
mask = np.asarray([True, False, True])
arr[mask] # returns array([0, 2])

This is justified by the rule that indexing with a Boolean array should be
the same as indexing with the same array that's been passed to
np.nonzero(). Empirically, though, this causes constant confusion and does
not seem very useful, so the question is whether we should deprecate it.
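
To make the np.nonzero() rule concrete, here is a minimal sketch (the
variable names are illustrative, not taken from the PR). The np.nonzero()
route goes through integer indices, so it does not depend on the mask's
length. Whether arr[short_mask] itself is accepted depends on the NumPy
version, so it is left out:

```python
import numpy as np

arr = np.arange(5)

# A mask whose length matches the array: the uncontroversial case.
full_mask = np.array([True, False, True, False, False])
via_mask = arr[full_mask]
via_nonzero = arr[np.nonzero(full_mask)]

# A three-item mask: np.nonzero() turns it into the integer indices 0 and 2,
# which are valid positions in a five-item array.
short_mask = np.array([True, False, True])
via_short_nonzero = arr[np.nonzero(short_mask)]

print(via_mask, via_nonzero, via_short_nonzero)
```

All three results hold the same two elements, which is the equivalence that
the current behavior is meant to preserve.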

-n


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Nathaniel Smith
On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith  wrote:
> So specifically the question is -- if you have an array with five items, and
> a Boolean array with three items, then currently you can use the latter to
> index the former:
>
> arr = np.arange(5)
> mask = np.asarray([True, False, True])
> arr[mask] # returns array([0, 2])
>
> This is justified by the rule that indexing with a Boolean array should be
> the same as indexing with the same array that's been passed to np.nonzero().
> Empirically, though, this causes constant confusion and does not seem very
> useful, so the question is whether we should deprecate it.

One place where the current behavior is particularly baffling and annoying
is when you have multiple boolean masks in the same indexing operation. I
think everyone would expect this to index separately on each axis ("outer
product indexing" style, like slices do), and that's really the only useful
interpretation, but that's not what it does...:

In [3]: a = np.arange(9).reshape((3, 3))

In [4]: a
Out[4]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [6]: a[np.asarray([True, False, True]), np.asarray([False, True, True])]
Out[6]: array([1, 8])

In [7]: a[np.asarray([True, False, True]), np.asarray([False, False, True])]
Out[7]: array([2, 8])

In [8]: a[np.asarray([True, False, True]), np.asarray([True, True, True])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]

IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
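
For comparison, a sketch of what the two-mask case does and of one way to
get the per-axis selection instead. It relies on np.ix_, which accepts
boolean sequences and converts them with np.nonzero(). The variable names
are illustrative:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rows = np.array([True, False, True])
cols = np.array([False, True, True])

# Each mask becomes integer indices, which are then paired up elementwise:
# positions (0, 1) and (2, 2).
r_idx, = np.nonzero(rows)
c_idx, = np.nonzero(cols)
paired = a[r_idx, c_idx]

# Per-axis ("outer product") selection: rows 0 and 2, columns 1 and 2.
outer = a[np.ix_(rows, cols)]

print(paired)
print(outer)
```

The pairing also explains the IndexError: with two True entries in one mask
and three in the other, the integer index arrays have shapes (2,) and (3,)
and cannot be matched up.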

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Benjamin Root
On Thu, Jun 4, 2015 at 9:04 PM, Nathaniel Smith  wrote:

> On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith  wrote:
>
> One place where the current behavior is particularly baffling and annoying
> is when you have multiple boolean masks in the same indexing operation. I
> think everyone would expect this to index separately on each axis ("outer
> product indexing" style, like slices do), and that's really the only useful
> interpretation, but that's not what it does...:
>
>
As a huge user of boolean indexes, I have never expected this to work in
any way, shape or form. I don't think it works in MATLAB (but someone
should probably check that), so you wouldn't have to worry about converts
missing a feature from there. I have always been told that boolean indexing
will produce a flattened array, and I wouldn't want to be dealing with
magic when the array does not match up right.

Now, what if the boolean array is broadcastable (dimension-wise, not
length-wise)? I do see some uses there. Modulo that, my vote is to
deprecate.

Ben Root


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Nathaniel Smith
On Thu, Jun 4, 2015 at 6:22 PM, Benjamin Root  wrote:
>
> On Thu, Jun 4, 2015 at 9:04 PM, Nathaniel Smith  wrote:
>>
>> On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith  wrote:
>>
>> One place where the current behavior is particularly baffling and annoying 
>> is when you have multiple boolean masks in the same indexing operation. I 
>> think everyone would expect this to index separately on each axis ("outer 
>> product indexing" style, like slices do), and that's really the only useful 
>> interpretation, but that's not what it does...:
>>
>
> As a huge user of boolean indexes, I have never expected this to work in any 
> way, shape or form. I don't think it works in matlab (but someone should 
> probably check that), so you wouldn't have to worry about converts missing a 
> feature from there. I have always been told that boolean indexing will 
> produce a flattened array, and I wouldn't want to be dealing with magic when 
> the array does not match up right.

Note that there are two types of boolean indexing:

type 1: arr[mask] where mask is n-d (ideally the same shape as "arr",
but I think that it *is* broadcast if not). This always produces 1-d
output.

type 2: arr[..., mask, ...], where mask is 1-d and only applies to the
given dimension.

My comment was about the second type. Are your comments about the
second type? The second type definitely does not produce a flattened
array:

In [7]: a = np.arange(9).reshape(3, 3)

In [8]: a[np.asarray([True, False, True]), :]
Out[8]:
array([[0, 1, 2],
       [6, 7, 8]])
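
The two types side by side, as a small sketch (the mask names are
illustrative; both masks match the shape of the axes they index, so nothing
here depends on the behavior under discussion):

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Type 1: an n-d mask with the same shape as the array.
# The result is always 1-d, in row-major order.
full_mask = (a % 2 == 0)
flat = a[full_mask]

# Type 2: a 1-d mask applied to one axis, with slices for the others.
# The other axes are kept, so the result stays 2-d.
axis_mask = np.array([True, False, True])
picked_rows = a[axis_mask, :]
picked_cols = a[:, axis_mask]

print(flat.shape, picked_rows.shape, picked_cols.shape)
```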

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Sebastian Berg
On Thu, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
> One place where the current behavior is particularly baffling and
> annoying is when you have multiple boolean masks in the same indexing
> operation. I think everyone would expect this to index separately on
> each axis ("outer product indexing" style, like slices do), and that's
> really the only useful interpretation, but that's not what it does...:


This is not being deprecated there for the moment; it is a different
discussion. Though maybe we can improve the error message to mention
that the array was originally boolean. That has always bugged me a bit
(it used to mention this in some cases, but no longer does).

- Sebastian




Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 3:16 AM, Sebastian Berg 
wrote:

> This is not being deprecated there for the moment; it is a different
> discussion. Though maybe we can improve the error message to mention
> that the array was originally boolean. That has always bugged me a bit
> (it used to mention this in some cases, but no longer does).
>
> - Sebastian

What is actually being deprecated?
It looks like there are different examples.

wrong length: Nathaniel's first example above, where the mask is not
broadcastable to the original array because the mask is longer or shorter
than shape[axis].
I also wouldn't have expected this to work; although I use np.nonzero and
boolean mask indexing interchangeably, I would assume we need the correct
length for the mask.
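
One way to enforce the "correct length" expectation regardless of NumPy
version is a small wrapper. This is only a sketch: take_with_mask is a
hypothetical helper, not a NumPy function, and it goes through np.nonzero()
and np.take rather than through boolean indexing itself:

```python
import numpy as np

def take_with_mask(arr, mask, axis=0):
    """Select along `axis` with a 1-d boolean mask of exactly matching length.

    Hypothetical helper for illustration; not part of NumPy.
    """
    arr = np.asarray(arr)
    mask = np.asarray(mask, dtype=bool)
    if mask.ndim != 1 or mask.shape[0] != arr.shape[axis]:
        raise IndexError(
            "mask has shape %s but axis %d has length %d"
            % (mask.shape, axis, arr.shape[axis])
        )
    # np.nonzero() gives the integer positions of the True entries.
    return np.take(arr, np.nonzero(mask)[0], axis=axis)

x = np.arange(20).reshape(4, 5)
picked = take_with_mask(x, [True, False, True, False, True], axis=1)
print(picked)
```

Passing a three-item mask for an axis of length five raises IndexError here
instead of being silently accepted.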

The second case, where the boolean mask has an extra dimension of length
one, or where there are several boolean arrays, might need more checking.
I'm pretty sure I have used various versions of these, assuming they are a
feature, and when I see arrays, I usually don't assume "outer product
indexing" (that might lead to a similar discussion as the recent one on
fancy versus orthogonal indexing).


Josef


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Sebastian Berg
On Fri, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
> 

> 
> What is actually being deprecated?
> It looks like there are different examples.
> 
> 
> wrong length: Nathaniel's first example above, where the mask is not
> broadcastable to the original array because the mask is longer or shorter
> than shape[axis].
> I also wouldn't have expected this to work; although I use np.nonzero
> and boolean mask indexing interchangeably, I would assume we need the
> correct length for the mask.
> 

For the moment we are only talking about wrong length (along a given
dimension), not about wrong number of dimensions or multiple boolean
indices.
As a side note: I don't think the single boolean index behaviour needs
to change; it is OK. Yes, it is not quite broadcasting, but there is no
help considering transparent multidimensional indexing.
As for multiple booleans, I think that is more part of the "outer" indexing
discussion, which is interesting but does not belong here :).

- Sebastian




Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Benjamin Root
On Thu, Jun 4, 2015 at 10:41 PM, Nathaniel Smith  wrote:

> My comment was about the second type. Are your comments about the
> second type? The second type definitely does not produce a flattened
> array:
>


I was talking about the second type in that I never even knew it existed.
My understanding of boolean indexing has always been that it flattens, so
the second type is a surprise to me.

Ben Root


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Anne Archibald
On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg 
wrote:

> For the moment we are only talking about wrong length (along a given
> dimension). Not about wrong number of dimensions or multiple boolean
> indices.
>

I am pro-deprecation then, definitely. I don't see a use case for padding a
wrong-shaped boolean array with Falses, and the padding has burned me in
the past.
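
For anyone who does rely on the padding, it can be written out explicitly.
A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.arange(5)
short_mask = np.array([True, False, True])

# Build a full-length mask that is False everywhere, then copy the
# short mask into its leading entries.
padded = np.zeros(arr.shape[0], dtype=bool)
padded[:short_mask.shape[0]] = short_mask

selected = arr[padded]
print(selected)
```

Spelling the padding out makes the intent visible at the call site, and the
mask passed to the indexing operation always has the matching length.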

It's not orthogonal to the wrong-number-of-dimensions issue, though,
because if your Boolean array has a dimension of length 1, broadcasting
says duplicate it along that axis to match the indexee, and wrong-length
says pad it with Falses. This ambiguity/pitfall disappears if the padding
never happens, and that kind of broadcasting is very useful.
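
The broadcasting reading of a length-1 dimension can be made explicit with
np.broadcast_to, which avoids depending on how the index itself treats the
mismatch. This is a sketch under that assumption, with illustrative names:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)
col_mask = np.array([True, False, True, False, True])

# col_mask[None, :] has shape (1, 5); broadcasting duplicates that single
# row of booleans along axis 0 so the mask matches x exactly.
full = np.broadcast_to(col_mask[None, :], x.shape)
selected = x[full]

print(full.shape, selected)
```

Here every row contributes its columns 0, 2 and 4, whereas padding the
(1, 5) mask with Falses would only ever select from the first row.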

Anne


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 11:50 AM, Anne Archibald  wrote:

> I am pro-deprecation then, definitely. I don't see a use case for padding
> a wrong-shaped boolean array with Falses, and the padding has burned me in
> the past.
>
> It's not orthogonal to the wrong-number-of-dimensions issue, though,
> because if your Boolean array has a dimension of length 1, broadcasting
> says duplicate it along that axis to match the indexee, and wrong-length
> says pad it with Falses. This ambiguity/pitfall disappears if the padding
> never happens, and that kind of broadcasting is very useful.
>

Good argument, now I understand why we only get a single column



>>> x = np.arange(4*5).reshape(4,5)
>>> mask = np.array([1,0,1,0,1], bool)

Padding with False (this would also be deprecated AFAIU, as Anne pointed
out):

>>> x[mask[:4][:,None]]
array([ 0, 10])
>>> x[mask[None,:]]
array([0, 2, 4])

masks can only be combined with slices, so no "fancy masking" allowed nor
defined (yet)

>>> x[mask[:4][:,None], mask[None,:]]
Traceback (most recent call last):
  File "", line 1, in 
x[mask[:4][:,None], mask[None,:]]
IndexError: too many indices for array


I'm using 1d masks quite often to select rows or columns, which seems to
work in more than two dimensions
(Benjamin's surprise)

>>> x[:, mask]
array([[ 0,  2,  4],
       [ 5,  7,  9],
       [10, 12, 14],
       [15, 17, 19]])

>>> x[mask[:4][:,None] * mask[None,:]]
array([ 0,  2,  4, 10, 12, 14])
>>> x[:,:,None][mask[:4][:,None] * mask[None,:]]
array([[ 0],
       [ 2],
       [ 4],
       [10],
       [12],
       [14]])
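
The mask product above can be restated with masks of exactly matching
lengths, so that it does not depend on the padding under discussion. This
is only a sketch: row_mask and col_mask are illustrative names, and & is
used in place of * for the boolean product:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)
row_mask = np.array([True, False, True, False])        # length 4, axis 0
col_mask = np.array([True, False, True, False, True])  # length 5, axis 1

# Outer product of the two 1-d masks: a (4, 5) mask that is True exactly
# where both the row and the column are selected.
both = row_mask[:, None] & col_mask[None, :]
flat = x[both]

# The selected cells form a rectangular block in row-major order, so the
# flat result can be reshaped back to (selected rows, selected columns).
block = flat.reshape(row_mask.sum(), col_mask.sum())

print(flat)
print(block)
```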

Josef


