Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Sebastian Berg
On Thu, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
 On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:
  So specifically the question is -- if you have an array with five
  items, and a Boolean array with three items, then currently you can
  use the latter to index the former:
 
  arr = np.arange(5)
  mask = np.asarray([True, False, True])
  arr[mask] # returns array([0, 2])
 
  This is justified by the rule that indexing with a Boolean array
  should be the same as indexing with the same array that's been passed
  to np.nonzero(). Empirically, though, this causes constant confusion
  and does not seem very useful, so the question is whether we should
  deprecate it.
 
 One place where the current behavior is particularly baffling and
 annoying is when you have multiple boolean masks in the same indexing
 operation. I think everyone would expect this to index separately on
 each axis (outer product indexing style, like slices do), and that's
 really the only useful interpretation, but that's not what it does...:


This is not being deprecated in there for the moment; it is a different
discussion. Though maybe we can improve the error message to mention
that the array was originally boolean. That has always bugged me a bit
(the message used to mention it in some cases, but it does not anymore).

- Sebastian


 In [3]: a = np.arange(9).reshape((3, 3))
 
 In [4]: a
 Out[4]:
 array([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
 
 In [6]: a[np.asarray([True, False, True]), np.asarray([False, True, True])]
 Out[6]: array([1, 8])
 
 In [7]: a[np.asarray([True, False, True]), np.asarray([False, False, True])]
 Out[7]: array([2, 8])
 
 In [8]: a[np.asarray([True, False, True]), np.asarray([True, True, True])]
 ---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
 <ipython-input-8-30b3427bec2a> in <module>()
 ----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]
 
 IndexError: shape mismatch: indexing arrays could not be broadcast
 together with shapes (2,) (3,)
 
 
 -n
 
 -- 
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion





Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 3:16 AM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Thu, 2015-06-04 at 18:04 -0700, Nathaniel Smith wrote:
  On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:
   So specifically the question is -- if you have an array with five
   items, and a Boolean array with three items, then currently you can
   use the latter to index the former:
  
   arr = np.arange(5)
   mask = np.asarray([True, False, True])
   arr[mask] # returns array([0, 2])
  
   This is justified by the rule that indexing with a Boolean array
   should be the same as indexing with the same array that's been passed
   to np.nonzero(). Empirically, though, this causes constant confusion
   and does not seem very useful, so the question is whether we should
   deprecate it.
 
  One place where the current behavior is particularly baffling and
  annoying is when you have multiple boolean masks in the same indexing
  operation. I think everyone would expect this to index separately on
  each axis (outer product indexing style, like slices do), and that's
  really the only useful interpretation, but that's not what it does...:


 This is not being deprecated in there for the moment; it is a different
 discussion. Though maybe we can improve the error message to mention
 that the array was originally boolean. That has always bugged me a bit
 (the message used to mention it in some cases, but it does not anymore).

 - Sebastian


  In [3]: a = np.arange(9).reshape((3, 3))
 
  In [4]: a
  Out[4]:
  array([[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]])
 
  In [6]: a[np.asarray([True, False, True]), np.asarray([False, True, True])]
  Out[6]: array([1, 8])
 
  In [7]: a[np.asarray([True, False, True]), np.asarray([False, False, True])]
  Out[7]: array([2, 8])
 
  In [8]: a[np.asarray([True, False, True]), np.asarray([True, True, True])]
  ---------------------------------------------------------------------------
  IndexError                                Traceback (most recent call last)
  <ipython-input-8-30b3427bec2a> in <module>()
  ----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]
 
  IndexError: shape mismatch: indexing arrays could not be broadcast
  together with shapes (2,) (3,)
 
 
  -n
 
  --
  Nathaniel J. Smith -- http://vorpus.org





What is actually being deprecated?
It looks like there are different examples.

Wrong length: Nathaniel's first example above, where the mask is not
broadcastable to the original array because the mask is longer or shorter
than shape[axis].
I also wouldn't have expected this to work. Although I use np.nonzero and
boolean mask indexing interchangeably, I would assume we need the correct
length for the mask.

The second case, where the boolean mask has an extra dimension of length
one, or several boolean arrays, might need more checking.
I'm pretty sure I used various versions, assuming they are a feature, and
when I see arrays, I usually don't assume outer product indexing (that
might lead to a similar discussion as the recent fancy versus orthogonal
indexing).


Josef


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread josef.pktd
On Fri, Jun 5, 2015 at 11:50 AM, Anne Archibald archib...@astron.nl wrote:



 On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg sebast...@sipsolutions.net
 wrote:

 On Fri, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
 
 <snip>
 
  What is actually being deprecated?
  It looks like there are different examples.
 
 
  Wrong length: Nathaniel's first example above, where the mask is not
  broadcastable to the original array because the mask is longer or
  shorter than shape[axis].
  I also wouldn't have expected this to work. Although I use np.nonzero
  and boolean mask indexing interchangeably, I would assume we need the
  correct length for the mask.
 

 For the moment we are only talking about wrong length (along a given
 dimension). Not about wrong number of dimensions or multiple boolean
 indices.


 I am pro-deprecation then, definitely. I don't see a use case for padding
 a wrong-shaped boolean array with Falses, and the padding has burned me in
 the past.

 It's not orthogonal to the wrong-number-of-dimensions issue, though,
 because if your Boolean array has a dimension of length 1, broadcasting
 says duplicate it along that axis to match the indexee, and wrong-length
 says pad it with Falses. This ambiguity/pitfall disappears if the padding
 never happens, and that kind of broadcasting is very useful.


Good argument; now I understand why we only get a single column:

>>> x = np.arange(4*5).reshape(4,5)
>>> mask = np.array([1,0,1,0,1], bool)

Padding with False would also be deprecated, AFAIU, as Anne pointed
out:

>>> x[mask[:4][:,None]]
array([ 0, 10])
>>> x[mask[None,:]]
array([0, 2, 4])

Masks can only be combined with slices, so no fancy masking is allowed or
defined (yet):

>>> x[mask[:4][:,None], mask[None,:]]
Traceback (most recent call last):
  File "<pyshell#31>", line 1, in <module>
    x[mask[:4][:,None], mask[None,:]]
IndexError: too many indices for array


I'm using 1d masks quite often to select rows or columns, which seems to
work in more than two dimensions as well
(Benjamin's surprise):

>>> x[:, mask]
array([[ 0,  2,  4],
       [ 5,  7,  9],
       [10, 12, 14],
       [15, 17, 19]])

>>> x[mask[:4][:,None] * mask[None,:]]
array([ 0,  2,  4, 10, 12, 14])
>>> x[:,:,None][mask[:4][:,None] * mask[None,:]]
array([[ 0],
       [ 2],
       [ 4],
       [10],
       [12],
       [14]])
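In later NumPy releases the first two wrong-shape examples above no longer run: a boolean index whose shape does not match the indexed array raises IndexError. A minimal sketch, assuming a recent NumPy (the exact error text varies between versions), that checks this and shows that the full-shape outer mask still works:

```python
import numpy as np

x = np.arange(4 * 5).reshape(4, 5)
mask = np.array([1, 0, 1, 0, 1], bool)

# Masks with a length-1 axis do not match x.shape; they only worked through
# the padding behaviour, so recent NumPy raises IndexError for them.
rejected = []
for bad_mask in (mask[:4][:, None], mask[None, :]):
    try:
        x[bad_mask]
    except IndexError:
        rejected.append(bad_mask.shape)
print(rejected)  # [(4, 1), (1, 5)]

# The outer-product mask has exactly x.shape, so it is still valid.
# For boolean arrays, & gives the same result as the * used above.
outer_mask = mask[:4][:, None] & mask[None, :]
selected = x[outer_mask]
print(selected)  # [ 0  2  4 10 12 14]
```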

Josef




 Anne



Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Anne Archibald
On Fri, Jun 5, 2015 at 5:45 PM Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Fri, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
 
 <snip>
 
  What is actually being deprecated?
  It looks like there are different examples.
 
 
  Wrong length: Nathaniel's first example above, where the mask is not
  broadcastable to the original array because the mask is longer or
  shorter than shape[axis].
  I also wouldn't have expected this to work. Although I use np.nonzero
  and boolean mask indexing interchangeably, I would assume we need the
  correct length for the mask.
 

 For the moment we are only talking about wrong length (along a given
 dimension). Not about wrong number of dimensions or multiple boolean
 indices.


I am pro-deprecation then, definitely. I don't see a use case for padding a
wrong-shaped boolean array with Falses, and the padding has burned me in
the past.

It's not orthogonal to the wrong-number-of-dimensions issue, though,
because if your Boolean array has a dimension of length 1, broadcasting
says duplicate it along that axis to match the indexee, and wrong-length
says pad it with Falses. This ambiguity/pitfall disappears if the padding
never happens, and that kind of broadcasting is very useful.
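The ambiguity described above can be made concrete with a (4, 1) mask on a (4, 5) array. The sketch below, assuming a recent NumPy, computes both readings by hand: the broadcasting reading stretches the length-1 axis with `np.broadcast_to`, while the np.nonzero() reading (the behaviour under discussion, equivalent to padding with False) only ever yields column 0. The variable names are illustrative.

```python
import numpy as np

x = np.arange(4 * 5).reshape(4, 5)
row_flags = np.array([True, False, True, False])[:, None]  # shape (4, 1)

# Broadcasting reading: repeat the length-1 axis across all 5 columns,
# which selects every element of rows 0 and 2.
broadcast_reading = x[np.broadcast_to(row_flags, x.shape)]
print(broadcast_reading)  # [ 0  1  2  3  4 10 11 12 13 14]

# np.nonzero() reading: the length-1 axis only ever yields column index 0,
# as if the mask were padded with False.
nonzero_reading = x[np.nonzero(row_flags)]
print(nonzero_reading)  # [ 0 10]

# Indexing with the (4, 1) mask directly: recent NumPy raises IndexError
# instead of silently choosing either reading.
try:
    x[row_flags]
    rejected = False
except IndexError:
    rejected = True
```

Once the padding is gone, a length-1 boolean axis can only mean one thing, which is the point made above.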

Anne


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Sebastian Berg
On Fri, 2015-06-05 at 08:36 -0400, josef.p...@gmail.com wrote:
 
<snip>
 
 What is actually being deprecated?
 It looks like there are different examples.
 
 
 Wrong length: Nathaniel's first example above, where the mask is not
 broadcastable to the original array because the mask is longer or
 shorter than shape[axis].
 I also wouldn't have expected this to work. Although I use np.nonzero
 and boolean mask indexing interchangeably, I would assume we need the
 correct length for the mask.
 

For the moment we are only talking about wrong length (along a given
dimension). Not about wrong number of dimensions or multiple boolean
indices.
As a side note: I don't think the single boolean index behaviour needs to
change; it is OK. Yes, it is not quite broadcasting, but that cannot be
helped if multidimensional indexing is to stay transparent.
As for multiple booleans, I think that is more part of the outer indexing
discussion, which is interesting but not for here :).

- Sebastian


 
 The second case, where the boolean mask has an extra dimension of
 length one, or several boolean arrays, might need more checking.
 I'm pretty sure I used various versions, assuming they are a feature,
 and when I see arrays, I usually don't assume outer product
 indexing (that might lead to a similar discussion as the recent
 fancy versus orthogonal indexing).
 
 
 
 
 Josef


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-05 Thread Benjamin Root
On Thu, Jun 4, 2015 at 10:41 PM, Nathaniel Smith n...@pobox.com wrote:

 My comment was about the second type. Are your comments about the
 second type? The second type definitely does not produce a flattened
 array:



I was talking about the second type in that I never even knew it existed.
My understanding of boolean indexing has always been that it flattens, so
the second type is a surprise to me.

Ben Root


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Benjamin Root
On Thu, Jun 4, 2015 at 9:04 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:

 One place where the current behavior is particularly baffling and annoying
 is when you have multiple boolean masks in the same indexing operation. I
 think everyone would expect this to index separately on each axis (outer
 product indexing style, like slices do), and that's really the only useful
 interpretation, but that's not what it does...:


As a huge user of boolean indexes, I have never expected this to work in
any way, shape or form. I don't think it works in matlab (but someone
should probably check that), so you wouldn't have to worry about converts
missing a feature from there. I have always been told that boolean indexing
will produce a flattened array, and I wouldn't want to be dealing with
magic when the array does not match up right.

Now, what if the boolean array is broadcastable (dimension-wise, not
length-wise)? I do see some uses there. Modulo that, my vote is to
deprecate.

Ben Root


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Nathaniel Smith
On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:
 So specifically the question is -- if you have an array with five items,
 and a Boolean array with three items, then currently you can use the
 latter to index the former:

 arr = np.arange(5)
 mask = np.asarray([True, False, True])
 arr[mask] # returns array([0, 2])

 This is justified by the rule that indexing with a Boolean array should
 be the same as indexing with the same array that's been passed to
 np.nonzero(). Empirically, though, this causes constant confusion and
 does not seem very useful, so the question is whether we should
 deprecate it.
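A minimal sketch of the np.nonzero() rule quoted above, assuming a recent NumPy. With a full-length mask, `arr[mask]` and `arr[np.nonzero(mask)]` agree. With the three-element mask, np.nonzero() can only produce positions 0 to 2, which is why the old behaviour looked as if the mask had been padded with False. Using the short mask directly was deprecated after this discussion and raises IndexError in recent NumPy releases; the exact message varies between versions.

```python
import numpy as np

arr = np.arange(5)

# Full-length mask: boolean indexing equals indexing with np.nonzero(mask).
full_mask = np.array([True, False, True, False, True])
by_mask = arr[full_mask]
by_nonzero = arr[np.nonzero(full_mask)]
print(by_mask, by_nonzero)  # [0 2 4] [0 2 4]

# The three-element mask from the message: np.nonzero() only yields
# positions 0..2, which reproduces the old "padded with False" result.
short_mask = np.array([True, False, True])
via_nonzero = arr[np.nonzero(short_mask)]
print(via_nonzero)  # [0 2]

# Using the short mask directly: rejected by recent NumPy releases.
try:
    arr[short_mask]
    short_mask_rejected = False
except IndexError as exc:
    short_mask_rejected = True
    print("IndexError:", exc)
```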

One place where the current behavior is particularly baffling and annoying
is when you have multiple boolean masks in the same indexing operation. I
think everyone would expect this to index separately on each axis (outer
product indexing style, like slices do), and that's really the only useful
interpretation, but that's not what it does...:

In [3]: a = np.arange(9).reshape((3, 3))

In [4]: a
Out[4]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [6]: a[np.asarray([True, False, True]), np.asarray([False, True, True])]
Out[6]: array([1, 8])

In [7]: a[np.asarray([True, False, True]), np.asarray([False, False, True])]
Out[7]: array([2, 8])

In [8]: a[np.asarray([True, False, True]), np.asarray([True, True, True])]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-8-30b3427bec2a> in <module>()
----> 1 a[np.asarray([True, False, True]), np.asarray([True, True, True])]

IndexError: shape mismatch: indexing arrays could not be broadcast together
with shapes (2,) (3,)
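The results above follow from the np.nonzero() rule: each boolean mask is converted to integer indices, and those integer arrays are then broadcast together and paired element by element. The sketch below, assuming a recent NumPy, reproduces that with a hypothetical helper `pointwise` and contrasts it with the outer-product reading, written here with `np.ix_` (which accepts boolean sequences and treats each as a mask for its own axis) inside a second hypothetical helper `outer`.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
rows = np.array([True, False, True])


def pointwise(arr, row_mask, col_mask):
    """np.nonzero() rule: convert each mask to integers, then pair them up."""
    return arr[np.nonzero(row_mask)[0], np.nonzero(col_mask)[0]]


def outer(arr, row_mask, col_mask):
    """Outer-product reading: every selected row with every selected column."""
    return arr[np.ix_(row_mask, col_mask)]


for col_list in ([False, True, True], [False, False, True], [True, True, True]):
    cols = np.array(col_list)
    try:
        print("pointwise:", pointwise(a, rows, cols))
    except IndexError as exc:
        # 2 selected rows and 3 selected columns cannot be broadcast together.
        print("pointwise: IndexError:", exc)
    print("outer:", outer(a, rows, cols).tolist())
```

For the first two column masks, `pointwise` gives the same values as Out[6] and Out[7]; for the all-True mask it fails like In [8], while `outer` simply returns rows 0 and 2 in full.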

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Nathaniel Smith
On Thu, Jun 4, 2015 at 6:22 PM, Benjamin Root ben.r...@ou.edu wrote:

 On Thu, Jun 4, 2015 at 9:04 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Jun 4, 2015 at 5:57 PM, Nathaniel Smith n...@pobox.com wrote:

 One place where the current behavior is particularly baffling and annoying 
 is when you have multiple boolean masks in the same indexing operation. I 
 think everyone would expect this to index separately on each axis (outer 
 product indexing style, like slices do), and that's really the only useful 
 interpretation, but that's not what it does...:


 As a huge user of boolean indexes, I have never expected this to work in any 
 way, shape or form. I don't think it works in matlab (but someone should 
 probably check that), so you wouldn't have to worry about converts missing a 
 feature from there. I have always been told that boolean indexing will 
 produce a flattened array, and I wouldn't want to be dealing with magic when 
 the array does not match up right.

Note that there are two types of boolean indexing:

type 1: arr[mask] where mask is n-d (ideally the same shape as arr,
but I think that it *is* broadcast if not). This always produces 1-d
output.

type 2: arr[..., mask, ...], where mask is 1-d and only applies to the
given dimension.

My comment was about the second type. Are your comments about the
second type? The second type definitely does not produce a flattened
array:

In [7]: a = np.arange(9).reshape(3, 3)

In [8]: a[np.asarray([True, False, True]), :]
Out[8]:
array([[0, 1, 2],
       [6, 7, 8]])
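A short sketch of the two types, assuming a recent NumPy. A full-shape mask (type 1) returns a 1-d array of the selected elements in row-major order, while a 1-d mask in one axis position (type 2) keeps the other axes. Both can be rewritten with np.nonzero(). The even-number mask is only an illustrative choice.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Type 1: the mask has the same shape as the array, and the result is a
# 1-d array of the selected elements in row-major (C) order.
even_mask = a % 2 == 0
type1 = a[even_mask]
print(type1)  # [0 2 4 6 8]

# Type 2: a 1-d mask in one axis position selects along that axis only;
# the remaining axes are kept, so the result stays 2-d.
row_mask = np.array([True, False, True])
type2 = a[row_mask, :]
print(type2.shape)  # (2, 3)

# Both types can be written with np.nonzero().
assert np.array_equal(type1, a[np.nonzero(even_mask)])
assert np.array_equal(type2, a[np.nonzero(row_mask)[0], :])
```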

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] DEP: Deprecate boolean array indices with non-matching shape #4353

2015-06-04 Thread Charles R Harris
On Thu, Jun 4, 2015 at 6:26 PM, Charles R Harris charlesr.har...@gmail.com
wrote:

 Hi All,

 I don't have strong feelings one way or the other on this proposed
 deprecation for numpy 1.10 and would like some feedback from interested
 users.


Umm, the link is #4353: https://github.com/numpy/numpy/pull/4353.

Chuck