On May 14, 2012, at 7:07 PM, Stéfan van der Walt wrote:

> Hi Zach
> 
> On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus <zachary.pin...@yale.edu> 
> wrote:
>> The below seems to be a bug, but perhaps it's unavoidably part of the 
>> indexing mechanism?
>> 
>> It's easiest to show via example... note that using "[0,1]" to pull two 
>> columns out of the array gives the same shape as using ":2" in the simple 
>> case, but when there's additional slicing happening, the shapes get 
>> transposed or something.
> 
> When fancy indexing and slicing is mixed, the resulting shape is
> essentially unpredictable.  The "correct" way to do it is to only use
> fancy indexing, i.e. generate the indices of the sliced dimension as
> well.

This is not quite accurate.  The result is not unpredictable.  It is very 
predictable, but a bit (too) complicated in the most general case.  The problem 
occurs when you "intermingle" fancy indexing with slice notation (and for this 
purpose integer selection is considered "fancy indexing").  In simple cases you 
can think of [0,1] as equivalent to :2, but it is not, because fancy indexing 
pairs up its index arrays element by element ("zip-based ideas") instead of 
taking their cross product.
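
A small sketch of the zip-versus-cross-product difference, assuming a made-up 
3x4 array (the array and the variable names are only for illustration):

```python
import numpy as np

# Hypothetical 3x4 array, used only to illustrate the point.
a = np.arange(12).reshape(3, 4)

# Slices combine like a cross product: rows 0-1 crossed with columns 0-1.
block = a[:2, :2]

# Index lists are paired up like zip(): elements (0, 0) and (1, 1) only.
paired = a[[0, 1], [0, 1]]

print(block.shape)   # (2, 2)
print(paired.shape)  # (2,)
print(paired)        # [0 5]
```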

The problem in general is how to make sense of something like

a[:, :, in1, in2]   

If you keep fancy indexing to one side of the slice notation only, then you get 
what you expect.  The shape of the output will be the first two dimensions of 
a followed by the broadcast shape of in1 and in2 (where integers are 
interpreted as fancy-index arrays). 

So, let's say a is (10,9,8,7)  and in1 is (3,4) and in2 is (4,)

The shape of the output will be (10,9,3,4), filled with essentially 
result[:,:,i,j] = a[:,:,in1[i,j], in2[j]]
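
This can be checked directly.  The sketch below uses the shapes from the text; 
the index values themselves are arbitrary (randomly chosen, which is my 
assumption, any in-range values would do):

```python
import numpy as np

a = np.arange(10 * 9 * 8 * 7).reshape(10, 9, 8, 7)

# Arbitrary in-range index values; only the shapes (3,4) and (4,) matter here.
rng = np.random.default_rng(0)
in1 = rng.integers(0, 8, size=(3, 4))  # indexes the axis of length 8
in2 = rng.integers(0, 7, size=(4,))    # indexes the axis of length 7

result = a[:, :, in1, in2]
print(result.shape)  # (10, 9, 3, 4)

# in1 and in2 broadcast to (3, 4), so in2 is indexed by j alone.
for i in range(3):
    for j in range(4):
        assert np.array_equal(result[:, :, i, j], a[:, :, in1[i, j], in2[j]])
```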

What happens, though, when you have

a[:, in1, :, in2]? 

in1 and in2 are broadcast together to create a two-dimensional "sub-space" 
that must fit somewhere.  Where should it go?  Should it take the place of in1 
or of in2?  I.e. should the output be 

(10,3,4,8) or (10,8,3,4)?

To "resolve" this ambiguity, the code sends the (3,4) sub-space to the front of 
the "dimensions" and returns (3,4,10,8).  In retrospect, the code should 
raise an error, as I doubt anyone actually relies on this behavior, and then we 
could have "done the right thing" for situations like in1 being an integer, 
which actually makes some sense and should not have been confused with the 
"general case".

In this particular case you might also think that we could say the result 
should be (10,3,8,4) but there is no guarantee that the number of dimensions 
that should be appended by the "fancy-indexing" objects will be the same as the 
number of dimensions replaced.    Again, this is how fancy-indexing combines 
with other fancy-indexing objects. 
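
A sketch of these cases, using the shapes from the text.  The index values are 
all zeros because only the shapes matter here; the 3-d array b and the 
three-dimensional in1_3d are hypothetical additions of mine for illustration:

```python
import numpy as np

a = np.zeros((10, 9, 8, 7))
in1 = np.zeros((3, 4), dtype=int)
in2 = np.zeros((4,), dtype=int)

# Fancy indices separated by a slice: the broadcast (3, 4) sub-space
# is sent to the front, followed by the sliced dimensions.
split = a[:, in1, :, in2]
print(split.shape)  # (3, 4, 10, 8)

# An integer counts as a fancy index too, so the same rule applies.
b = np.zeros((5, 6, 7))               # hypothetical 3-d array
print(b[0, :, [0, 1]].shape)          # (2, 6)
print(b[0, :, :2].shape)              # (6, 2)

# The sub-space need not have as many dimensions as the axes it replaces:
# two indexed axes are replaced here by a three-dimensional sub-space.
in1_3d = np.zeros((2, 3, 4), dtype=int)
wider = a[:, in1_3d, :, in2]
print(wider.shape)  # (2, 3, 4, 10, 8)
```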

So, the behavior is actually quite predictable; it's just that in some common 
cases it doesn't do what you would expect, especially if you think that 
[0,1] is "the same" as :2.  When I wrote this code to begin with I should have 
raised an error and then worked in the cases that make sense.  This is a good 
example of making the mistake of thinking that it's better to provide something 
very general rather than just raise an error when an obvious and clear solution 
is not available.  
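
If what you want is the cross-product reading of [0,1], one way to ask for it 
explicitly is np.ix_, which builds index arrays that broadcast against each 
other.  A minimal sketch, again assuming a made-up 3x4 array:

```python
import numpy as np

# Hypothetical 3x4 array, used only for illustration.
a = np.arange(12).reshape(3, 4)

# np.ix_ turns the two lists into arrays of shape (2, 1) and (1, 2),
# so they broadcast to a (2, 2) grid instead of being zipped together.
rows, cols = np.ix_([0, 1], [0, 1])
grid = a[rows, cols]

print(rows.shape, cols.shape)             # (2, 1) (1, 2)
print(np.array_equal(grid, a[:2, :2]))    # True
```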

There is the possibility that we could now raise an error in NumPy when this 
situation is encountered because I strongly doubt anyone is actually relying on 
the current behavior.    I would like to do this, actually, as soon as 
possible.  Comments? 

-Travis




> 
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
