On Mon, Apr 6, 2015 at 4:49 PM, Nicholas Devenish <misno...@gmail.com> wrote:
> With the indexing example from the documentation:
>
> y = np.arange(35).reshape(5,7)
>
> Why does selecting an item from explicitly every row work as I’d expect:
>
> >>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])]
> array([ 0,  7, 14, 21, 28])
>
> But doing so from a full slice (which, I would naively expect to mean “Every
> Row”) has some…other… behaviour:
>
> >>> y[:,np.array([0,0,0,0,0])]
> array([[ 0,  0,  0,  0,  0],
>        [ 7,  7,  7,  7,  7],
>        [14, 14, 14, 14, 14],
>        [21, 21, 21, 21, 21],
>        [28, 28, 28, 28, 28]])
>
> What is going on in this example, and how do I get what I expect? By
> explicitly passing in an extra array with value===index? What is the
> rationale for this difference in behaviour?
>
To understand this example, it helps to know that for multi-dimensional arrays, NumPy uses broadcasting to bring the index arrays along each dimension to a common shape. In your original example,

y[np.array([0,1,2,3,4]), np.array([0,0,0,0,0])]

both index arrays already have the same shape, (5,), so they are paired up element by element and the behavior is as you'd expect.

In the second case, the first index is a slice, and the second index is an array. Documentation for this case can be found in the indexing docs under "Combining index arrays with slices". Here's the relevant portion:

> In effect, the slice is converted to an [new] index array ... that is
> broadcast with the [other] index array

So in your case, the slice ":" is *first* converted to np.arange(5), but as a column of shape (5,1), i.e. np.arange(5)[:,np.newaxis]. It is *then* broadcast against the [other] index array of shape (5,), so that it is ultimately transformed into something like np.repeat(np.arange(5)[:,np.newaxis], 5, axis=1), giving you:

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

Now at this point you have converted your slice to an [new] index array of shape (5,5), and your [other] index array is still shaped (5,). So numpy applies the same broadcasting rules to the second array to get it into shape (5,5) as well. The difference is that a (5,) array is repeated down the rows rather than across the columns, and since every entry in yours is 0, your [other] index array becomes:

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

The two arrays are then paired up element by element, so entry (i, j) of the result is y[i, 0], which gives the result you saw.

Now, you may say: once the slice was converted to np.arange(5), why was it then broadcast to shape (5,5) rather than kept at shape (5,), which would work? The reason (I suspect at least) is to keep it consistent with other types of slices. Consider if you did something like:

y[1:3, np.array([0,0,0,0,0])]

Then the same operation would apply as above, except that when the slice was converted to an array, it would be converted to np.arange(1,3), which has shape (2,).
Obviously this isn't compatible with the second index array of shape (5,), so it *has* to be broadcast.

One final note: in this case, you can instead use either of the following:

y[np.array([0,1,2,3,4]), 0]

or

y[:, 0]

Using the same steps as above, the slice is converted to np.arange(5), and then the shapes are compared: (5,) versus (). The integer index is then broadcast to shape (5,), which gives you what you want.

Hope that helps.
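
For reference, here is a small sketch that checks the steps described above. It treats the "slice becomes an index array" description as a mental model rather than a statement about how NumPy is implemented internally. The names cols, rows, rows_b, cols_b, sliced, explicit and per_row are only illustrative, and np.broadcast_arrays is used just to make the broadcast shapes visible.

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
cols = np.array([0, 0, 0, 0, 0])

# Slice combined with an index array: the result has shape (5, 5).
sliced = y[:, cols]

# Mental model: the slice acts like a column of row numbers, shape (5, 1),
# which is broadcast against cols, shape (5,), to a common shape (5, 5).
rows = np.arange(5)[:, np.newaxis]
rows_b, cols_b = np.broadcast_arrays(rows, cols)

print(rows_b)   # row i is filled with the value i
print(cols_b)   # every entry is 0, because cols is all zeros

# Indexing with the two explicit (5, 5) arrays matches the sliced result.
explicit = y[rows_b, cols_b]
print(np.array_equal(sliced, explicit))

# A partial slice shows why a column is needed: (2, 1) broadcasts with (5,).
print(y[1:3, cols].shape)

# To pick one element per row, pair a (5,) row array with the column index.
per_row = y[np.arange(5), cols]
print(per_row)
print(np.array_equal(per_row, y[:, 0]))
```

If the column of interest differs per row, y[np.arange(5), cols] with a non-constant cols is the form that generalizes; y[:, 0] only covers the case where every row uses the same column.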