Hey, small update on this. The NEP draft has not changed much, but you can now try the full power of the proposed new indexing types [1]:

* arr.oindex[...]  # orthogonal/outer indexing
* arr.vindex[...]  # vectorized (like fancy, but different ;))
* arr.lindex[...]  # legacy/fancy indexing

with my pull request at https://github.com/numpy/numpy/pull/6075

You can try it locally by cloning the numpy github repository and then
running from the source dir:

    git fetch upstream pull/6075/head:pr-6075 && git checkout pr-6075
    python runtests.py --ipython
    # Inside ipython:
    import warnings; warnings.simplefilter("always")
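
For a quick taste, this is roughly what the three attributes do on the
PR branch (a small sketch of the proposed semantics; the exact behaviour
is of course modulo the bugs mentioned in [1]):

    >>> import numpy as np
    >>> arr = np.arange(12).reshape(3, 4)
    >>> arr.oindex[[0, 2], [1, 3]]   # outer/orthogonal: the full 2x2 block
    array([[ 1,  3],
           [ 9, 11]])
    >>> arr.vindex[[0, 2], [1, 3]]   # vectorized: elements (0, 1) and (2, 3)
    array([ 1, 11])
    >>> arr.lindex[[0, 2], [1, 3]]   # legacy fancy indexing (same as arr[...] here)
    array([ 1, 11])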

The examples from the NEP should all run fine; you can find the NEP draft
at:
https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9#diff-01e4dd9d2ecf994b24e5883f98f789e6

I would be most happy about any comments or suggestions!

- Sebastian

[1] Modulo possible bugs, there is no test suite yet...

On Wed, 2015-11-11 at 11:02 +0100, Sebastian Berg wrote:
> Hi all,
>
> at SciPy, discussing with Nathaniel and others, we thought that maybe we
> can push for orthogonal type indexing in numpy. Now with the new
> version out and some other discussions done, I thought it is time to
> pick it up :).
>
> The basic ideas are twofold. First, make indexing easier and less
> confusing for newcomers (and in many cases advanced users too), and
> second, improve interoperability with projects such as xray, for which
> orthogonal/outer type indexing makes more sense.
>
> I have started working on:
>
> 1. A preliminary draft of an NEP, which you can view at
>    https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9#diff-01e4dd9d2ecf994b24e5883f98f789e6
>    or at the end of this mail.
>
> 2. A preliminary implementation of an `oindex` attribute with
>    orthogonal/outer style indexing in
>    https://github.com/numpy/numpy/pull/6075, which you can try out by
>    cloning numpy and then running from the source dir:
>
>        git fetch upstream pull/6075/head:pr-6075 && git checkout pr-6075
>        python runtests.py --ipython
>
>    This will fetch my PR, switch to the branch and open an interactive
>    ipython shell where you will be able to do arr.oindex[...].
>
> Note that I consider the NEP quite preliminary in many parts, and it may
> still be very confusing unless you are well versed with current advanced
> indexing. There are some longer examples comparing the different styles,
> and another "example" which tries to show a "use case" going from simpler
> to more complex indexing operations.
> Any comments are very welcome, even if it is "I don't understand a
> word" :). I know it is probably too short and, at least without
> examples, not easy to understand.
>
> Best,
>
> Sebastian
>
>
> ==================================================================================
> The current NEP draft:
>
>
> ===========================================================
> Implementing intuitive and full featured advanced indexing
> ===========================================================
>
> :Author: Sebastian Berg
> :Date: 2015-08-27
> :Status: draft
>
>
> Executive summary
> =================
>
> Advanced indexing with multiple array indices is typically confusing to
> both new, and in many cases even experienced, users of NumPy. To avoid
> this problem and allow for more and clearer features, we propose to:
>
> 1. Introduce ``arr.oindex[indices]`` which allows advanced indices but
>    uses outer indexing logic.
> 2. Introduce ``arr.vindex[indices]`` which uses the current
>    "vectorized"/broadcasted logic, but with two differences from
>    legacy fancy indexing:
>
>    1. Boolean indices always use the outer indexing logic.
>       (Multi-dimensional booleans should be allowed.)
>    2. The integer index result dimensions are always the first axes
>       of the result array. No transpose is done, even for a single
>       integer array index.
>
> 3. Vanilla indexing on the array will only give warnings and eventually
>    errors either:
>
>    * when there is ambiguity between legacy fancy and outer indexing
>      (note that ``arr[[1, 2], :, 0]`` is such a case; an integer
>      can be the "second" integer index array),
>    * when any integer index array is present (possibly additionally
>      for more than one boolean index array).
>
> These constraints are sufficient for making indexing generally
> consistent with expectations and providing a less surprising learning
> curve with ``oindex``.
>
> Note that all things mentioned here apply both for assignment as well
> as subscription.
>
> Understanding these details is *not* easy. The `Examples` section gives
> code examples, and the hopefully easier `Motivational Example` provides
> some motivating use-cases for the general ideas and is likely a good
> start for anyone not intimately familiar with advanced indexing.
>
>
> Motivation
> ==========
>
> Old style advanced indexing with multiple array (boolean or integer)
> indices, also called "fancy indexing", tends to be very confusing for
> new users. While fancy (or legacy) indexing is useful in many cases,
> one would naively assume that the result of multiple 1-d ranges is
> analogous to multiple slices along each dimension (also called "outer
> indexing").
>
> However, legacy fancy indexing with multiple arrays broadcasts these
> arrays into a single index over multiple dimensions. There are three
> main points of confusion when multiple array indices are involved:
>
> 1. Most new users will usually expect outer indexing (consistent with
>    slicing). This is also the most common way of handling this in other
>    packages or languages.
> 2. The axes introduced by the array indices are at the front, unless
>    all array indices are consecutive, in which case one can deduce where
>    the user "expects" them to be:
>
>    * ``arr[:, [0, 1], :, [0, 1]]`` will have the first dimension shaped 2.
>    * ``arr[:, [0, 1], [0, 1]]`` will have the second dimension shaped 2.
>
> 3. When a boolean array index is mixed with another boolean or integer
>    array, the result is very hard to understand (the boolean array is
>    converted to integer array indices and then broadcast), and hardly
>    useful. There is no well defined broadcast for booleans, so that
>    boolean indices are logically always "``outer``" type indices.
>
>
> Proposed rules
> ==============
>
> From the three problems noted above some expectations for NumPy can
> be deduced:
>
> 1. There should be a prominent outer/orthogonal indexing method such as
>    ``arr.oindex[indices]``.
> 2. Considering how confusing fancy indexing can be, it should only
>    occur explicitly (e.g. ``arr.vindex[indices]``).
> 3. A new ``arr.vindex[indices]`` method would not be tied to the
>    confusing transpose rules of fancy indexing (which are for example
>    needed for the simple case of a single advanced index). Thus,
>    no transposing should be done. The axes of the advanced indices are
>    always inserted at the front, even for a single index.
> 4. Boolean indexing is conceptually outer indexing. Broadcasting
>    together with other advanced indices in the manner of legacy
>    "fancy indexing" is generally not helpful or well defined.
>    A user who wishes the "``nonzero``" plus broadcast behaviour can thus
>    be expected to do this manually.
>    Using this rule, a single boolean index can index into multiple
>    dimensions at once.
> 5. An ``arr.lindex`` or ``arr.findex`` should likely be implemented to
>    allow legacy fancy indexing indefinitely. This also gives a simple
>    way to update fancy indexing code, making deprecations to vanilla
>    indexing easier.
> 6. Vanilla indexing ``arr[...]`` could return an error for ambiguous
>    cases. For the beginning, this probably means that cases where
>    ``arr[ind]`` and ``arr.oindex[ind]`` return different results give
>    deprecation warnings. However, the exact rules for this (especially
>    the final behaviour) are not quite clear in cases such as
>    ``arr[0, :, index_arr]``.
>
> All other rules for indexing are identical.
>
>
> Open Questions
> ==============
>
> 1. Especially for the new indexing attributes ``oindex`` and ``vindex``,
>    a case could be made not to implicitly add an ``Ellipsis`` index when
>    necessary. This helps to find bugs, since indexing an array with an
>    unexpectedly high number of dimensions can be caught.
>    (I am in favor of this, but doubt we should consider it for vanilla
>    indexing.)
>
> 2. The names ``oindex`` and ``vindex`` are just suggestions at the time
>    of writing this; another name NumPy has used for something like
>    ``oindex`` is ``np.ix_``. See also below.
>
> 3. It would be possible to limit the use of boolean indices in
>    ``vindex``, assuming that they are rare and to some degree special.
>    (This would make the implementation simpler, but I do not see a big
>    reason.)
>
> 4. ``oindex`` and ``vindex`` could always return copies, even when no
>    array operation occurs. One argument for using the same rules is
>    that this way ``oindex`` can be used as a general index replacement.
>    (There is likely no big reason for this; however, there is one
>    reason: ``arr.vindex[array_scalar, ...]`` can occur, where
>    ``array_scalar`` should be a 0-D array. Copying always "fixes" the
>    possible inconsistency.)
>
> 5. The final state to morph indexing into is not fixed in this NEP. It
>    is for example possible that ``arr[index]`` will be equivalent to
>    ``arr.oindex`` at some point in the future. Since such a change will
>    take years, it seems unnecessary to make specific decisions now.
>
> 6. Proposed changes to vanilla indexing could be postponed indefinitely
>    or not taken, in order to not break or force fixing of existing code
>    bases.
>
> 7. Possibly the ``vindex`` combination with boolean indexing could be
>    rethought, or not allowed at all for simplicity.
>
>
> Necessary changes to NumPy
> ==========================
>
> Implement ``arr.oindex`` and ``arr.vindex`` objects to allow these
> indexing operations, and create warnings for (and eventually deprecate)
> ambiguous direct indexing operations on arrays.
>
>
> Alternative Names
> =================
>
> Possible names suggested (more suggestions will be added).
>
> ============== ======== =======
> **Orthogonal** oindex   oix
> **Vectorized** vindex   fix
> **Legacy**     lindex   findex
> ============== ======== =======
>
>
> Examples
> ========
>
> Since the various kinds of indexing are hard to grasp in many cases,
> these examples hopefully give some more insight. Note that they are all
> in terms of shape. All original dimensions have length 5 or larger;
> advanced indexing inserts shorter dimensions. (Note that ``...`` or
> ``Ellipsis`` mostly inserts as many slices as needed to index the full
> array.) These examples may be hard to grasp without working knowledge
> of advanced indexing as of NumPy 1.9.
>
> Example array::
>
>     >>> arr = np.ones((5, 6, 7, 8))
>
>
> Legacy fancy indexing
> ---------------------
>
> A single index is transposed (this is the same for all indexing types)::
>
>     >>> arr[[0], ...].shape
>     (1, 6, 7, 8)
>     >>> arr[:, [0], ...].shape
>     (5, 1, 7, 8)
>
> Multiple indices are transposed *if* consecutive::
>
>     >>> arr[:, [0], [0], :].shape  # future error
>     (5, 1, 7)
>     >>> arr[:, [0], :, [0]].shape  # future error
>     (1, 5, 6)
>
> It is important to note that a scalar *is* an integer array index in
> this sense (and gets broadcast with the other advanced index)::
>
>     >>> arr[:, [0], 0, :].shape  # future error (scalar is "fancy")
>     (5, 1, 7)
>     >>> arr[:, [0], :, 0].shape  # future error (scalar is "fancy")
>     (1, 5, 6)
>
> A single boolean index can act on multiple dimensions (especially the
> whole array). It has to match the dimensions (as of 1.10 a mismatch
> only gives a deprecation warning). The boolean index is otherwise
> identical to (multiple consecutive) integer array indices::
>
>     >>> # Create a boolean index with one True value for the last two dimensions:
>     >>> bindx = np.zeros((7, 8), dtype=np.bool_)
>     >>> bindx[0, 0] = True
>     >>> arr[:, 0, bindx].shape
>     (5, 1)
>     >>> arr[0, :, bindx].shape
>     (1, 6)
>
> The combination with anything that is not a scalar is confusing, e.g.::
>
>     >>> arr[[0], :, bindx].shape  # bindx result broadcasts with [0]
>     (1, 6)
>     >>> arr[:, [0, 1], bindx]  # IndexError
>
>
> Outer indexing
> --------------
>
> Multiple indices are "orthogonal" and their result axes are inserted
> at the same place (they are not broadcast)::
>
>     >>> arr.oindex[:, [0], [0, 1], :].shape
>     (5, 1, 2, 8)
>     >>> arr.oindex[:, [0], :, [0, 1]].shape
>     (5, 1, 7, 2)
>     >>> arr.oindex[:, [0], 0, :].shape
>     (5, 1, 8)
>     >>> arr.oindex[:, [0], :, 0].shape
>     (5, 1, 7)
>
> Boolean index results are always inserted where the index is::
>
>     >>> # Create a boolean index with one True value for the last two dimensions:
>     >>> bindx = np.zeros((7, 8), dtype=np.bool_)
>     >>> bindx[0, 0] = True
>     >>> arr.oindex[:, 0, bindx].shape
>     (5, 1)
>     >>> arr.oindex[0, :, bindx].shape
>     (6, 1)
>
> Nothing changes in the presence of other advanced indices::
>
>     >>> arr.oindex[[0], :, bindx].shape
>     (1, 6, 1)
>     >>> arr.oindex[:, [0, 1], bindx].shape
>     (5, 2, 1)
>
>
> Vectorized/inner indexing
> -------------------------
>
> Multiple indices are broadcast and iterated as one, like fancy
> indexing, but the new axes are always inserted at the front::
>
>     >>> arr.vindex[:, [0], [0, 1], :].shape
>     (2, 5, 8)
>     >>> arr.vindex[:, [0], :, [0, 1]].shape
>     (2, 5, 7)
>     >>> arr.vindex[:, [0], 0, :].shape
>     (1, 5, 8)
>     >>> arr.vindex[:, [0], :, 0].shape
>     (1, 5, 7)
>
> Boolean index results are always inserted where the index is, exactly
> as in ``oindex``, given how specific they are to the axes they operate
> on::
>
>     >>> # Create a boolean index with one True value for the last two dimensions:
>     >>> bindx = np.zeros((7, 8), dtype=np.bool_)
>     >>> bindx[0, 0] = True
>     >>> arr.vindex[:, 0, bindx].shape
>     (5, 1)
>     >>> arr.vindex[0, :, bindx].shape
>     (6, 1)
>
> But other advanced indices are again transposed to the front::
>
>     >>> arr.vindex[[0], :, bindx].shape
>     (1, 6, 1)
>     >>> arr.vindex[:, [0, 1], bindx].shape
>     (2, 5, 1)
>
>
> Related Questions
> =================
>
> There exists a further indexing, or indexing-like, method. That is the
> inverse of a command such as ``np.argmin(arr, axis=axis)``: picking the
> specific elements *along* an axis given an array of (at least
> typically) the same size.
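>
> For concreteness, this is what such a pick currently looks like with
> plain integer array indexing (a minimal sketch, not part of the
> proposal; the names are made up for this illustration)::
>
>     >>> a = np.random.random((5, 6))
>     >>> picker = np.argmin(a, axis=1)  # shape (5,), one column per row
>     >>> rows = np.arange(a.shape[0])   # explicit index for the other axis
>     >>> np.array_equal(a[rows, picker], a.min(axis=1))
>     True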
>
> Doing such a thing with the indexing notation is not quite
> straightforward, since the axis on which to pick elements has to be
> supplied. One plausible solution would be to create a function (calling
> it ``pick`` here for simplicity)::
>
>     np.pick(arr, index_arr, axis=axis)
>
> where ``index_arr`` has to be the same shape as ``arr`` except along
> ``axis``. One could imagine that this can be useful together with other
> indexing types, but such a function may be sufficient, and the extra
> information needed seems easier to pass using a function convention.
> Another option would be to allow an argument such as
> ``compress_axes=None`` (just to have some name) which maps the axes
> from the index array to the new array, with ``None`` signaling a new
> axis. Also, ``keepdims`` could be added as a simple default. (Note that
> the use of ``axis`` is not compatible with ``np.take`` for an
> ``index_arr`` which is not zero or one dimensional.)
>
> Another solution is to provide functions or features for the ``arg*``
> functions to map this to the equivalent ``vindex`` indexing operation.
>
>
> Motivational Example
> ====================
>
> Imagine having data acquisition software storing ``D`` channels and
> ``N`` datapoints along the time axis in an ``(N, D)`` shaped array.
> During data analysis, we need to fetch a pool of channels, for example
> to calculate a mean over them.
>
> This data can be faked using::
>
>     >>> arr = np.random.random((100, 10))
>
> Now one may remember indexing with an integer array and find the
> correct code::
>
>     >>> group = arr[:, [2, 5]]
>     >>> mean_value = group.mean()
>
> However, assume that there were some specific time points (first
> dimension of the data) that need to be specially considered. These time
> points are already known and given by::
>
>     >>> interesting_times = np.array([1, 5, 8, 10], dtype=np.intp)
>
> Now to fetch them, we may try to modify the previous code::
>
>     >>> group_at_it = arr[interesting_times, [2, 5]]
>     IndexError: Ambiguous index, use `.oindex` or `.vindex`
>
> An error such as this will point the user to the indexing
> documentation, which should make it clear that ``oindex`` behaves more
> like slicing. So, out of the different methods, it is the obvious
> choice (for now this is a shape mismatch error, but that error could
> possibly also mention ``oindex``)::
>
>     >>> group_at_it = arr.oindex[interesting_times, [2, 5]]
>
> Now of course one could also have used ``vindex``, but it is much less
> obvious how to achieve the right thing!::
>
>     >>> reshaped_times = interesting_times[:, np.newaxis]
>     >>> group_at_it = arr.vindex[reshaped_times, [2, 5]]
>
> One may find that, for example, our data is corrupt in some places.
> So, we need to replace these values by zero (or anything else) for
> these times. The first column may for example give the necessary
> information, so that changing the values becomes easy remembering
> boolean indexing::
>
>     >>> bad_data = arr[:, 0] > 0.5
>     >>> arr[bad_data, :] = 0
>
> Again, however, the columns may need to be handled more individually
> (but in groups), and the ``oindex`` attribute works well::
>
>     >>> arr.oindex[bad_data, [2, 5]] = 0
>
> Note that it would be very hard to do this using legacy fancy indexing.
> The only way would be to create an integer array first::
>
>     >>> bad_data_indx = np.nonzero(bad_data)[0]
>     >>> bad_data_indx_reshaped = bad_data_indx[:, np.newaxis]
>     >>> arr[bad_data_indx_reshaped, [2, 5]] = 0
>
> In any case, we can use only ``oindex`` to do all of this without
> getting into any trouble or being confused by the whole complexity of
> advanced indexing.
>
> But now some new features are added to the data acquisition: different
> sensors have to be used depending on the time. Let us assume we have
> already created an array of indices::
>
>     >>> correct_sensors = np.random.randint(10, size=(100, 2))
>
> which lists, for each time, the two correct sensors in an ``(N, 2)``
> array.
>
> A first try to achieve this may be ``arr[:, correct_sensors]``, and
> this does not work. It should quickly be clear that slicing cannot
> achieve the desired thing. But hopefully users will remember that there
> is ``vindex`` as a more powerful and flexible approach to advanced
> indexing. One may, if trying ``vindex`` randomly, be confused about::
>
>     >>> new_arr = arr.vindex[:, correct_sensors]
>
> which is neither the same, nor the correct result (see the transposing
> rules)! This is because slicing still works the same in ``vindex``.
> However, reading the documentation and examples, one can hopefully
> quickly find the desired solution::
>
>     >>> rows = np.arange(len(arr))
>     >>> rows = rows[:, np.newaxis]  # make shape fit with correct_sensors
>     >>> new_arr = arr.vindex[rows, correct_sensors]
>
> At this point we have left the straightforward world of ``oindex`` but
> can do random picking of any element from the array. Note that in the
> last example a method such as the one mentioned in the ``Related
> Questions`` section could be more straightforward. But this approach is
> even more flexible, since ``rows`` does not have to be a simple
> ``arange``, but could be ``interesting_times``::
>
>     >>> correct_sensors_at_it = correct_sensors[interesting_times, :]
>     >>> interesting_times_reshaped = interesting_times[:, np.newaxis]
>     >>> new_arr_it = arr.vindex[interesting_times_reshaped, correct_sensors_at_it]
>
> Truly complex situations would arise if you were, for example, to pool
> ``L`` experiments into an array shaped ``(L, N, D)``. But for
> ``oindex`` this should not result in surprises. ``vindex``, being more
> powerful, will quite certainly create some confusion in this case, but
> it also covers pretty much all eventualities.
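>
> To give a rough idea (a sketch only; the ``pooled`` array below is made
> up for this illustration and, as above, this needs the proposed
> ``oindex``), outer indexing stays unsurprising on such a pooled array::
>
>     >>> pooled = np.random.random((3, 100, 10))  # (L, N, D)
>     >>> pooled.oindex[:, interesting_times, [2, 5]].shape
>     (3, 4, 2)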