On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett <matthew.br...@gmail.com> wrote:
> Hi,
>
> On Thu, Apr 2, 2015 at 6:09 PM, <josef.p...@gmail.com> wrote:
>> On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing <efir...@hawaii.edu> wrote:
>>> On 2015/04/02 1:14 PM, Hanno Klemm wrote:
>>>> Well, I have written quite a bit of code that relies on fancy
>>>> indexing, and I think the question, if the behaviour of the []
>>>> operator should be changed has sailed with numpy now at version 1.9.
>>>> Given the number of packages that rely on numpy, changing this
>>>> fundamental behaviour would not be a clever move.
>>>
>>> Are you *positive* that there is no clever way to make a transition?
>>> It's not worth any further thought?
>>
>> I guess it would be similar to python 3 string versus bytes, but
>> without the overwhelming benefits.
>>
>> I don't think I would be in favor of deprecating fancy indexing even
>> if it were possible. In general, my impression is that if there is a
>> trade-off in numpy between powerful machinery versus easy to learn and
>> teach, then the design philosophy went in favor of power.
>>
>> I think numpy indexing is not too difficult and follows a consistent
>> pattern, and I completely avoid mixing slices and index arrays with
>> ndim > 2.
>
> I'm sure y'all are totally on top of this, but for myself, I would
> like to distinguish:
>
> * fancy indexing with boolean arrays - I use it all the time and don't
>   get confused;
> * fancy indexing with non-boolean arrays - horrendously confusing,
>   almost never use it, except on a single axis when I can't confuse it
>   with orthogonal indexing:
>
> In [3]: a = np.arange(24).reshape(6, 4)
>
> In [4]: a
> Out[4]:
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [12, 13, 14, 15],
>        [16, 17, 18, 19],
>        [20, 21, 22, 23]])
>
> In [5]: a[[1, 2, 4]]
> Out[5]:
> array([[ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [16, 17, 18, 19]])
>
> I also remember a discussion with Travis O where he was also saying
> that this indexing was confusing and that it would be good if there
> was some way to transition to what he called outer product indexing (I
> think that's the same as 'orthogonal' indexing).
>
>> I think it should be DOA, except as a discussion topic for numpy 3000.
>
> I think there are two proposals here:
>
> 1) Add some syntactic sugar to allow orthogonal indexing of numpy
> arrays, no backward compatibility break.
>
> That seems like a very good idea to me - were there any big objections to
> that?
>
> 2) Over some long time period, move the default behavior of np.array
> non-boolean indexing from the current behavior to the orthogonal
> behavior.
>
> That is going to be very tough, because it will cause very confusing
> breakage of legacy code.
>
> On the other hand, maybe it is worth going some way towards that, like this:
>
> * implement orthogonal indexing as a method arr.sensible_index[...]
> * implement the current non-boolean fancy indexing behavior as a
>   method - arr.crazy_index[...]
> * deprecate non-boolean fancy indexing as standard arr[...] indexing;
> * wait a long time;
> * remove non-boolean fancy indexing as standard arr[...]
>   (errors are preferable to change in behavior)
>
> Then if we are brave we could:
>
> * wait a very long time;
> * make orthogonal indexing the default.
>
> But the not-brave steps above seem less controversial, and fairly reasonable.
>
> What about that as an approach?
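
For concreteness, here is a minimal sketch of the two behaviors under discussion, reusing the 6x4 array from the quoted session. The index lists `rows` and `cols` are illustrative choices of mine, not something from the thread. `np.ix_` is the existing NumPy helper that builds an open mesh of index arrays; the hypothetical `arr.sensible_index[...]` proposed above does not exist and is not used here.

```python
import numpy as np

a = np.arange(24).reshape(6, 4)
rows = [1, 2, 4]   # illustrative row indices (assumption, not from the thread)
cols = [0, 2, 3]   # illustrative column indices (assumption, not from the thread)

# Current non-boolean fancy indexing: the index arrays are broadcast
# against each other, so this selects a[1, 0], a[2, 2] and a[4, 3].
fancy = a[rows, cols]

# Orthogonal (outer-product) selection: every chosen row crossed with
# every chosen column. np.ix_ returns index arrays of shape (3, 1) and (1, 3).
orthogonal = a[np.ix_(rows, cols)]

# The same result written out by hand with explicit broadcasting.
manual = a[np.array(rows)[:, None], np.array(cols)[None, :]]

print(fancy)        # 1-D, shape (3,)
print(orthogonal)   # 2-D, shape (3, 3)
```

With a single index array, as in `a[[1, 2, 4]]`, the two interpretations coincide, which is why that case is hard to get wrong.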

I also thought the transition would have to be something like that or
a clear break point, like numpy 3.0. I would be in favor of something
like this for the axis swapping case with ndim > 2.

However, before going to that, you would still have to provide a list
of behaviors that will be deprecated, and poll various libraries to
see how much each one is actually used.

My impression is that fancy indexing is used more often than
orthogonal indexing (beyond the trivial case x[:, idx]).
Also, many use cases for orthogonal indexing moved to using pandas, and
numpy is left with non-orthogonal indexing use cases.
And third, fancy indexing is a superset of orthogonal indexing (with
proper broadcasting), and you still need to justify why everyone
should be restricted to the subset instead of a voluntary constraint
to use code that is easier to understand.

I checked numpy.random.choice which I would have implemented with
fancy indexing, but it uses only `take`, AFAICS.

Switching to using an explicit method is not really a problem for
maintained library code, but I still don't really see why we should do
this.

Josef

>
> Cheers,
>
> Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion