Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

> Is there any other package implementing non-orthogonal indexing aside from numpy?

I think we can safely say that NumPy's implementation of broadcasting indexing is unique :). The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent.

> If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of orthogonal indexing. I just need a little more convincing that those new attributes/indexers are ever going to see any real use.

Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing. Unfortunately, the lack of a full-featured implementation of orthogonal indexing has led to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation in numpy that supports slices and integers, for that reason alone.

This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than with `arr[np.ix_(slice(1), 2, [3])]`.

It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis. So I think there could be a real benefit to having the feature in numpy.
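The model Stephan describes - each element of the key tuple applied independently along its own axis - can be sketched in a few lines on top of `np.ix_`. This is purely illustrative (the helper name is made up, and integers are kept as length-1 axes here, whereas a real indexer would presumably drop them):

```python
import numpy as np

def orthogonal_getitem(arr, key):
    """Hypothetical orthogonal indexer: converts slices and integers
    to index arrays so that np.ix_ can build the open mesh.
    Illustrative sketch only, not a NumPy API."""
    expanded = []
    for ax, k in enumerate(key):
        if isinstance(k, slice):
            # materialize the slice along this axis
            expanded.append(np.arange(arr.shape[ax])[k])
        elif np.isscalar(k):
            expanded.append(np.array([k]))  # kept as a length-1 axis
        else:
            expanded.append(np.asarray(k))
    return arr[np.ix_(*expanded)]

a = np.arange(24).reshape(2, 3, 4)
# Equivalent in spirit to the proposed arr.ix_[:1, 2, [3]]:
print(orthogonal_getitem(a, (slice(None, 1), 2, [3])).shape)  # (1, 1, 1)
```

Each axis is selected independently, so the result shape is just the concatenation of the per-axis selection lengths, with no broadcasting between key elements.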
In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.

Cheers,
Stephan

[1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays: https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Do, 2015-04-02 at 01:29 -0700, Stephan Hoyer wrote:

> [...]
> It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis.

Wrong (sorry, couldn't resist ;)), since 1.9.
take is not typically faster unless you have a small subspace (the subspace being the non-indexed/slice-indexed axes), though I guess a small subspace is common in some cases, i.e. an Nx3 array; it should typically be noticeably slower for large subspaces at the moment.

Anyway, unfortunately, while orthogonal indexing may seem simpler, as you probably noticed, mapping it fully featured to advanced indexing does not seem like a walk in the park, due to how axis remapping works when you have a combination of slices and advanced indices. It might be possible to basically implement a second MapIterSwapaxis in addition to adding extra axes to the inputs (which I think would need a post-processing step, but that is not that bad). If you do that, you can mostly reuse the current machinery and avoid most of the really annoying code blocks which set up the iterators for the various special cases. Otherwise, for hacking it, of course you can replace the slices by arrays as well ;).

> So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.
>
> Cheers,
> Stephan

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
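Sebastian's "subspace" terminology can be made concrete: for a 1-D index along axis 0 of an (N, 3) array, the subspace is the trailing (3,) row that each selected index drags along. A small, illustrative sketch (relative timings of `take` vs. `__getitem__` depend on shape and NumPy version, so only the equivalence of results is shown):

```python
import numpy as np

# For a 1-D integer index along one axis, np.take and fancy indexing
# select the same elements; the non-indexed trailing axes form the
# "subspace" copied per selected row.
a = np.arange(12).reshape(4, 3)   # subspace here is the (3,) row
idx = np.array([2, 0, 3])

by_take = np.take(a, idx, axis=0)
by_fancy = a[idx]
print(np.array_equal(by_take, by_fancy))  # True
```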
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On 02-Apr-15 4:35 PM, Eric Firing wrote:

> On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:
>> Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.
> I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.
>
> Eric

+1

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:

> Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.

I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.

Eric

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 2:03 PM, Eric Firing efir...@hawaii.edu wrote:

On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:

> We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one; the need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet.

As a long-time user of numpy, and an advocate and teacher of Python for science, here is my perspective:

Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember. Its use easily leads to unreadable code and hard-to-see errors. Here is the essence of an example that a student presented me with just this week, in the context of reordering eigenvectors based on argsort applied to eigenvalues:

In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

In [26]: ii = np.arange(4)

In [27]: print(xx[0])
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

Quickly now, how many numpy users would look at that last expression and say, "Of course, that is equivalent to transposing xx[0]"? And, "Of course that expression should give a completely different result from xx[0][:, ii]"? I would guess it would be less than 1%. That should tell you right away that we have a real problem here. Fancy indexing can't be *read* by a sub-genius--it has to be laboriously figured out piece by piece, with frequent reference to the baffling descriptions in the Numpy docs.

So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it? I have taken advantage of it occasionally, maybe you have too, but I think a survey of existing code would show that the need for it is *far* less common than the need for simple orthogonal indexing.
> That tells me that it is fancy indexing, not orthogonal indexing, that should be available through a function and/or special indexing attribute. The question is then how to make that transition.
>
> Eric

Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.

>>> np.triu_indices(5)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64))
>>> m = np.arange(25).reshape(5, 5)[np.triu_indices(5)]
>>> m
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 12, 13, 14, 18, 19, 24])
>>> m2 = np.zeros((5, 5))
>>> m2[np.triu_indices(5)] = m
>>> m2
array([[  0.,   1.,   2.,   3.,   4.],
       [  0.,   6.,   7.,   8.,   9.],
       [  0.,   0.,  12.,  13.,  14.],
       [  0.,   0.,   0.,  18.,  19.],
       [  0.,   0.,   0.,   0.,  24.]])

(I don't remember what's "fancy" in indexing, just that broadcasting rules apply.)

Josef

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
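Eric's surprising result can be stated precisely: when a fancy index and a slice are mixed, the broadcast dimension of the fancy indices is moved to the front of the result, which is why the sliced axis ends up last. A quick check using Eric's arrays:

```python
import numpy as np

# Mixing an integer, a slice, and an index array: the broadcast
# dimension of (0, ii) goes to the front, so the slice axis ends up
# last and the result is the transpose of xx[0].
xx = np.arange(2 * 3 * 4).reshape((2, 3, 4))
ii = np.arange(4)

assert np.array_equal(xx[0, :, ii], xx[0].T)

# Indexing in two steps keeps the "expected" orientation instead
# (ii = arange(4) just reselects every column here):
assert np.array_equal(xx[0][:, ii], xx[0])
```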
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett matthew.br...@gmail.com wrote:

Hi,

On Thu, Apr 2, 2015 at 6:09 PM, josef.p...@gmail.com wrote:

> On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
>> On 2015/04/02 1:14 PM, Hanno Klemm wrote:
>>> Well, I have written quite a bit of code that relies on fancy indexing, and I think the question, if the behaviour of the [] operator should be changed, has sailed with numpy now at version 1.9. Given the amount of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
>> Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
> I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits. I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery versus easy to learn and teach, then the design philosophy went in favor of power. I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I'm sure y'all are totally on top of this, but for myself, I would like to distinguish:

* fancy indexing with boolean arrays - I use it all the time and don't get confused;
* fancy indexing with non-boolean arrays - horrendously confusing, almost never use it, except on a single axis when I can't confuse it with orthogonal indexing:

In [3]: a = np.arange(24).reshape(6, 4)

In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [5]: a[[1, 2, 4]]
Out[5]:
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [16, 17, 18, 19]])

I also remember a discussion with Travis O where he was also saying that this indexing was confusing and that it would be good if there was some way to transition to what he called outer product indexing (I think that's the same as 'orthogonal' indexing).

> I think it should be DOA, except as a discussion topic for numpy 3000.

I think there are two proposals here:

1) Add some syntactic sugar to allow orthogonal indexing of numpy arrays, no backward compatibility break. That seems like a very good idea to me - were there any big objections to that?

2) Over some long time period, move the default behavior of np.array non-boolean indexing from the current behavior to the orthogonal behavior. That is going to be very tough, because it will cause very confusing breakage of legacy code.

On the other hand, maybe it is worth going some way towards that, like this:

* implement orthogonal indexing as a method arr.sensible_index[...]
* implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...]
* deprecate non-boolean fancy indexing as standard arr[...] indexing;
* wait a long time;
* remove non-boolean fancy indexing as standard arr[...] (errors are preferable to change in behavior)

Then if we are brave we could:

* wait a very long time;
* make orthogonal indexing the default.

But the not-brave steps above seem less controversial, and fairly reasonable.
> What about that as an approach?

I also thought the transition would have to be something like that, or a clear break point, like numpy 3.0. I would be in favor of something like this for the axis swapping case with ndim > 2. However, before going to that, you would still have to provide a list of behaviors that will be deprecated, and make a poll in various libraries for how much it is actually used.

My impression is that fancy indexing is used more often than orthogonal indexing (beyond the trivial case x[:, idx]). Also, many use cases for orthogonal indexing moved to using pandas, and numpy is left with non-orthogonal indexing use cases. And third, fancy indexing is a superset of orthogonal indexing (with proper broadcasting), and you still need to justify why everyone should be restricted to the subset instead of a voluntary constraint to use code that is easier to understand.

I checked numpy.random.choice, which I would have implemented with fancy indexing, but it uses only `take`, AFAICS. Switching to using an explicit method is not really a problem for maintained library code, but I still don't really see why we should do this.

Josef

> Cheers,
> Matthew

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
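Josef's point that fancy indexing is a superset of orthogonal indexing can be seen in one line: plain index arrays pair up pointwise under broadcasting, while `np.ix_` recovers the orthogonal cross product from the same machinery. A small illustration:

```python
import numpy as np

a = np.arange(24).reshape(6, 4)

# Pointwise ("fancy") semantics: picks the pairs (1, 0) and (2, 3).
assert np.array_equal(a[[1, 2], [0, 3]], np.array([4, 11]))

# Orthogonal semantics recovered via np.ix_: the full 2x2 cross product
# of rows [1, 2] with columns [0, 3].
assert np.array_equal(a[np.ix_([1, 2], [0, 3])],
                      np.array([[4, 7], [8, 11]]))
```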
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
Hi,

On Thu, Apr 2, 2015 at 8:20 PM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

> On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.br...@gmail.com wrote:
> [Matthew's proposal, quoted in full above]
>> What about that as an approach?
> Your option 1 was what was being discussed before the posse was assembled to bring fancy indexing before justice... ;-)

Yes, sorry - I was trying to bring the argument back there.

> My background is in image processing, and I have used fancy indexing in all its fanciness far more often than orthogonal or outer product indexing. I actually have a vivid memory of the moment I fell in love with NumPy: after seeing a code snippet that ran a huge image through a look-up table by indexing the LUT with the image. Beautifully simple. And here is a younger me, learning to ride NumPy without the training wheels. Another obvious use case that you can find all over the place in scikit-image is drawing a curve on an image from the coordinates.

No question at all that it does have its uses - but then again, no-one thinks that it should not be available; only, maybe, in the very far future, not what you get by default...

Cheers,

Matthew

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 11:30 PM, Nathaniel Smith n...@pobox.com wrote:

> On Thu, Apr 2, 2015 at 6:35 PM, josef.p...@gmail.com wrote:
>> (I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1])
> Psst: np.diagonal(m2, offset=1)

It was just an example (banded or toeplitz). (I know how indexing works, kind of, but don't remember what diag or other functions are exactly doing.)

>>> m2b = m2.copy()
>>> m2b[np.arange(4), np.arange(4) + 1]
array([  1.,   7.,  13.,  19.])
>>> m2b[np.arange(4), np.arange(4) + 1] = np.nan
>>> m2b
array([[  0.,  nan,   2.,   3.,   4.],
       [  0.,   6.,  nan,   8.,   9.],
       [  0.,   0.,  12.,  nan,  14.],
       [  0.,   0.,   0.,  18.,  nan],
       [  0.,   0.,   0.,   0.,  24.]])

>>> m2c = m2.copy()
>>> np.diagonal(m2c, offset=1) = np.nan
SyntaxError: can't assign to function call
>>> dd = np.diagonal(m2c, offset=1)
>>> dd[:] = np.nan
Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    dd[:] = np.nan
ValueError: assignment destination is read-only
>>> np.__version__
'1.9.2rc1'

>>> m2d = m2.copy()
>>> m2d[np.arange(4)[::-1], np.arange(4) + 1] = np.nan

Josef

> --
> Nathaniel J. Smith -- http://vorpus.org

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.br...@gmail.com wrote:

> [Matthew's proposal, quoted in full above]
> What about that as an approach?

Your option 1 was what was being discussed before the posse was assembled to bring fancy indexing before justice... ;-)

My background is in image processing, and I have used fancy indexing in all its fanciness far more often than orthogonal or outer product indexing. I actually have a vivid memory of the moment I fell in love with NumPy: after seeing a code snippet that ran a huge image through a look-up table by indexing the LUT with the image. Beautifully simple. And here http://stackoverflow.com/questions/12014186/fancier-fancy-indexing-in-numpy is a younger me, learning to ride NumPy without the training wheels. Another obvious use case that you can find all over the place in scikit-image is drawing a curve on an image from the coordinates.

If there is such strong agreement on an orthogonal indexer, we might as well go ahead and implement it. But before considering any bolder steps, we should probably give it a couple of releases to see how many people out there really use it.

Jaime

P.S. As an aside on the remapping of axes when arrays and slices are mixed, there really is no better way. Once you realize that the array indexing a dimension does not have to be 1-D, it should clearly appear that what seems the obvious way does not generalize to the general case. E.g., one may rightfully think that:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)

should not reorder the axes, and return an array of shape (1, 2, 3). But what do you do in the following case?

>>> idx0 = np.random.randint(3, size=(10, 1, 10))
>>> idx2 = np.random.randint(5, size=(1, 20,
[Numpy-discussion] SciPy 2015 Conference Updates - Call for talks extended to 4/10, registration open, keynotes announced, John Hunter Plotting Contest
---
**LAST CALL FOR SCIPY 2015 TALK AND POSTER SUBMISSIONS - EXTENSION TO 4/10**
---

SciPy 2015 will include 3 major topic tracks and 7 mini-symposia tracks. Submit a proposal on the SciPy 2015 website: http://scipy2015.scipy.org. If you have any questions or comments, feel free to contact us at scipy-organiz...@scipy.org. You can also follow @scipyconf on Twitter or sign up for the mailing list on the website for the latest updates!

Major topic tracks include:
- Scientific Computing in Python (General track)
- Python in Data Science
- Quantitative Finance and Computational Social Sciences

Mini-symposia will include the applications of Python in:
- Astronomy and astrophysics
- Computational life and medical sciences
- Engineering
- Geographic information systems (GIS)
- Geophysics
- Oceanography and meteorology
- Visualization, vision and imaging

--
**SCIPY 2015 REGISTRATION IS OPEN**

Please register ASAP to help us get a good headcount and open the conference to as many people as we can. PLUS, everyone who registers before May 15 will not only get early bird discounts, but will also be entered in a drawing for a free registration (via refund or extra)! Register on the website at http://scipy2015.scipy.org

--
**SCIPY 2015 KEYNOTE SPEAKERS ANNOUNCED**

Keynote speakers were just announced and include Wes McKinney, author of Pandas; Chris Wiggins, Chief Data Scientist for The New York Times; and Jake VanderPlas, director of research at the University of Washington's eScience Institute and core contributor to a number of scientific Python libraries including scikit-learn and AstroML.

--
**ENTER THE SCIPY JOHN HUNTER EXCELLENCE IN PLOTTING CONTEST - DUE 4/13**

In memory of John Hunter, creator of matplotlib, we are pleased to announce the Third Annual SciPy John Hunter Excellence in Plotting Competition.
This open competition aims to highlight the importance of quality plotting to scientific progress and showcase the capabilities of the current generation of plotting software. Participants are invited to submit scientific plots to be judged by a panel. The winning entries will be announced and displayed at the conference. John Hunter's family is graciously sponsoring cash prizes up to $1,000 for the winners. We look forward to exciting submissions that push the boundaries of plotting!

See details here: http://scipy2015.scipy.org/ehome/115969/276538/

Entries must be submitted by April 13, 2015 via e-mail to plotting-cont...@scipy.org

--
**CALENDAR AND IMPORTANT DATES**

--Sprint, Birds of a Feather, Financial Aid and Talk submissions are open NOW
--Apr 10, 2015: Talk and Poster submission deadline
--Apr 13, 2015: Plotting contest submissions due
--Apr 15, 2015: Financial aid application deadline
--Apr 17, 2015: Tutorial schedule announced
--May 1, 2015: General conference speakers schedule announced
--May 15, 2015 (or 150 registrants): Early-bird registration ends
--Jun 1, 2015: BoF submission deadline
--Jul 6-7, 2015: SciPy 2015 Tutorials
--Jul 8-10, 2015: SciPy 2015 General Conference
--Jul 11-12, 2015: SciPy 2015 Sprints

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 9:09 PM, josef.p...@gmail.com wrote:

> [Eric and Hanno's exchange, quoted in the message below]

I think it should be DOA, except as a discussion topic for numpy 3000. Just my opinion.

Is this fancy?

>>> vals
array([6, 5, 4, 1, 2, 3])
>>> a + b
array([[3, 2, 1, 0],
       [4, 3, 2, 1],
       [5, 4, 3, 2]])
>>> vals[a + b]
array([[1, 4, 5, 6],
       [2, 1, 4, 5],
       [3, 2, 1, 4]])

https://github.com/scipy/scipy/blob/v0.14.0/scipy/linalg/special_matrices.py#L178

(I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1])

How would you find all the code that would not be correct anymore with a changed definition of indexing and slicing, if there is insufficient test coverage and it doesn't raise an exception? If we find it, who fixes all the legacy code? (I don't think it will be minor, unless there is a new method `fix_[...]` (fancy ix).)

Josef

> If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea.

> I'm not advocating adding even more complexity; I'm trying to think about ways to make it *less* complex from the typical user's standpoint.
>
> Eric

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:

On 2015/04/02 1:14 PM, Hanno Klemm wrote:

Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.

Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?

I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits. I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, the design philosophy went in favor of power. I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2. I think it should be DOA, except as a discussion topic for numpy 3000. Just my opinion.

Josef

If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of bracket slicing is probably not a good idea.

I'm not advocating adding even more complexity; I'm trying to think about ways to make it *less* complex from the typical user's standpoint.

Eric

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
Hi,

On Thu, Apr 2, 2015 at 6:09 PM, josef.p...@gmail.com wrote:

On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:

On 2015/04/02 1:14 PM, Hanno Klemm wrote:

Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.

Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?

I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits. I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, the design philosophy went in favor of power. I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.

I'm sure y'all are totally on top of this, but for myself, I would like to distinguish:

* fancy indexing with boolean arrays - I use it all the time and don't get confused;
* fancy indexing with non-boolean arrays - horrendously confusing, almost never use it, except on a single axis when I can't confuse it with orthogonal indexing:

In [3]: a = np.arange(24).reshape(6, 4)

In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [5]: a[[1, 2, 4]]
Out[5]:
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [16, 17, 18, 19]])

I also remember a discussion with Travis O where he was also saying that this indexing was confusing, and that it would be good if there was some way to transition to what he called outer product indexing (I think that's the same as 'orthogonal' indexing).
I think it should be DOA, except as a discussion topic for numpy 3000.

I think there are two proposals here:

1) Add some syntactic sugar to allow orthogonal indexing of numpy arrays, with no backward compatibility break. That seems like a very good idea to me - were there any big objections to that?

2) Over some long time period, move the default behavior of np.array non-boolean indexing from the current behavior to the orthogonal behavior. That is going to be very tough, because it will cause very confusing breakage of legacy code.

On the other hand, maybe it is worth going some way towards that, like this:

* implement orthogonal indexing as a method arr.sensible_index[...];
* implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...];
* deprecate non-boolean fancy indexing as standard arr[...] indexing;
* wait a long time;
* remove non-boolean fancy indexing as standard arr[...] (errors are preferable to a change in behavior).

Then if we are brave we could:

* wait a very long time;
* make orthogonal indexing the default.

But the not-brave steps above seem less controversial, and fairly reasonable. What about that as an approach?

Cheers,

Matthew

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
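The `arr.sensible_index[...]` idea can be sketched on top of the existing `np.ix_` machinery. The class and its behavior below are purely illustrative assumptions (not an actual NumPy API); note it keeps a length-1 axis for scalar keys rather than dropping it, which a real design would have to decide on:

```python
import numpy as np

class OrthogonalIndexer:
    """Hypothetical orthogonal-indexing accessor: each key applies
    independently along its own axis; keys never broadcast together."""
    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # Turn slices and scalars into 1-d integer arrays, then let
        # np.ix_ build the open mesh that gives orthogonal semantics.
        keys = [np.arange(*k.indices(n)) if isinstance(k, slice)
                else np.atleast_1d(k)
                for k, n in zip(key, self.arr.shape)]
        return self.arr[np.ix_(*keys)]

a = np.arange(24).reshape(6, 4)
oix = OrthogonalIndexer(a)
print(oix[[0, 2], [1, 3]])   # rows {0, 2} crossed with cols {1, 3}
# [[ 1  3]
#  [ 9 11]]
print(oix[1:3, [0, 2]])      # slices work too, unlike raw np.ix_
# [[ 4  6]
#  [ 8 10]]
```

With broadcasting indexing, `a[[0, 2], [1, 3]]` would instead return the two elements `a[0, 1]` and `a[2, 3]` - the contrast is the whole point of the proposal.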
Re: [Numpy-discussion] IDE's for numpy development?
On Thu, Apr 2, 2015 at 7:46 AM, David Cournapeau courn...@gmail.com wrote: On Wed, Apr 1, 2015 at 7:43 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Apr 1, 2015 at 11:55 AM, Sturla Molden sturla.mol...@gmail.com wrote: Charles R Harris charlesr.har...@gmail.com wrote: I'd be interested in information from anyone with experience in using such an IDE and ideas of how Numpy might make using some of the common IDEs easier. Thoughts? I guess we could include project files for Visual Studio (and perhaps Eclipse?), like Python does. But then we would need to make sure the different build systems are kept in sync, and it will be a PITA for those who do not use Windows and Visual Studio. It is already bad enough with Distutils and Bento. I, for one, would really prefer if there were only one build process to care about. One should also note that a Visual Studio project is the only supported build process for Python on Windows, so they are not using this in addition to something else. Eclipse is better than Visual Studio for mixed Python and C development. It is also cross-platform. cmake needs to be mentioned too. It is not fully integrated with Visual Studio, but better than having multiple build processes. Mark chose cmake for DyND because it supported Visual Studio projects. OTOH, he said it was a PITA to program. I concur on that: for the 350+ packages we support at Enthought, cmake has been a higher pain point than any other build tool (including custom ones). And we only support mainstream platforms. But the real question for me is what does Visual Studio support mean? Does it really mean solution files? I have no useful experience with Visual Studio, so I don't really know, but solution files sound like a step in the right direction. What do solution files provide? Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On 03 Apr 2015, at 00:04, Colin J. Williams c...@ncf.ca wrote:

On 02-Apr-15 4:35 PM, Eric Firing wrote:

On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:

Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.

I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.

Eric

+1

Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move. If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of bracket slicing is probably not a good idea.

Hanno

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On 2015/04/02 1:14 PM, Hanno Klemm wrote:

Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.

Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?

If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of bracket slicing is probably not a good idea.

I'm not advocating adding even more complexity; I'm trying to think about ways to make it *less* complex from the typical user's standpoint.

Eric

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 6:35 PM, josef.p...@gmail.com wrote: (I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1] ) Psst: np.diagonal(m2, offset=1) -- Nathaniel J. Smith -- http://vorpus.org ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
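For reference, the two routes pick the same elements. A small sketch; the shape of `m2` was never shown in the thread, so one is assumed here that keeps `np.arange(4) + 1` in bounds:

```python
import numpy as np

# Assumed shape: 4 rows, 5 columns, so column index arange(4) + 1 is valid.
m2 = np.arange(20).reshape(4, 5)

# The fancy-indexing route from the thread: pairwise (i, i + 1) picks.
fancy = m2[np.arange(4), np.arange(4) + 1]
print(fancy)                      # [ 1  7 13 19]

# Nathaniel's suggestion: same superdiagonal, no index arithmetic.
diag = np.diagonal(m2, offset=1)
print(diag)                       # [ 1  7 13 19]
```

On a square matrix the fancy version also needs a shortened `arange` to avoid running off the last column, which is exactly the kind of bookkeeping `np.diagonal` removes.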
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
The distinction that boolean indexing has over the other 2 methods of indexing is that it can guarantee that it references a position at most once. Slicing and scalar indexes are also this way, which is why these methods allow for in-place assignments. I don't see boolean indexing as an extension of orthogonal indexing because of that.

Ben Root

On Thu, Apr 2, 2015 at 2:41 PM, Stephan Hoyer sho...@gmail.com wrote:

On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efir...@hawaii.edu wrote:

Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember.

Well put! I also failed to correctly predict your example. So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it?

I'll just note that indexing with a boolean array with the same shape as the array (e.g., x[x > 0] when x has greater than 1 dimension) technically falls outside a strict interpretation of orthogonal indexing. But there's not any ambiguity in adding that as an extension to orthogonal indexing (which otherwise does not allow ndim > 1), so I think your point still stands.

Stephan

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
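Ben's at-most-once point can be made concrete. A boolean mask touches each position at most once, so in-place assignment through it is unambiguous; an integer fancy index can repeat positions, and then in-place assignment quietly loses writes (in current NumPy the last write wins, but relying on that is fragile):

```python
import numpy as np

x = np.array([[-1.0, 2.0],
              [3.0, -4.0]])
x[x < 0] = 0.0          # each True cell is written exactly once
print(x)
# [[0. 2.]
#  [3. 0.]]

# With a repeated integer index, position 0 is targeted twice; only
# one of the two writes survives (the last one, in current NumPy).
y = np.zeros(3)
y[np.array([0, 0, 1])] = np.array([10.0, 20.0, 30.0])
print(y)                # [20. 30.  0.] -- the 10.0 write was lost
```

The same hazard is why `y[idx] += 1` with repeated `idx` does not accumulate, which is a recurring source of bugs with fancy indexing.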
Re: [Numpy-discussion] IDE's for numpy development?
On Wed, Apr 1, 2015 at 7:43 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Wed, Apr 1, 2015 at 11:55 AM, Sturla Molden sturla.mol...@gmail.com wrote: Charles R Harris charlesr.har...@gmail.com wrote: I'd be interested in information from anyone with experience in using such an IDE and ideas of how Numpy might make using some of the common IDEs easier. Thoughts? I guess we could include project files for Visual Studio (and perhaps Eclipse?), like Python does. But then we would need to make sure the different build systems are kept in sync, and it will be a PITA for those who do not use Windows and Visual Studio. It is already bad enough with Distutils and Bento. I, for one, would really prefer if there were only one build process to care about. One should also note that a Visual Studio project is the only supported build process for Python on Windows, so they are not using this in addition to something else. Eclipse is better than Visual Studio for mixed Python and C development. It is also cross-platform. cmake needs to be mentioned too. It is not fully integrated with Visual Studio, but better than having multiple build processes. Mark chose cmake for DyND because it supported Visual Studio projects. OTOH, he said it was a PITA to program. I concur on that: for the 350+ packages we support at Enthought, cmake has been a higher pain point than any other build tool (including custom ones). And we only support mainstream platforms. But the real question for me is what does Visual Studio support mean? Does it really mean solution files? David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer sho...@gmail.com wrote:

On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río jaime.f...@gmail.com wrote:

Is there any other package implementing non-orthogonal indexing aside from numpy?

I think we can safely say that NumPy's implementation of broadcasting indexing is unique :). The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent.

If the functionality is lacking, e.g., use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of orthogonal indexing. I just need a little more convincing that those new attributes/indexers are ever going to see any real use.

Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing. Unfortunately, the lack of a full-featured implementation of orthogonal indexing has led to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation that supports slices and integers in numpy for that reason alone. This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.

It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis.
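For context, the existing `np.ix_` helper already delivers orthogonal semantics, but only for 1-d integer (or boolean) sequences - the gap Stephan describes is precisely that it rejects slices and bare scalars:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# np.ix_ builds an "open mesh" of index arrays with orthogonal
# semantics: rows {0, 2} crossed with columns {1, 3}.
print(arr[np.ix_([0, 2], [1, 3])])
# [[ 1  3]
#  [ 9 11]]

# Compare with plain broadcasting indexing of the same key lists,
# which pairs the indices up element-wise instead:
print(arr[[0, 2], [1, 3]])   # [ 1 11] -- elements (0, 1) and (2, 3)
```

Passing a slice such as `arr[np.ix_(slice(1), [1, 3])]` raises a ValueError in current NumPy, which is why the thread discusses extending it or adding a dedicated indexer attribute.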
So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.

Cheers,

Stephan

[1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays: https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68

I believe you can leave all slices unchanged if you later reshuffle your axes. Basically all the fancy-indexed axes go in the front of the shape in order, and the subspace follows, e.g.:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)

So you would need to swap the second and last axes and be done. You would not get a contiguous array without a copy, but that's a different story.

Assigning to an orthogonally indexed subarray is an entirely different beast; not sure if there is a use case for that.

We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one; the need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet. Nathaniel usually has good insights on the "who are we, where do we come from, where are we going" type of questions; it would be good to have him chime in.

Jaime

--
(\__/)
( O.o)
( ) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
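Jaime's axis-reshuffling observation can be checked directly. A small sketch of the idea (using plain `transpose` to move the slice-derived axis back into place; this reproduces the orthogonal result for his example, not a general remapper):

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)
rows = np.array([1])[:, None]     # shape (1, 1)
cols = np.array([1, 2, 3])        # shape (3,)

# The two fancy indices broadcast to shape (1, 3); because a slice
# separates them, that broadcast result moves to the front.
mixed = a[rows, ::2, cols]
print(mixed.shape)                # (1, 3, 2)

# Swap the trailing slice axis back between the fancy axes to recover
# the orthogonal ordering (rows, sliced axis, cols).
fixed = mixed.transpose(0, 2, 1)
print(fixed.shape)                # (1, 2, 3)

# Matches the result of indexing each axis independently.
print(np.array_equal(fixed[0], a[1, ::2][:, [1, 2, 3]]))  # True
```

As Jaime notes, the transpose is a view, so you do not get a contiguous array without a further copy.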
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efir...@hawaii.edu wrote:

Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember.

Well put! I also failed to correctly predict your example. So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it?

I'll just note that indexing with a boolean array with the same shape as the array (e.g., x[x > 0] when x has greater than 1 dimension) technically falls outside a strict interpretation of orthogonal indexing. But there's not any ambiguity in adding that as an extension to orthogonal indexing (which otherwise does not allow ndim > 1), so I think your point still stands.

Stephan

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:

We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one; the need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet.

As a long-time user of numpy, and an advocate and teacher of Python for science, here is my perspective:

Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember. Its use easily leads to unreadable code and hard-to-see errors. Here is the essence of an example that a student presented me with just this week, in the context of reordering eigenvectors based on argsort applied to eigenvalues:

In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

In [26]: ii = np.arange(4)

In [27]: print(xx[0])
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

Quickly now, how many numpy users would look at that last expression and say, "Of course, that is equivalent to transposing xx[0]"? And, "Of course that expression should give a completely different result from xx[0][:, ii]"? I would guess it would be less than 1%. That should tell you right away that we have a real problem here. Fancy indexing can't be *read* by a sub-genius--it has to be laboriously figured out piece by piece, with frequent reference to the baffling descriptions in the Numpy docs.

So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it? I have taken advantage of it occasionally, maybe you have too, but I think a survey of existing code would show that the need for it is *far* less common than the need for simple orthogonal indexing. That tells me that it is fancy indexing, not orthogonal indexing, that should be available through a function and/or special indexing attribute.
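The surprise in the student example above can be stated as two equalities. The fancy-indexed axis `ii` migrates to the front of the result (because it is separated from the scalar index by a slice), producing the transpose, while splitting the same keys across two bracket operations leaves the array untouched:

```python
import numpy as np

xx = np.arange(2 * 3 * 4).reshape((2, 3, 4))
ii = np.arange(4)

# Mixing a slice with a fancy index moves the fancy axis to the
# front, so this is exactly the transpose of xx[0] ...
print(np.array_equal(xx[0, :, ii], xx[0].T))   # True

# ... while chaining the same keys in two steps just reorders the
# columns of xx[0] by ii = [0, 1, 2, 3], i.e. returns xx[0] itself.
print(np.array_equal(xx[0][:, ii], xx[0]))     # True
```

Under orthogonal indexing the two spellings would agree, which is Eric's point.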
The question is then how to make that transition. Eric ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion