Re: [Numpy-discussion] Creating parallel curves
You can get a polygon buffer from http://angusj.com/delphi/clipper.php and write a Cython interface to it. HTH Niki ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Initializing an array to a constant value
I have a pretty silly question about initializing an array a to a given scalar value, say A. Most of the time I use a = np.ones(shape)*A, which seems to be the most widespread idiom, but I recently got interested in squeezing out some performance. I tried a = np.zeros(shape)+A, based on broadcasting, but it seems to be equivalent in terms of speed. The fastest so far is a = np.empty(shape) followed by a.fill(A), but that is a two-step instruction to do one thing, which I feel doesn't look very nice. Did I miss an all-in-one function like numpy.fill(shape, A)? Best, Pierre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
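For reference, a minimal sketch of the three idioms being compared (the shape and the fill value below are arbitrary choices for illustration):

import numpy as np

shape, A = (1000, 1000), 3.14

a1 = np.ones(shape) * A    # allocate ones, then multiply by the constant
a2 = np.zeros(shape) + A   # allocate zeros, then add the constant by broadcasting
a3 = np.empty(shape)       # allocate uninitialized memory...
a3.fill(A)                 # ...and fill it in place (the two-step, fastest variant)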
[Numpy-discussion] Indexing 2d arrays by column using an integer array
Hi, Apologies if the following is a trivial question. I wish to index the columns of the following 2D array In [78]: neighbourhoods Out[78]: array([[8, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 0]]) using the integer array In [76]: perf[neighbourhoods].argmax(axis=1) Out[76]: array([2, 1, 0, 2, 1, 0, 0, 2, 1]) to produce a 9-element array but can't find a way of applying the indices to the columns rather than the rows. Is this do-able without using loops? The looped version of what I want is np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] for i in xrange(neighbourhoods.shape[0])] ) Regards, -- Will Furnass Doctoral Student Pennine Water Group Department of Civil and Structural Engineering University of Sheffield Phone: +44 (0)114 22 25768 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Indexing 2d arrays by column using an integer array
I think the following is what you want: neighbourhoods[range(9), perf[neighbourhoods].argmax(axis=1)] -Travis On Feb 13, 2012, at 1:26 PM, William Furnass wrote: np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] for i in xrange(neighbourhoods.shape[0])] ) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
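Spelled out on a small made-up example, the trick is to pair each row index with the chosen column index, so exactly one element per row is selected:

import numpy as np

neighbourhoods = np.array([[8, 0, 1],
                           [0, 1, 2],
                           [1, 2, 3]])
cols = np.array([2, 1, 0])   # one column index per row

picked = neighbourhoods[np.arange(len(cols)), cols]
print(picked)                # [1 1 1]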
Re: [Numpy-discussion] Creating parallel curves
On Mon, Feb 13, 2012 at 1:01 AM, Niki Spahiev niki.spah...@gmail.com wrote: You can get polygon buffer from http://angusj.com/delphi/clipper.php and make cython interface to it. This should be built into GEOS as well, and the shapely package provides a python wrapper already. -Chris HTH Niki ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/ORR (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception chris.bar...@noaa.gov ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
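A rough sketch of the Shapely route (assuming the shapely package is available; the sample curve and offset distance are made up for illustration):

import numpy as np
from shapely.geometry import LineString

t = np.linspace(0.0, 2.0 * np.pi, 50)
curve = LineString(np.column_stack((t, np.sin(t))))

# Buffering a line by d yields a polygon whose boundary lies at distance d
# from the curve: essentially the two offset curves joined by end caps.
offset = curve.buffer(0.5)
boundary = np.array(offset.exterior.coords)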
Re: [Numpy-discussion] Indexing 2d arrays by column using an integer array
Thank you, that does the trick. Regards, Will On 13 February 2012 19:39, Travis Oliphant tra...@continuum.io wrote: I think the following is what you want: neighborhoods[range(9),perf[neighbourhoods].argmax(axis=1)] -Travis On Feb 13, 2012, at 1:26 PM, William Furnass wrote: np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] for i in xrange(neighbourhoods.shape[0])] ) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant tra...@continuum.iowrote: I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments.Both of these plans allow Open Source projects to have unlimited plans for free. Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. YouTrack from JetBrains: http://www.jetbrains.com/youtrack/features/issue_tracking.html This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. JIRA: http://www.atlassian.com/software/jira/overview/tour/code-integration Haven't looked into this one in much detail. I happen to have a dislike for Confluence (their wiki system), so someone else can say some nice things about JIRA. Haven't tried either tracker though. Anyone with actual experience? What Mark Wiebe said about making it easy to manage the issues quickly and what Eric said about making sure there are interfaces with dense information content really struck chords with me. I have seen a lot of time wasted on issue management with Trac --- time that could be better spent on NumPy.I'd like to make issue management efficient --- even if it means a system separate from GitHub. Issue management is a very important part of the open-source process. While we're at it, our buildbot situation is much worse than our issue tracker situation. This also looks good (and free): http://www.jetbrains.com/teamcity/ Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant tra...@continuum.io wrote: I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments.Both of these plans allow Open Source projects to have unlimited plans for free. Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. YouTrack from JetBrains: http://www.jetbrains.com/youtrack/features/issue_tracking.html This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. I do like the team behind JetBrains. And I've seen and heard good things about TeamCity. Thanks for reminding me about the build-bot situation. That is one thing I would like to address sooner rather than later as well. Thanks, -Travis JIRA: http://www.atlassian.com/software/jira/overview/tour/code-integration Haven't looked into this one in much detail. I happen to have a dislike for Confluence (their wiki system), so someone else can say some nice things about JIRA. Haven't tried either tracker though. Anyone with actual experience? What Mark Wiebe said about making it easy to manage the issues quickly and what Eric said about making sure there are interfaces with dense information content really struck chords with me. I have seen a lot of time wasted on issue management with Trac --- time that could be better spent on NumPy. I'd like to make issue management efficient --- even if it means a system separate from GitHub. Issue management is a very important part of the open-source process. While we're at it, our buildbot situation is much worse than our issue tracker situation. This also looks good (and free): http://www.jetbrains.com/teamcity/ Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
Hi, On Mon, Feb 13, 2012 at 12:44 PM, Travis Oliphant tra...@continuum.io wrote: On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant tra...@continuum.io wrote: I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments. Both of these plans allow Open Source projects to have unlimited plans for free. Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. YouTrack from JetBrains: http://www.jetbrains.com/youtrack/features/issue_tracking.html This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. I do like the team behind JetBrains. And I've seen and heard good things about TeamCity. Thanks for reminding me about the build-bot situation. That is one thing I would like to address sooner rather than later as well. We've (nipy) got a buildbot collection working OK. If you want to go that way you are welcome to use our machines. It's a somewhat flaky setup though. http://nipy.bic.berkeley.edu/builders I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. Ondrej did some nice stuff on integrating a build with the github pull requests: https://github.com/sympy/sympy-bot Some discussion of buildbot and Jenkins: http://vperic.blogspot.com/2011/05/continuous-integration-and-sympy.html See you, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Initializing an array to a constant value
On 13/02/2012 19:17, eat wrote: wouldn't it be nice if you could just write: a = np.empty(shape).fill(A) this would be possible if .fill(.) just returned self. Thanks for the tip. I had noticed several times that this was not working (because, of course, in the meantime I had forgotten it...) but I had totally overlooked the reason, just imagining there was some garbage-collection magic making my arrays vanish!! I find the syntax np.empty(shape).fill(A) would indeed be a good alternative to the burden of creating a new numpy.fill (or numpy.filled?) function. -- Pierre ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
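To make the pitfall concrete: ndarray.fill works in place and returns None, so chaining it onto the constructor silently loses the array.

import numpy as np

a = np.empty(3).fill(7.0)
print(a)        # None -- fill() modified a temporary and returned nothing

b = np.empty(3)
b.fill(7.0)     # the two-step form keeps a reference to the filled array
print(b)        # [ 7.  7.  7.]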
[Numpy-discussion] Fwd: Re: Creating parallel curves
-- Forwarded message -- From: Andrea Gavana andrea.gav...@gmail.com Date: Feb 13, 2012 11:31 PM Subject: Re: [Numpy-discussion] Creating parallel curves To: Jonathan Hilmer jkhil...@gmail.com Thank you Jonathan for this, it's exactly what I was looking for. I'll try it tomorrow on the 768 well trajectories I have and I'll let you know if I stumble upon any issue. If someone could shed some light on my problem number 2 (how to adjust the scaling/distance) so that the curves look parallel on a matplotlib graph even though the axes scales are different, I'd be more than grateful. Thank you in advance. Andrea. On Feb 13, 2012 4:32 AM, Jonathan Hilmer jkhil...@gmail.com wrote: Andrea, This is playing some tricks with 2D array expansion to make a tradeoff of memory for speed. Given two sets of matching vectors (one reference, given first, and a newly-expanded one, given second), it removes all points from the expanded vectors that aren't needed to describe the new contour.

def filter_expansion(x, y, x_expan, y_expan, distance_target, tol=1e-6):
    target_xx, expansion_xx = scipy.meshgrid(x, x_expan)
    target_yy, expansion_yy = scipy.meshgrid(y, y_expan)
    distance = scipy.sqrt((expansion_yy - target_yy)**2 + (expansion_xx - target_xx)**2)
    valid = distance.min(axis=1) >= distance_target*(1.-tol)
    return x_expan.compress(valid), y_expan.compress(valid)

# Jonathan

On Sun, Feb 12, 2012 at 2:31 PM, Robert Kern robert.k...@gmail.com wrote: On Sun, Feb 12, 2012 at 20:26, Andrea Gavana andrea.gav...@gmail.com wrote: I know, my definition of parallel was probably not orthodox enough. What I am looking for is to generate 2 curves that look graphically parallel enough to the original one, and not parallel in the true mathematical sense. There is a rigorous way to define the curve that you are looking for, and fortunately it gives some hints for implementation. For each point (x,y) in space, associate with it the nearest distance D from that point to the reference curve. The parallel curves are just two sides of the level set where D(x,y) is equal to the specified distance (possibly removing the circular caps that surround the ends of the reference curve). If performance is not a constraint, then you could just evaluate that D(x,y) function on a fine-enough grid and do marching squares to find the level set. matplotlib's contour plotting routines can help here. There is a hint in the PyX page that you linked to that you should consider. Angles in the reference curve become circular arcs in the parallel curves. So if your reference curve is just a bunch of line segments, then what you can do is take each line segment, and make parallel copies the same length to either side. Now you just need to connect up these parallel segments with each other. You do this by using circular arcs centered on the vertices of the reference curve. Do this on both sides. On the outer side, the arcs will go forward while on the inner side, the arcs will go backwards just like the cusps that you saw in your attempt. Now let's take care of that. You will have two self-intersecting curves consisting of alternating line segments and circular arcs. Parts of these curves will be too close to the reference curve. You will have to go through these curves to find the locations of self-intersection and remove the parts of the segments and arcs that are too close to the reference curve. This is tricky to do, but the formulae for segment-segment, segment-arc, and arc-arc intersection can be found online.
-- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
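A rough sketch of the grid-based level-set idea described above (the grid resolution, offset distance and sample curve are arbitrary choices; for long curves the all-pairs distance computation below gets memory hungry):

import numpy as np
import matplotlib.pyplot as plt

# Reference curve sampled as a dense polyline.
t = np.linspace(0.0, 2.0 * np.pi, 150)
cx, cy = t, np.sin(t)

# Distance D(x, y) from every grid point to the nearest sample of the curve.
gx, gy = np.meshgrid(np.linspace(-1.0, 7.5, 150), np.linspace(-2.5, 2.5, 150))
D = np.sqrt((gx[..., None] - cx) ** 2 + (gy[..., None] - cy) ** 2).min(axis=-1)

# The level set D(x, y) == d traces the two offset curves (plus end caps).
plt.contour(gx, gy, D, [0.5], colors='r')
plt.plot(cx, cy, 'k')
plt.show()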
Re: [Numpy-discussion] Issue Tracking
On 2/13/12 2:56 PM, Matthew Brett wrote: I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. I'm not aware of a Jenkins buildbot system for Sage, though I think Cython uses such a system: https://sage.math.washington.edu:8091/hudson/ We do have a number of systems we build and test Sage on, though I don't think we have continuous integration yet. I've CCd Jeroen Demeyer, who is the current release manager for Sage. Jeroen, do we have an automatic buildbot system for Sage? Thanks, Jason -- Jason Grout ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
Hi, On Mon, Feb 13, 2012 at 2:33 PM, jason-s...@creativetrax.com wrote: On 2/13/12 2:56 PM, Matthew Brett wrote: I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. I'm not aware of a Jenkins buildbot system for Sage, though I think Cython uses such a system: https://sage.math.washington.edu:8091/hudson/ We do have a number of systems we build and test Sage on, though I don't think we have continuous integration yet. I've CCd Jeroen Demeyer, who is the current release manager for Sage. Jeroen, do we have an automatic buildbot system for Sage? Ah - sorry - I was thinking of the Cython system on the SAGE server. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
On Mon, Feb 13, 2012 at 12:56 PM, Matthew Brett matthew.br...@gmail.com wrote: I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. So are we in IPython, thanks to Thomas Kluyver's recent leadership on this front it's now running quite smoothly: https://jenkins.shiningpanda.com/ipython/ I'm pretty sure Thomas is on this list, if you folks have any questions on the details of the setup. Cheers, f ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Index Array Performance
Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions:
- Is this a matter of principle, or due to an inefficient implementation?
- Is there an equivalent way of doing it which is fast?
Regards, Marcel

=

#! /usr/bin/env python
# Plot the bifurcation diagram of the logistic map
from pylab import *

Nx = 800
Ny = 600
I = 5

rmin = 2.5
rmax = 4.0
ymin = 0.0
ymax = 1.0

rr = linspace (rmin, rmax, Nx)
x = 0.5*ones(rr.shape)
hist = zeros((Ny+1,Nx), dtype=int)
j = arange(Nx)
dy = ymax/Ny

def f(x):
    return rr*x*(1.0-x)

for n in xrange(1000):
    x = f(x)

for n in xrange(I):
    x = f(x)
    i = array(x/dy, dtype=int)
    hist[i,j] += 1

figure()
imshow(hist, cmap='binary', origin='lower', interpolation='nearest',
       extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm())
xlabel ('$r$')
ylabel ('$x$')
title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
show()

In [4]: timeit y=f(x)
1 loops, best of 3: 19.4 us per loop
In [5]: timeit i = array(x/dy, dtype=int)
1 loops, best of 3: 22 us per loop
In [6]: timeit img[i,j] += 1
1 loops, best of 3: 119 us per loop

___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [Enthought-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
On Feb 13, 2012, at 3:55 PM, Fernando Perez wrote: ... - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. ... The link to the document isn't quite right. Please update it -- I can't wait for some nostalgic reading ;-) Travis___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [Enthought-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
On Mon, Feb 13, 2012 at 3:46 PM, Travis Vaught tra...@vaught.net wrote: - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. ... The link to the document isn't quite right. Please update it -- I can't wait for some nostalgic reading ;-) Oops, sorry; I pasted the local build url by accident: http://fperez.org/py4science/numpy-pep225/numpy-pep225.html And BTW, this discussion will take place on Friday March 2nd, most likely 3-5pm. We'll add that info to the pydata page as soon as it's finalized. Cheers, f ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Index Array Performance
On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver m.oli...@jacobs-university.de wrote: Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel = #! /usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 5 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() In [4]: timeit y=f(x) 1 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 1 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 1 loops, best of 3: 119 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

This suggests to me that fancy indexing could be quite a bit faster in this case:

In [40]: timeit hist[i,j] += 1
1 loops, best of 3: 58.2 us per loop
In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
1 loops, best of 3: 20.6 us per loop

I wrote a simple Cython method

def fancy_inc(ndarray[int64_t, ndim=2] values,
              ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
    cdef:
        Py_ssize_t i, n = len(iarr)
    for i in range(n):
        values[iarr[i], jarr[i]] += inc

that does even faster:

In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
10 loops, best of 3: 4.85 us per loop

About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer -- perhaps this should go high on the NumPy 2.0 TODO list? - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
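Another pure-NumPy variant that might be worth timing alongside the above (a sketch, not code from the thread): count the flattened (i, j) positions with bincount and add the counts to the histogram in one shot, which also handles repeated index pairs correctly.

import numpy as np

def bincount_inc(hist, i, j):
    # Flatten the 2D indices, count occurrences, and accumulate in place.
    flat = np.ravel_multi_index((i, j), hist.shape)
    hist += np.bincount(flat, minlength=hist.size).reshape(hist.shape)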
[Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have:

import numpy as np
Adata = np.array([127], dtype=np.int8)
Bdata = np.int16(127)
(Adata + Bdata).dtype
dtype('int8')

That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. For numpy < 1.6.0 we have this:

Bdata = np.int16(128)
(Adata + Bdata).dtype
dtype('int8')

That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy >= 1.6.0 we have this:

Bdata = np.int16(128)
(Adata + Bdata).dtype
dtype('int16')

There is upcasting... I can see why the numpy 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
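One way to query the promotion rule directly, without building the sum (shown for a NumPy version with the 1.6-style behaviour):

import numpy as np

Adata = np.array([127], dtype=np.int8)
print(np.result_type(Adata, np.int16(127)))   # int8  -- the scalar value fits in int8
print(np.result_type(Adata, np.int16(128)))   # int16 -- 128 overflows int8, so upcast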
Re: [Numpy-discussion] Index Array Performance
How would you fix it? I shouldn't speculate without profiling, but I'll be naughty. Presumably the problem is that python turns that into something like hist[i,j] = hist[i,j] + 1 which means there's no way for numpy to avoid creating a temporary array. So maybe this could be fixed by adding a fused __inplace_add__ protocol to the language (and similarly for all the other inplace operators), but that seems really unlikely. Fundamentally this is just the sort of optimization opportunity you miss when you don't have a compiler with a global view; Fortran or c++ expression templates will win every time. Maybe pypy will fix it someday. Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? - N On Feb 14, 2012 12:18 AM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver m.oli...@jacobs-university.de wrote: Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel = #! /usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 5 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() In [4]: timeit y=f(x) 1 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 1 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 1 loops, best of 3: 119 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion This suggests to me that fancy indexing could be quite a bit faster in this case: In [40]: timeit hist[i,j] += 11 loops, best of 3: 58.2 us per loop In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) 1 loops, best of 3: 20.6 us per loop I wrote a simple Cython method def fancy_inc(ndarray[int64_t, ndim=2] values, ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): cdef: Py_ssize_t i, n = len(iarr) for i in range(n): values[iarr[i], jarr[i]] += inc that does even faster In [8]: timeit sbx.fancy_inc(hist, i, j, 1) 10 loops, best of 3: 4.85 us per loop About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Index Array Performance
On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith n...@pobox.com wrote: How would you fix it? I shouldn't speculate without profiling, but I'll be naughty. Presumably the problem is that python turns that into something like hist[i,j] = hist[i,j] + 1 which means there's no way for numpy to avoid creating a temporary array. So maybe this could be fixed by adding a fused __inplace_add__ protocol to the language (and similarly for all the other inplace operators), but that seems really unlikely. Fundamentally this is just the sort of optimization opportunity you miss when you don't have a compiler with a global view; Fortran or c++ expression templates will win every time. Maybe pypy will fix it someday. Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? - N Nope, don't buy it: In [33]: timeit arr.__iadd__(1) 1000 loops, best of 3: 1.13 ms per loop In [37]: timeit arr[:] += 1 1000 loops, best of 3: 1.13 ms per loop - Wes On Feb 14, 2012 12:18 AM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver m.oli...@jacobs-university.de wrote: Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel = #! /usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 5 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() In [4]: timeit y=f(x) 1 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 1 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 1 loops, best of 3: 119 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion This suggests to me that fancy indexing could be quite a bit faster in this case: In [40]: timeit hist[i,j] += 11 loops, best of 3: 58.2 us per loop In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) 1 loops, best of 3: 20.6 us per loop I wrote a simple Cython method def fancy_inc(ndarray[int64_t, ndim=2] values, ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): cdef: Py_ssize_t i, n = len(iarr) for i in range(n): values[iarr[i], jarr[i]] += inc that does even faster In [8]: timeit sbx.fancy_inc(hist, i, j, 1) 10 loops, best of 3: 4.85 us per loop About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? 
- Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Index Array Performance
On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith n...@pobox.com wrote: How would you fix it? I shouldn't speculate without profiling, but I'll be naughty. Presumably the problem is that python turns that into something like hist[i,j] = hist[i,j] + 1 which means there's no way for numpy to avoid creating a temporary array. So maybe this could be fixed by adding a fused __inplace_add__ protocol to the language (and similarly for all the other inplace operators), but that seems really unlikely. Fundamentally this is just the sort of optimization opportunity you miss when you don't have a compiler with a global view; Fortran or c++ expression templates will win every time. Maybe pypy will fix it someday. Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? - N Nope, don't buy it: In [33]: timeit arr.__iadd__(1) 1000 loops, best of 3: 1.13 ms per loop In [37]: timeit arr[:] += 1 1000 loops, best of 3: 1.13 ms per loop - Wes Actually, apologies, I'm being silly (had too much coffee or something). Python may be doing something nefarious with the hist[i,j] += 1. So both a get, add, then set, which is probably the problem. On Feb 14, 2012 12:18 AM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver m.oli...@jacobs-university.de wrote: Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel = #! 
/usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 5 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() In [4]: timeit y=f(x) 1 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 1 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 1 loops, best of 3: 119 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion This suggests to me that fancy indexing could be quite a bit faster in this case: In [40]: timeit hist[i,j] += 11 loops, best of 3: 58.2 us per loop In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) 1 loops, best of 3: 20.6 us per loop I wrote a simple Cython method def fancy_inc(ndarray[int64_t, ndim=2] values, ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): cdef: Py_ssize_t i, n = len(iarr) for i in range(n): values[iarr[i], jarr[i]] += inc that does even faster In [8]: timeit sbx.fancy_inc(hist, i, j, 1) 10 loops, best of 3: 4.85 us per loop About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Index Array Performance
On Mon, Feb 13, 2012 at 7:48 PM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith n...@pobox.com wrote: How would you fix it? I shouldn't speculate without profiling, but I'll be naughty. Presumably the problem is that python turns that into something like hist[i,j] = hist[i,j] + 1 which means there's no way for numpy to avoid creating a temporary array. So maybe this could be fixed by adding a fused __inplace_add__ protocol to the language (and similarly for all the other inplace operators), but that seems really unlikely. Fundamentally this is just the sort of optimization opportunity you miss when you don't have a compiler with a global view; Fortran or c++ expression templates will win every time. Maybe pypy will fix it someday. Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? - N Nope, don't buy it: In [33]: timeit arr.__iadd__(1) 1000 loops, best of 3: 1.13 ms per loop In [37]: timeit arr[:] += 1 1000 loops, best of 3: 1.13 ms per loop - Wes Actually, apologies, I'm being silly (had too much coffee or something). Python may be doing something nefarious with the hist[i,j] += 1. So both a get, add, then set, which is probably the problem. On Feb 14, 2012 12:18 AM, Wes McKinney wesmck...@gmail.com wrote: On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver m.oli...@jacobs-university.de wrote: Hi, I have a short piece of code where the use of an index array feels right, but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of on the fly histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel = #! 
/usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 5 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() In [4]: timeit y=f(x) 1 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 1 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 1 loops, best of 3: 119 us per loop ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion This suggests to me that fancy indexing could be quite a bit faster in this case: In [40]: timeit hist[i,j] += 11 loops, best of 3: 58.2 us per loop In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) 1 loops, best of 3: 20.6 us per loop I wrote a simple Cython method def fancy_inc(ndarray[int64_t, ndim=2] values, ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): cdef: Py_ssize_t i, n = len(iarr) for i in range(n): values[iarr[i], jarr[i]] += inc that does even faster In [8]: timeit sbx.fancy_inc(hist, i, j, 1) 10 loops, best of 3: 4.85 us per loop About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion But: In [40]: timeit hist[i, j] 1 loops, best of 3: 32 us per loop So that's roughly 7-8x slower than a simple Cython method, so I sincerely hope it could be brought down to the sub 10 microsecond level with a little bit of work.
Re: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
I'd like the ability to make in (i.e., __contains__) return something other than a bool. Also, the ability to make the x < y < z syntax overridable would be useful. It's been suggested that the ability to override the boolean operators (and, or, not) would be the way to do this (PEP 335), though I'm not 100% convinced that's the way to go. Aaron Meurer On Mon, Feb 13, 2012 at 2:55 PM, Fernando Perez fperez@gmail.com wrote: Hi folks, [ I'm broadcasting this widely for maximum reach, but I'd appreciate it if replies can be kept to the *numpy* list, which is sort of the 'base' list for scientific/numerical work. It will make it much easier to organize a coherent set of notes later on. Apology if you're subscribed to all and get it 10 times. ] As part of the PyData workshop (http://pydataworkshop.eventbrite.com) to be held March 2 and 3 at the Mountain View Google offices, we have scheduled a session for an open discussion with Guido van Rossum and hopefully as many core python-dev members who can make it. We wanted to seize the combined opportunity of the PyData workshop bringing a number of 'scipy people' to Google with the timeline for Python 3.3, the first release after the Python language moratorium, being within sight: http://www.python.org/dev/peps/pep-0398. While a number of scientific Python packages are already available for Python 3 (either in released form or in their master git branches), it's fair to say that there hasn't been a major transition of the scientific community to Python3. Since there is no more development being done on the Python2 series, eventually we will all want to find ways to make this transition, and we think that this is an excellent time to engage the core python development team and consider ideas that would make Python3 generally a more appealing language for scientific work. Guido has made it clear that he doesn't speak for the day-to-day development of Python anymore, so we all should be aware that any ideas that come out of this panel will still need to be discussed with python-dev itself via standard mechanisms before anything is implemented. Nonetheless, the opportunity for a solid face-to-face dialog for brainstorming was too good to pass up. The purpose of this email is then to solicit, from all of our community, ideas for this discussion. In a week or so we'll need to summarize the main points brought up here and make a more concrete agenda out of it; I will also post a summary of the meeting afterwards here. Anything is a valid topic, some points just to get the conversation started: - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. - Improved syntax/support for rationals or decimal literals? While Python now has both decimals (http://docs.python.org/library/decimal.html) and rationals (http://docs.python.org/library/fractions.html), they're quite clunky to use because they require full constructor calls. Guido has mentioned in previous discussions toying with ideas about support for different kinds of numeric literals... - Using the numpy docstring standard python-wide, and thus having python improve the pathetic state of the stdlib's docstrings?
This is an area where our community is light years ahead of the standard library, but we'd all benefit from Python itself improving on this front. I'm toying with the idea of giving a lighting talk at PyConn about this, comparing the great, robust culture and tools of good docstrings across the Scipy ecosystem with the sad, sad state of docstrings in the stdlib. It might spur some movement on that front from the stdlib authors, esp. if the core python-dev team realizes the value and benefit it can bring (at relatively low cost, given how most of the information does exist, it's just in the wrong places). But more importantly for us, if there was truly a universal standard for high-quality docstrings across Python projects, building good documentation/help machinery would be a lot easier, as we'd know what to expect and search for (such as rendering them nicely in the ipython notebook, providing high-quality cross-project help search, etc). - Literal syntax for arrays? Sage has been floating a discussion about a literal matrix syntax (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). For something like this to go into python in any meaningful way there would have to be core multidimensional arrays in the language, but perhaps it's time to think about a piece of the numpy array itself into Python? This is one of the more 'out there' ideas, but after all, that's the point of a discussion like this, especially considering we'll have both
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 6:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have: import numpy as np Adata = np.array([127], dtype=np.int8) Bdata = np.int16(127) (Adata + Bdata).dtype dtype('int8') That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. For numpy < 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int8') That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy >= 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int16') There is upcasting... I can see why the numpy 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] can_cast with structured array output - bug?
Hi, I've also just noticed this oddity: In [17]: np.can_cast('c', 'u1') Out[17]: False OK so far, but... In [18]: np.can_cast('c', [('f1', 'u1')]) Out[18]: True In [19]: np.can_cast('c', [('f1', 'u1')], 'safe') Out[19]: True In [20]: np.can_cast(np.ones(10, dtype='c'), [('f1', 'u1')]) Out[20]: True I think this must be a bug. In the other direction, it makes more sense to me: In [24]: np.can_cast([('f1', 'u1')], 'c') Out[24]: False In [25]: np.can_cast([('f1', 'u1')], [('f1', 'u1')]) Out[25]: True Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant tra...@continuum.iowrote: Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example a+b could result in a different type than b+a. I recall there being some bugs in the tracker related to this as well, but I don't remember those details. This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned - signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting The ufunc uses a more consistent algorithm for loop selection.: http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. Cheers, Mark -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 6:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have: import numpy as np Adata = np.array([127], dtype=np.int8) Bdata = np.int16(127) (Adata + Bdata).dtype dtype('int8') That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. For numpy 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int8') That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy = 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int16') There is upcasting... I can see why the numpy 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Issue Tracking
I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments.Both of these plans allow Open Source projects to have unlimited plans for free. JIRA: http://www.atlassian.com/software/jira/overview/tour/code-integration At work we just transitioned off JIRA to TFS. Have to say, for bug tracking, JIRA was a lot better than TFS, not too good as a planning tool though. It is quite customizable and flexible. Nice ability to set up automatic e-mails and such as well. -- --- | Alan K. Jackson| To see a World in a Grain of Sand | | a...@ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | --- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
It might be nice to turn the matrix class into a short class hierarchy, something like this: class MatrixBase class DenseMatrix(MatrixBase) class TriangularMatrix(MatrixBase) # Maybe a few variations of upper/lower triangular and whether the diagonal is stored class SymmetricMatrix(MatrixBase) These other matrix classes could use packed storage, and could call the specific optimized BLAS/LAPACK functions to get higher performance when it is known the matrix is triangular or symmetric. I'm not sure whether this affects the discussion of the matrix * and \ operators, but it's a possibility to consider. -Mark On Mon, Feb 13, 2012 at 4:53 PM, Aaron Meurer asmeu...@gmail.com wrote: I'd like the ability to make in (i.e., __contains__) return something other than a bool. Also, the ability to make the x y z syntax would be useful. It's been suggested that the ability to override the boolean operators (and, or, not) would be the way to do this (pep 335), though I'm not 100% convinced that's the way to go. Aaron Meurer On Mon, Feb 13, 2012 at 2:55 PM, Fernando Perez fperez@gmail.com wrote: Hi folks, [ I'm broadcasting this widely for maximum reach, but I'd appreciate it if replies can be kept to the *numpy* list, which is sort of the 'base' list for scientific/numerical work. It will make it much easier to organize a coherent set of notes later on. Apology if you're subscribed to all and get it 10 times. ] As part of the PyData workshop (http://pydataworkshop.eventbrite.com) to be held March 2 and 3 at the Mountain View Google offices, we have scheduled a session for an open discussion with Guido van Rossum and hopefully as many core python-dev members who can make it. We wanted to seize the combined opportunity of the PyData workshop bringing a number of 'scipy people' to Google with the timeline for Python 3.3, the first release after the Python language moratorium, being within sight: http://www.python.org/dev/peps/pep-0398. While a number of scientific Python packages are already available for Python 3 (either in released form or in their master git branches), it's fair to say that there hasn't been a major transition of the scientific community to Python3. Since there is no more development being done on the Python2 series, eventually we will all want to find ways to make this transition, and we think that this is an excellent time to engage the core python development team and consider ideas that would make Python3 generally a more appealing language for scientific work. Guido has made it clear that he doesn't speak for the day-to-day development of Python anymore, so we all should be aware that any ideas that come out of this panel will still need to be discussed with python-dev itself via standard mechanisms before anything is implemented. Nonetheless, the opportunity for a solid face-to-face dialog for brainstorming was too good to pass up. The purpose of this email is then to solicit, from all of our community, ideas for this discussion. In a week or so we'll need to summarize the main points brought up here and make a more concrete agenda out of it; I will also post a summary of the meeting afterwards here. Anything is a valid topic, some points just to get the conversation started: - Extra operators/PEP 225. 
Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. - Improved syntax/support for rationals or decimal literals? While Python now has both decimals (http://docs.python.org/library/decimal.html) and rationals (http://docs.python.org/library/fractions.html), they're quite clunky to use because they require full constructor calls. Guido has mentioned in previous discussions toying with ideas about support for different kinds of numeric literals... - Using the numpy docstring standard python-wide, and thus having python improve the pathetic state of the stdlib's docstrings? This is an area where our community is light years ahead of the standard library, but we'd all benefit from Python itself improving on this front. I'm toying with the idea of giving a lighting talk at PyConn about this, comparing the great, robust culture and tools of good docstrings across the Scipy ecosystem with the sad, sad state of docstrings in the stdlib. It might spur some movement on that front from the stdlib authors, esp. if the core python-dev team realizes the value and benefit it can bring (at relatively low cost, given how most of the information does exist, it's just in the wrong places). But more importantly for us, if there was truly a universal standard for high-quality docstrings across Python
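A minimal sketch of the storage idea behind the matrix class hierarchy proposed at the top of this message (the names, the packed-triangle layout, and the plain-NumPy matvec are illustrative assumptions; a real implementation would dispatch to the specialized symmetric/triangular BLAS and LAPACK routines):

import numpy as np

class MatrixBase(object):
    def matvec(self, x):
        raise NotImplementedError

class DenseMatrix(MatrixBase):
    def __init__(self, data):
        self.data = np.asarray(data)
    def matvec(self, x):
        return np.dot(self.data, x)

class SymmetricMatrix(MatrixBase):
    # Store only the upper triangle; a real version would call symmetric BLAS.
    def __init__(self, data):
        data = np.asarray(data)
        self.n = data.shape[0]
        self.iu = np.triu_indices(self.n)
        self.packed = data[self.iu]
    def matvec(self, x):
        full = np.zeros((self.n, self.n))
        full[self.iu] = self.packed
        full = full + full.T - np.diag(full.diagonal())
        return np.dot(full, x)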
Re: [Numpy-discussion] can_cast with structured array output - bug?
I took a look into the code to see what is causing this, and the reason is that nothing has ever been implemented to deal with the fields. This means it falls back to treating all struct dtypes as if they were a plain void dtype, which allows anything to be cast to it. While I was redoing the casting subsystem for 1.6, I did think on this issue, and decided that it wasn't worth tackling it at the time because the 'safe'/'same_kind'/'unsafe' don't seem sufficient to handle what might be desired. I tried to leave this alone as much as possible. Some random thoughts about this are: * Casting a scalar to a struct dtype: should it be safe if the scalar can be safely cast to each member of the struct dtype? This is the NumPy broadcasting rule applied to dtypes as if the struct dtype is another dimension. * Casting one struct dtype to another: If the fields of the source are a subset of the target, and the types can safely convert, should that be a safe cast? If the fields of the source are not a subset of the target, should that still be a same_kind cast? Should a second enum which complements the safe/same_kind/unsafe one, but is specific for how adding/removing struct fields be added? This is closely related to adding ufunc support for struct dtypes, and the choices here should probably be decided at the same time as designing how the ufuncs should work. -Mark On Mon, Feb 13, 2012 at 5:20 PM, Matthew Brett matthew.br...@gmail.comwrote: Hi, I've also just noticed this oddity: In [17]: np.can_cast('c', 'u1') Out[17]: False OK so far, but... In [18]: np.can_cast('c', [('f1', 'u1')]) Out[18]: True In [19]: np.can_cast('c', [('f1', 'u1')], 'safe') Out[19]: True In [20]: np.can_cast(np.ones(10, dtype='c'), [('f1', 'u1')]) Out[20]: True I think this must be a bug. In the other direction, it makes more sense to me: In [24]: np.can_cast([('f1', 'u1')], 'c') Out[24]: False In [25]: np.can_cast([('f1', 'u1')], [('f1', 'u1')]) Out[25]: True Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
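To make the field-wise rule floated above concrete, a small sketch of what such a check could look like (this is a hypothetical helper illustrating the proposed semantics, not existing NumPy behaviour):

import numpy as np

def can_cast_to_struct(from_dtype, to_dtype, casting='safe'):
    # Treat a struct target like an extra dimension: allow the cast only
    # if the source can be cast to every field of the target.
    to_dtype = np.dtype(to_dtype)
    if to_dtype.fields is None:
        return np.can_cast(from_dtype, to_dtype, casting)
    return all(np.can_cast(from_dtype, to_dtype.fields[name][0], casting)
               for name in to_dtype.names)

print(can_cast_to_struct('u1', [('f1', 'u1')]))   # True: u1 fits every field
print(can_cast_to_struct('i2', [('f1', 'u1')]))   # False: i2 -> u1 is not safe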
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. Another thing I noticed is that I thought that int16 op scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. We will need to look in detail at what has changed. I will write a test to do that. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 7:58 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant tra...@continuum.io wrote: Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example a+b could result in a different type than b+a. I recall there being some bugs in the tracker related to this as well, but I don't remember those details. This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned - signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting The ufunc uses a more consistent algorithm for loop selection.: http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. Cheers, Mark -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 6:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have: import numpy as np Adata = np.array([127], dtype=np.int8) Bdata = np.int16(127) (Adata + Bdata).dtype dtype('int8') That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. 
For numpy < 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int8') That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy >= 1.6.0 we have this: Bdata = np.int16(128) (Adata + Bdata).dtype dtype('int16') There is upcasting... I can see why the numpy >= 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
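For anyone who wants to check their own install, a short script along these lines reproduces the difference (the dtypes noted in the comments describe the 1.5/1.6-era releases being discussed; later versions may differ again):

    import numpy as np

    a = np.array([127], dtype=np.int8)

    # Scalar value fits in int8: no upcast on either 1.5 or 1.6.
    print((a + np.int16(127)).dtype)      # int8

    # Scalar value does not fit in int8: 1.5 keeps int8 (and wraps),
    # 1.6 upcasts to int16.
    print((a + np.int16(128)).dtype)      # int8 on < 1.6, int16 on >= 1.6

    # np.result_type (new in 1.6) exposes the same value-based rule directly:
    print(np.result_type(a, np.int16(127)))   # int8 under the 1.6 rules
    print(np.result_type(a, np.int16(128)))   # int16 under the 1.6 rules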
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. Thanks, -Mark On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant tra...@continuum.iowrote: The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. Another thing I noticed is that I thought that int16 op scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. We will need to look in detail at what has changed. I will write a test to do that. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 7:58 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant tra...@continuum.iowrote: Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example a+b could result in a different type than b+a. 
I recall there being some bugs in the tracker related to this as well, but I don't remember those details. This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned - signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting The ufunc uses a more consistent algorithm for loop selection.: http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. Cheers, Mark -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 6:25 PM, Matthew Brett matthew.br...@gmail.com wrote: Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy
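A minimal sketch of the kind of promotion test being described, in numpy.testing style; the assertions encode the 1.6-era rules under discussion here and are far less exhaustive than the real tests linked above:

    import numpy as np
    from numpy.testing import assert_equal

    def test_promotion_is_symmetric():
        # a + b and b + a should promote to the same dtype
        for d1, d2 in [(np.int8, np.uint8), (np.int32, np.float32)]:
            assert_equal(np.promote_types(d1, d2), np.promote_types(d2, d1))

    def test_small_same_kind_scalar_does_not_upcast():
        arr = np.array([127], dtype=np.int8)
        # 1.6 rule: a same-kind scalar whose value fits leaves the dtype alone
        assert_equal((arr + np.int16(127)).dtype, np.dtype(np.int8))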
Re: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
On 02/13/2012 06:19 PM, Mark Wiebe wrote: It might be nice to turn the matrix class into a short class hierarchy, something like this: class MatrixBase class DenseMatrix(MatrixBase) class TriangularMatrix(MatrixBase) # Maybe a few variations of upper/lower triangular and whether the diagonal is stored class SymmetricMatrix(MatrixBase) These other matrix classes could use packed storage, and could call the specific optimized BLAS/LAPACK functions to get higher performance when it is known the matrix is triangular or symmetric. I'm not sure whether this affects the discussion of the matrix * and \ operators, but it's a possibility to consider. I've been working on exactly this (+ some more) in January, and will be continuing to in the months to come. (Can write more tomorrow if anybody's interested -- or email me directly as I don't have a 0.1 release to show yet -- got to go now) Dag ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
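A bare-bones sketch of the hierarchy Dag and Mark are describing; the class names follow Mark's list, and the scipy.linalg calls are just one plausible way to route each case to a more specific LAPACK driver:

    import numpy as np
    from scipy import linalg

    class MatrixBase(object):
        def solve(self, b):
            raise NotImplementedError

    class DenseMatrix(MatrixBase):
        def __init__(self, a):
            self.a = np.asarray(a)
        def solve(self, b):
            # general dense solver (LAPACK gesv)
            return linalg.solve(self.a, b)

    class TriangularMatrix(DenseMatrix):
        def __init__(self, a, lower=False):
            DenseMatrix.__init__(self, a)
            self.lower = lower
        def solve(self, b):
            # specialized triangular solver (LAPACK trtrs), O(n^2) instead of O(n^3)
            return linalg.solve_triangular(self.a, b, lower=self.lower)

    class SymmetricMatrix(DenseMatrix):
        def solve(self, b):
            # assumes symmetric positive-definite data; a Cholesky
            # factorization is roughly twice as fast as the general LU path
            c, low = linalg.cho_factor(self.a)
            return linalg.cho_solve((c, low), b)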
Re: [Numpy-discussion] [SciPy-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3
On Monday, February 13, 2012, Aaron Meurer asmeu...@gmail.com wrote: I'd like the ability to make in (i.e., __contains__) return something other than a bool. Also, the ability to override the chained comparison syntax x < y < z would be useful. It's been suggested that the ability to override the boolean operators (and, or, not) would be the way to do this (pep 335), though I'm not 100% convinced that's the way to go. Aaron Meurer +1 on these syntax ideas, however I do agree that it might be a bit problematic. Also, I remember once talking about labeled arrays and discussing ways to index them and ways to indicate which axis the indexing was for. That might require some sort of syntax changes. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
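For anyone unfamiliar with the limitation Aaron is pointing at, a tiny demonstration: the interpreter forces the result of __contains__ through bool(), so an element-wise or symbolic result cannot survive, whereas rich comparisons like __lt__ can already return arbitrary objects:

    class Demo(object):
        def __contains__(self, item):
            # we would like to return something element-wise here...
            return [True, False]
        def __lt__(self, other):
            return "symbolic(self < %r)" % (other,)

    d = Demo()
    print(1 in d)     # True -- the list was collapsed to a plain bool
    print(d < 5)      # 'symbolic(self < 5)' -- __lt__ may return anything
    # Chained comparisons are the remaining gap: 0 < d < 5 is evaluated as
    # (0 < d) and (d < 5), and the implicit 'and' calls bool() on the first
    # result, which is what PEP 335 proposed making overridable.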
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant tra...@continuum.iowrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. Chuck On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. 
I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. Thanks, -Mark On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant tra...@continuum.iowrote: The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. Another thing I noticed is that I thought that int16 op scalar float
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
It hasn't changed: since float is of a fundamentally different kind of data, it's expected to upcast the result. However, if I may add a personal comment on numpy's casting rules: until now, I've found them confusing and somewhat inconsistent. Some of the inconsistencies I've found were bugs, while others were unintuitive behavior (or, you may say, me not having the correct intuition ;) In particular the rule about mixed scalar / array operations is currently only described in the doc by a rather vague sentence. Also, the fact that the result's dtype can depend on the actual numerical values can be confusing when you work with variables whose values can span a wide range. So I think if you could come up with a table that says an operation involving two arrays of dtype1 and dtype2 always returns an output of dtype3, and a similar table for mixed scalar / array operations, that would be great! My 2 cents, -=- Olivier On 13 February 2012 23:08, Travis Oliphant tra...@continuum.io wrote: I can also confirm that at least on NumPy 1.5.1: integer array * (literal Python float scalar) --- creates a double result. So, my memory was incorrect on that (unless it changed at an earlier release, but I don't think so). -Travis On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. Thanks, -Mark On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant tra...@continuum.iowrote: The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. 
Another thing I noticed is that I thought that int16 op scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. We will need to look in detail at what has changed. I will write a test to do that. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 7:58 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant tra...@continuum.iowrote: Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example a+b could result in a different type than b+a. I recall there being some bugs in the tracker related to this as well, but I don't remember those details. This change felt
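A sketch of the array-with-array half of the table being requested above; np.promote_types (new in 1.6) answers that part directly, though it deliberately ignores the value-dependent scalar cases that make the full story harder to tabulate:

    import numpy as np

    dtypes = [np.int8, np.int16, np.int32, np.float32, np.float64, np.complex64]
    names = [np.dtype(d).name for d in dtypes]

    # header row
    print(''.join('%11s' % n for n in [''] + names))
    # one row per left-hand dtype
    for d1 in dtypes:
        cells = [np.promote_types(d1, d2).name for d2 in dtypes]
        print(''.join('%11s' % n for n in [np.dtype(d1).name] + cells))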
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Monday, February 13, 2012, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant tra...@continuum.io wrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. Chuck On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. 
What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. Thanks, -Mark On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant tra...@continuum.io wrote: The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. Another thing I noticed is that I thought
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant tra...@continuum.iowrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :) I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the can cast safely table here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. The script I used is here: https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Well, everything I did for 1.6 that we're discussing here was volunteer work too. :) You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff! Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users. 
-Mark -Travis On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal,
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
These are great suggestions. I am happy to start digging into the code. I'm also happy to re-visit any and all design decisions for NumPy 2.0 (with a strong-eye towards helping people migrate and documenting the results). Mark, I think you have done an excellent job of working with a stodgy group and pushing things forward. That is a rare talent, and the world is a better place because you jumped in. There is a lot of cruft all over the place, I know. I also know a lot more now than I did 6 years ago about software design :-)I'm very excited about what we are going to be able to do with NumPy together --- and with the others in the community. But, I am also aware of *a lot* of users who never voice their opinion on this list, and a lot of features that they want and need and are currently working around the limitations of NumPy to get.These are going to be my primary focus for the rest of the 1.X series. I see at least a NumPy 1.8 at this point with maybe even a NumPy 1.9. At the same time, I am looking forward to working with you and others in the community as you lead the push toward NumPy 2.0 (which I hope is not delayed too long with all the possible discussions that can take place :-) ) Best regards, -Travis On Feb 13, 2012, at 10:31 PM, Mark Wiebe wrote: On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant tra...@continuum.io wrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :) I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the can cast safely table here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. 
The script I used is here: https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Well, everything I did for 1.6 that we're discussing here was volunteer work too. :) You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff! Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users. -Mark -Travis On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant tra...@continuum.io wrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. New developers are awesome, and the life-blood of a project. But, you have to respect the history of a code-base and if you are re-factoring code that might create a change in corner-cases, then you are absolutely responsible for writing the tests if they aren't there already.That is a pretty simple rule. If you are changing semantics and are not doing a new major version number that you can document the changes in, then any re-factor needs to have tests written *before* the re-factor to ensure behavior does not change. 
That might be annoying, for sure, and might make you curse the original author for not writing the tests you wish were already written --- but it doesn't change the fact that a released code has many, many tests already written for it in the way of applications and users. All of these are outside of the actual code-base, and may rely on behavior that you can't just change even if you think it needs to change. Bug-fixes are different, of course, but it can sometimes be difficult to discern what is a bug and what is just behavior that seems inappropriate. Type-coercion, in particular, can be a difficult nut to crack because NumPy doesn't always control what happens and is trying to work-within Python's stunted type-system. I've often thought that it might be easier if NumPy were more tightly integrated into Python. For example, it would be great if NumPy's Int-scalar was the same thing as Python's int. Same for float and complex.It would also be nice if you could specify scalar literals with different precisions in Python directly.I've often wished that NumPy developers had more access to all the great language people who have spent their time on IronPython, Jython, and PyPy instead. -Travis Chuck On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: I believe the main lessons to draw from this are just how incredibly important
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Monday, February 13, 2012, Travis Oliphant tra...@continuum.io wrote: On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant tra...@continuum.io wrote: I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system.If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced.NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests.It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing.Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. 
Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
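Purely as a strawman for Ben's switch idea, something along these lines is what a user-visible flag might look like; nothing like this exists in NumPy, and the "old rule" branch is only a crude approximation of the pre-1.6 behaviour:

    import numpy as np

    # Hypothetical module-level flag -- not a real NumPy feature.
    LEGACY_SCALAR_PROMOTION = False

    def promoted_dtype(arr, scalar):
        if LEGACY_SCALAR_PROMOTION:
            # crude pre-1.6 approximation: a same-kind scalar never widens the dtype
            if np.asarray(scalar).dtype.kind == arr.dtype.kind:
                return arr.dtype
        # otherwise defer to the 1.6 value-based rules
        return np.result_type(arr, scalar)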
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... First of all, I don't recall the broken commutibility issue --- nor how long it had actually been in the code-base. So, I'm not sure how much weight to give that problem The problem I see with the weighting of these issues that is being implied is that 1) Requiring a re-compile is getting easier and easier as more and more people get their NumPy from distributions and not from downloads of NumPy itself. They just wait until the distribution upgrades and everything is re-compiled. 2) That same trend means that changes to run-time code (like those that can occur when type-coercion is changed) is likely to affect people much later after the discussions have taken place on the list and everyone who was involved in the discussion assumes all is fine. This sort of change should be signaled by a version change.I would like to understand what the bugginess was and where it was better because I think we are painting a wide-brush. Some-things I will probably agree with you were buggy, but others are likely just different preferences. I have a script that documents the old-behavior. I will compare it to the new behavior and we can go from there.Certainly, there is precedent for using something like a __future__ statement to move forward which your boolean switch implies. -Travis Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a see this. Spending time reverting or whatever would be a waste of resources, IMHO. Chuck You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints, yet, does not mean to me that we aren't creating headache for people later who have just not had time to try things. However, I can believe that the specifics of minor casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this then it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this users experience, it confirms my bias... Believe me, I do not want to revert if at all possible.There is plenty of more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. Best regards, -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 11:00 PM, Travis Oliphant tra...@continuum.iowrote: No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... First of all, I don't recall the broken commutibility issue --- nor how long it had actually been in the code-base. So, I'm not sure how much weight to give that problem The problem I see with the weighting of these issues that is being implied is that 1) Requiring a re-compile is getting easier and easier as more and more people get their NumPy from distributions and not from downloads of NumPy itself. They just wait until the distribution upgrades and everything is re-compiled. 2) That same trend means that changes to run-time code (like those that can occur when type-coercion is changed) is likely to affect people much later after the discussions have taken place on the list and everyone who was involved in the discussion assumes all is fine. This sort of change should be signaled by a version change.I would like to understand what the bugginess was and where it was better because I think we are painting a wide-brush. Some-things I will probably agree with you were buggy, but others are likely just different preferences. I have a script that documents the old-behavior. I will compare it to the new behavior and we can go from there.Certainly, there is precedent for using something like a __future__ statement to move forward which your boolean switch implies. Let it go, Travis. It's a waste of time. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant tra...@continuum.iowrote: No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a see this. Spending time reverting or whatever would be a waste of resources, IMHO. Chuck You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints, yet, does not mean to me that we aren't creating headache for people later who have just not had time to try things. However, I can believe that the specifics of minor casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this then it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this users experience, it confirms my bias... Believe me, I do not want to revert if at all possible.There is plenty of more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many to one relation of, say, int32 and long, longlong which varies platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies platform to platform is another source of inconsistent behavior that can be confusing. So there are still plenty of issues to deal with. I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever receding 2.0. 
So there were very good reasons to deal with the type system. That isn't to say that typecasting can't use some tweaks here and there; I think we are all open to discussion along those lines. But it should be about specific cases. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
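A quick way to see the many-to-one, platform-dependent mapping Chuck mentions; which of int32/int64 the C-derived names end up aliasing depends on the platform and compiler, which is exactly the source of confusion:

    import numpy as np

    for name in ['intc', 'int_', 'longlong', 'intp', 'int32', 'int64']:
        print('%-9s -> %s' % (name, np.dtype(getattr(np, name))))

    # Typical 64-bit Linux output pairs 'int_' and 'longlong' with int64 and
    # 'intc' with int32; other platforms group the names differently.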
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 10:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant tra...@continuum.iowrote: No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a see this. Spending time reverting or whatever would be a waste of resources, IMHO. Chuck You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints, yet, does not mean to me that we aren't creating headache for people later who have just not had time to try things. However, I can believe that the specifics of minor casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this then it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this users experience, it confirms my bias... Believe me, I do not want to revert if at all possible.There is plenty of more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many to one relation of, say, int32 and long, longlong which varies platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies platform to platform is another source of inconsistent behavior that can be confusing. So there are still plenty of issues to deal with. This reminds me of something that it would be really nice for the bug tracker to have - user votes. This might be a particularly good way to draw in some more of the users who don't want to stick their neck out with emails and comments, put are comfortable adding a vote to a bug. 
Something like this: http://code.google.com/p/googleappengine/issues/detail?id=190 where it says that 566 people have starred the issue. -Mark I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever-receding 2.0. So there were very good reasons to deal with the type system. That isn't to say that typecasting can't use some tweaks here and there, I think we are all open to discussion along those lines. But it should be about specific cases. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
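As a minimal, illustrative sketch of the kind of array-with-scalar casting difference under discussion (the exact dtypes produced depend on the NumPy version and platform, so treat the commented results as typical for the 1.6.x rules rather than guaranteed):

    import numpy as np

    a = np.array([1, 2, 3], dtype=np.int8)

    # A Python int that fits in int8: value-based casting keeps the array dtype.
    print((a + 5).dtype)      # int8

    # A Python int that does not fit in int8: under the 1.6.x rules the result
    # is upcast to a wider type, whereas 1.5.x reportedly kept int8 (wrapping).
    print((a + 300).dtype)    # e.g. int16 with the 1.6.x rules

    # The 1.6.x promotion rules can also be queried directly
    # (np.result_type was added in 1.6):
    print(np.result_type(a, 300))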
Re: [Numpy-discussion] numpy.arange() error?
I think the problem is quite easy to solve, without changing the documented behaviour. The doc says: Help on built-in function arange in module numpy.core.multiarray: arange(...) arange([start,] stop[, step,], dtype=None) Return evenly spaced values within a given interval. Values are generated within the half-open interval ``[start, stop)`` (in other words, the interval including `start` but excluding `stop`). For integer arguments the function is equivalent to the Python built-in `range` (http://docs.python.org/lib/built-in-funcs.html) function, but returns an ndarray rather than a list. stop is exclusive by definition. So subtracting a very small value from stop when it is processed is, I think, the best way. Matteo On 10/02/2012 02:22, Drew Frank wrote: On Thu, Feb 9, 2012 at 3:40 PM, Benjamin Root ben.r...@ou.edu wrote: On Thursday, February 9, 2012, Sturla Molden stu...@molden.no wrote: On 9 Feb 2012, at 22:44, eat e.antero.ta...@gmail.com wrote: Maybe this issue has been raised earlier as well, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? Perhaps. Another possibility would be to let arange take decimal arguments, possibly entered as text strings. Sturla Personally, I treat arange() to mean "give me a sequence of values from x to y, exclusive, with a specific step size". Nowhere in that statement does it guarantee a particular number of elements. Whereas linspace() means "give me a sequence of evenly spaced numbers from x to y, optionally inclusive, such that there are exactly N elements". They complement each other well. I agree -- both functions are useful and I think about them the same way. The unfortunate part is that tiny precision errors in y can make arange appear to be sometimes-exclusive rather than always exclusive. I've always imagined there to be a sort of duality between the two functions, where arange(low, high, step) == linspace(low, high-step, round((high-low)/step)) in cases where (high - low)/step is integral, but it turns out this is not the case. There are times when I intentionally will specify a range where the step size will not nicely fit, e.g. np.arange(1, 7, 3.5). I wouldn't want this to change. Nor would I. What I meant to express earlier is that I like how Matlab addresses this particular class of floating-point precision errors, not that I think arange output should somehow include both endpoints. My vote is that if users want Matlab-colon-like behavior, we could make a new function - maybe erange() for exact range? Ben Root That could work; it would completely replace arange for me in every circumstance I can think of, but I understand we can't just go changing the behavior of core functions. Drew ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- --- Matteo Malosio, Eng. Researcher ITIA-CNR (www.itia.cnr.it) Institute of Industrial Technologies and Automation National Research Council via Bassini 15, 20133 MILANO, ITALY Ph: +39 0223699625 Fax: +39 0223699925 e-mail: matteo.malo...@itia.cnr.it --- ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
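As a small, hedged illustration of the precision issue described in this thread: with a non-integer step, rounding in (stop - start) / step can push the element count up by one, so the nominally excluded endpoint appears. The exact output depends on the platform's floating-point arithmetic, so the commented results are typical rather than guaranteed.

    import numpy as np

    # (1.3 - 1.0) / 0.1 evaluates to slightly more than 3 in double precision,
    # so arange produces four elements here on a typical IEEE-754 setup,
    # the last one being approximately 1.3.
    print(np.arange(1.0, 1.3, 0.1))

    # linspace sidesteps the issue by fixing the element count instead:
    print(np.linspace(1.0, 1.2, 3))   # [1.  1.1  1.2]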
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
On Mon, Feb 13, 2012 at 10:48 PM, Mark Wiebe mwwi...@gmail.com wrote: On Mon, Feb 13, 2012 at 10:38 PM, Charles R Harris charlesr.har...@gmail.com wrote: On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant tra...@continuum.io wrote: No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your cavalier charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutativity, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a "see this". Spending time reverting or whatever would be a waste of resources, IMHO. Chuck You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints yet does not mean to me that we aren't creating headaches for people later who have just not had time to try things. However, I can believe that the specifics of minor casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this than it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this user's experience confirms my bias... Believe me, I do not want to revert if at all possible. There is plenty more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many-to-one relation of, say, int32 and long/longlong, which varies from platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies from platform to platform, is another source of inconsistent behavior that can be confusing. So there are still plenty of issues to deal with. This reminds me of something that it would be really nice for the bug tracker to have - user votes.
This might be a particularly good way to draw in some more of the users who don't want to stick their neck out with emails and comments, but are comfortable adding a vote to a bug. Something like this: http://code.google.com/p/googleappengine/issues/detail?id=190 where it says that 566 people have starred the issue. Here's how this feature looks in YouTrack: http://youtrack.jetbrains.net/issues?q=sort+by%3Avotes -Mark I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever-receding 2.0. So there were very good reasons to deal with the type system. That isn't to say that typecasting can't use some tweaks here and there, I think we are all open to discussion along those lines. But it should be about specific cases. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints yet does not mean to me that we aren't creating headaches for people later who have just not had time to try things. However, I can believe that the specifics of minor casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this than it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this user's experience confirms my bias... Believe me, I do not want to revert if at all possible. There is plenty more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. First of all, I would hardly call it one of my babies. I care far more for my actual babies than for this. It was certainly one of my headaches that I had to deal with and write code for (and take previous behavior into account). I certainly spent a lot of time wrestling with type-coercion and integrating numerous opinions as quickly as I could with it --- even in Numeric with the funny down_casting arrays. At best the resulting system was a compromise (with an implementation that you could reason about with the right perspective, despite claims to the contrary). This discussion is not about me being sensitive because I wrote some code or had a hand in a design that needed changing. I hope we replace all the code I've written with something better. I expect that eventually. This just has to be done in an appropriate way. I'm sensitive because I understand where the previous code came from and *why it was written*, and am concerned about changing things out from under users in ways that are subtle. I continue to affirm that breaking ABI compatibility is much preferable to changing type-casting behavior. I know people disagree with me. Distributions help solve the ABI compatibility problem, but nothing solves required code changes due to subtle type-casting issues. I would just expect this sort of change at NumPy 2.0. We could have waited for half-float until then. I will send the results of my analysis of what changed between 1.5.1 and 1.6.1 shortly. -Travis ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Fwd: Re: Creating parallel curves
2012/2/13 Andrea Gavana andrea.gav...@gmail.com -- Forwarded message -- From: Andrea Gavana andrea.gav...@gmail.com Date: Feb 13, 2012 11:31 PM Subject: Re: [Numpy-discussion] Creating parallel curves To: Jonathan Hilmer jkhil...@gmail.com Thank you Jonathan for this, it's exactly what I was looking for. I'll try it tomorrow on the 768 well trajectories I have and I'll let you know if I stumble upon any issue. If someone could shed some light on my problem number 2 (how to adjust the scaling/distance so that the curves look parallel on a matplotlib graph even though the axes scales are different), I'd be more than grateful. Thank you in advance. Hi. Maybe this could help you as a starting point:

    from shapely.geometry import LineString
    from matplotlib import pyplot

    myline = LineString(...)  # ... stands for your sequence of (x, y) points
    x, y = myline.xy
    # coordinates of the offset outline around myline;
    # distancefromyline is the offset distance you choose
    xx, yy = myline.buffer(distancefrommyline).exterior.xy
    pyplot.plot(x, y)
    pyplot.plot(xx, yy)
    pyplot.show()

Best. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
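Regarding problem number 2 in the forwarded message (making the offset curves actually look parallel when the x and y axis scales differ), a minimal sketch of one common fix is to force an equal aspect ratio on the matplotlib axes; the curve data below is made up purely for illustration.

    import numpy as np
    from matplotlib import pyplot

    # An illustrative curve and a vertically shifted copy standing in
    # for an offset curve.
    x = np.linspace(0.0, 10.0, 200)
    y = np.sin(x)

    fig, ax = pyplot.subplots()
    ax.plot(x, y)
    ax.plot(x, y + 0.5)

    # Without this, differing x/y scales visually distort the offset, so the
    # two curves no longer appear a constant distance apart on screen.
    ax.set_aspect('equal', adjustable='datalim')
    pyplot.show()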
Re: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x?
The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many-to-one relation of, say, int32 and long/longlong, which varies from platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies from platform to platform, is another source of inconsistent behavior that can be confusing. So there are still plenty of issues to deal with. I didn't think it was in the precision. I knew what you meant. However, I'm still hoping for an example of what you mean by a lack of commutativity in the typecodes. The confusion of long and longlong varying from platform to platform comes from C. The whole point of having long and longlong is to ensure that you can specify the same types in Python that you would in C. They should not be used if you don't care about that. Deriving from Python types for some array scalars is an issue. I don't like that either. However, Python itself special-cases its scalars in ways that made this necessary so that some use cases would not fall over. This shows a limitation of Python. I would prefer that all array scalars were recognized appropriately by the Python type system. Most of the concerns that you mention here are misunderstandings. Maybe there are solutions that fix the problem without just educating people. I am open to them. I do think that it was a mistake to have the intp and uintp dtypes as *separate* dtypes. They should have just mapped to the right one. I think it was also a mistake to have dtypes for all the C-spellings instead of just a dtype for each different bit-length with an alias for the C-spellings. We should change that in NumPy 2.0. -Travis I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever-receding 2.0. So there were very good reasons to deal with the type system. That isn't to say that typecasting can't use some tweaks here and there, I think we are all open to discussion along those lines. But it should be about specific cases. Chuck ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
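As a small sketch of the platform-dependent, many-to-one mapping between C-named types and fixed-width dtypes discussed above (the printed names and sizes differ between platforms, e.g. 64-bit Linux vs. 64-bit Windows, so the comments only describe typical outcomes):

    import numpy as np

    # Typecodes for C-spelled types: 'l' is C long, 'q' is C long long,
    # 'p' is intp. Which fixed-width type each one aliases depends on the
    # platform: on most 64-bit Linux/macOS builds 'l' and 'q' are both
    # 64-bit, while on 64-bit Windows 'l' is 32-bit and 'q' is 64-bit.
    for code in ('l', 'q', 'p'):
        dt = np.dtype(code)
        print(code, dt.name, dt.itemsize)

    # Some array scalars derive from Python types, which is the other
    # source of platform-varying behavior mentioned; float64, for example,
    # is a subclass of the Python float type.
    print(issubclass(np.float64, float))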