On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney <wesmck...@gmail.com> wrote: > On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <n...@pobox.com> wrote: >> How would you fix it? I shouldn't speculate without profiling, but I'll be >> naughty. Presumably the problem is that python turns that into something >> like >> >> hist[i,j] = hist[i,j] + 1 >> >> which means there's no way for numpy to avoid creating a temporary array. So >> maybe this could be fixed by adding a fused __inplace_add__ protocol to the >> language (and similarly for all the other inplace operators), but that seems >> really unlikely. Fundamentally this is just the sort of optimization >> opportunity you miss when you don't have a compiler with a global view; >> Fortran or c++ expression templates will win every time. Maybe pypy will fix >> it someday. >> >> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? >> >> - N > > Nope, don't buy it: > > In [33]: timeit arr.__iadd__(1) > 1000 loops, best of 3: 1.13 ms per loop > > In [37]: timeit arr[:] += 1 > 1000 loops, best of 3: 1.13 ms per loop > > - Wes
Actually, apologies, I'm being silly (had too much coffee or something). Python may be doing something nefarious with the hist[i,j] += 1. So both a get, add, then set, which is probably the problem. >> On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmck...@gmail.com> wrote: >>> >>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >>> <m.oli...@jacobs-university.de> wrote: >>> > Hi, >>> > >>> > I have a short piece of code where the use of an index array "feels >>> > right", but incurs a severe performance penalty: It's about an order >>> > of magnitude slower than all other operations with arrays of that >>> > size. >>> > >>> > It comes up in a piece of code which is doing a large number of "on >>> > the fly" histograms via >>> > >>> > hist[i,j] += 1 >>> > >>> > where i is an array with the bin index to be incremented and j is >>> > simply enumerating the histograms. I attach a full short sample code >>> > below which shows how it's being used in context, and corresponding >>> > timeit output from the critical code section. >>> > >>> > Questions: >>> > >>> > - Is this a matter of principle, or due to an inefficient >>> > implementation? >>> > - Is there an equivalent way of doing it which is fast? >>> > >>> > Regards, >>> > Marcel >>> > >>> > ================================================================= >>> > >>> > #! /usr/bin/env python >>> > # Plot the bifurcation diagram of the logistic map >>> > >>> > from pylab import * >>> > >>> > Nx = 800 >>> > Ny = 600 >>> > I = 50000 >>> > >>> > rmin = 2.5 >>> > rmax = 4.0 >>> > ymin = 0.0 >>> > ymax = 1.0 >>> > >>> > rr = linspace (rmin, rmax, Nx) >>> > x = 0.5*ones(rr.shape) >>> > hist = zeros((Ny+1,Nx), dtype=int) >>> > j = arange(Nx) >>> > >>> > dy = ymax/Ny >>> > >>> > def f(x): >>> > return rr*x*(1.0-x) >>> > >>> > for n in xrange(1000): >>> > x = f(x) >>> > >>> > for n in xrange(I): >>> > x = f(x) >>> > i = array(x/dy, dtype=int) >>> > hist[i,j] += 1 >>> > >>> > figure() >>> > >>> > imshow(hist, >>> > cmap='binary', >>> > origin='lower', >>> > interpolation='nearest', >>> > extent=(rmin,rmax,ymin,ymax), >>> > norm=matplotlib.colors.LogNorm()) >>> > >>> > xlabel ('$r$') >>> > ylabel ('$x$') >>> > >>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >>> > >>> > show() >>> > >>> > ==================================================================== >>> > >>> > In [4]: timeit y=f(x) >>> > 10000 loops, best of 3: 19.4 us per loop >>> > >>> > In [5]: timeit i = array(x/dy, dtype=int) >>> > 10000 loops, best of 3: 22 us per loop >>> > >>> > In [6]: timeit img[i,j] += 1 >>> > 10000 loops, best of 3: 119 us per loop >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion@scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> This suggests to me that fancy indexing could be quite a bit faster in >>> this case: >>> >>> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >>> 10000 loops, best of 3: 20.6 us per loop >>> >>> I wrote a simple Cython method >>> >>> def fancy_inc(ndarray[int64_t, ndim=2] values, >>> ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >>> cdef: >>> Py_ssize_t i, n = len(iarr) >>> >>> for i in range(n): >>> values[iarr[i], jarr[i]] += inc >>> >>> that does even faster >>> >>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >>> 100000 loops, best of 3: 4.85 us per loop >>> >>> About 10% faster if bounds checking and wraparound are disabled. >>> >>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? >>> >>> - Wes >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion