On Mon, Feb 13, 2012 at 7:48 PM, Wes McKinney <wesmck...@gmail.com> wrote: > On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney <wesmck...@gmail.com> wrote: >> On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith <n...@pobox.com> wrote: >>> How would you fix it? I shouldn't speculate without profiling, but I'll be >>> naughty. Presumably the problem is that python turns that into something >>> like >>> >>> hist[i,j] = hist[i,j] + 1 >>> >>> which means there's no way for numpy to avoid creating a temporary array. So >>> maybe this could be fixed by adding a fused __inplace_add__ protocol to the >>> language (and similarly for all the other inplace operators), but that seems >>> really unlikely. Fundamentally this is just the sort of optimization >>> opportunity you miss when you don't have a compiler with a global view; >>> Fortran or c++ expression templates will win every time. Maybe pypy will fix >>> it someday. >>> >>> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? >>> >>> - N >> >> Nope, don't buy it: >> >> In [33]: timeit arr.__iadd__(1) >> 1000 loops, best of 3: 1.13 ms per loop >> >> In [37]: timeit arr[:] += 1 >> 1000 loops, best of 3: 1.13 ms per loop >> >> - Wes > > Actually, apologies, I'm being silly (had too much coffee or > something). Python may be doing something nefarious with the hist[i,j] > += 1. So both a get, add, then set, which is probably the problem. > >>> On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmck...@gmail.com> wrote: >>>> >>>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >>>> <m.oli...@jacobs-university.de> wrote: >>>> > Hi, >>>> > >>>> > I have a short piece of code where the use of an index array "feels >>>> > right", but incurs a severe performance penalty: It's about an order >>>> > of magnitude slower than all other operations with arrays of that >>>> > size. >>>> > >>>> > It comes up in a piece of code which is doing a large number of "on >>>> > the fly" histograms via >>>> > >>>> > hist[i,j] += 1 >>>> > >>>> > where i is an array with the bin index to be incremented and j is >>>> > simply enumerating the histograms. I attach a full short sample code >>>> > below which shows how it's being used in context, and corresponding >>>> > timeit output from the critical code section. >>>> > >>>> > Questions: >>>> > >>>> > - Is this a matter of principle, or due to an inefficient >>>> > implementation? >>>> > - Is there an equivalent way of doing it which is fast? >>>> > >>>> > Regards, >>>> > Marcel >>>> > >>>> > ================================================================= >>>> > >>>> > #! /usr/bin/env python >>>> > # Plot the bifurcation diagram of the logistic map >>>> > >>>> > from pylab import * >>>> > >>>> > Nx = 800 >>>> > Ny = 600 >>>> > I = 50000 >>>> > >>>> > rmin = 2.5 >>>> > rmax = 4.0 >>>> > ymin = 0.0 >>>> > ymax = 1.0 >>>> > >>>> > rr = linspace (rmin, rmax, Nx) >>>> > x = 0.5*ones(rr.shape) >>>> > hist = zeros((Ny+1,Nx), dtype=int) >>>> > j = arange(Nx) >>>> > >>>> > dy = ymax/Ny >>>> > >>>> > def f(x): >>>> > return rr*x*(1.0-x) >>>> > >>>> > for n in xrange(1000): >>>> > x = f(x) >>>> > >>>> > for n in xrange(I): >>>> > x = f(x) >>>> > i = array(x/dy, dtype=int) >>>> > hist[i,j] += 1 >>>> > >>>> > figure() >>>> > >>>> > imshow(hist, >>>> > cmap='binary', >>>> > origin='lower', >>>> > interpolation='nearest', >>>> > extent=(rmin,rmax,ymin,ymax), >>>> > norm=matplotlib.colors.LogNorm()) >>>> > >>>> > xlabel ('$r$') >>>> > ylabel ('$x$') >>>> > >>>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >>>> > >>>> > show() >>>> > >>>> > ==================================================================== >>>> > >>>> > In [4]: timeit y=f(x) >>>> > 10000 loops, best of 3: 19.4 us per loop >>>> > >>>> > In [5]: timeit i = array(x/dy, dtype=int) >>>> > 10000 loops, best of 3: 22 us per loop >>>> > >>>> > In [6]: timeit img[i,j] += 1 >>>> > 10000 loops, best of 3: 119 us per loop >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion@scipy.org >>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> This suggests to me that fancy indexing could be quite a bit faster in >>>> this case: >>>> >>>> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >>>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >>>> 10000 loops, best of 3: 20.6 us per loop >>>> >>>> I wrote a simple Cython method >>>> >>>> def fancy_inc(ndarray[int64_t, ndim=2] values, >>>> ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >>>> cdef: >>>> Py_ssize_t i, n = len(iarr) >>>> >>>> for i in range(n): >>>> values[iarr[i], jarr[i]] += inc >>>> >>>> that does even faster >>>> >>>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >>>> 100000 loops, best of 3: 4.85 us per loop >>>> >>>> About 10% faster if bounds checking and wraparound are disabled. >>>> >>>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? >>>> >>>> - Wes >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion@scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion@scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>
But: In [40]: timeit hist[i, j] 10000 loops, best of 3: 32 us per loop So that's roughly 7-8x slower than a simple Cython method, so I sincerely hope it could be brought down to the sub 10 microsecond level with a little bit of work. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion