On Mon, Feb 13, 2012 at 23:23, Marcel Oliver <m.oli...@jacobs-university.de> wrote:
> Hi,
>
> I have a short piece of code where the use of an index array "feels
> right", but incurs a severe performance penalty: It's about an order
> of magnitude slower than all other operations with arrays of that
> size.
>
> It comes up in a piece of code which is doing a large number of "on
> the fly" histograms via
>
>   hist[i,j] += 1
>
> where i is an array with the bin index to be incremented and j is
> simply enumerating the histograms. I attach a full short sample code
> below which shows how it's being used in context, and corresponding
> timeit output from the critical code section.
Other people have explained that yes, applying index arrays is slow. I would just like to add the tangential point that this code does not behave the way that you think it does. You cannot make histograms like this. The statement "hist[i,j] += 1" gets broken down into three separate statements by the Python compiler:

    tmp = hist.__getitem__((i,j))
    tmp = tmp.__iadd__(1)
    hist.__setitem__((i,j), tmp)

Note that tmp is a new array with copies of the data in hist at the (i,j) locations, possibly multiple copies if the i index has repetitions. Each one of these copies gets incremented by 1, then __setitem__() applies each of them in turn to the appropriate cell in hist, each one simply overwriting the previous one. A cell that appears k times among the (i,j) pairs therefore ends up incremented by 1, not by k.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
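A minimal sketch (not from the original thread) that reproduces the overwrite behaviour described above on a tiny, hypothetical set of indices, where the (i, j) pair (1, 0) occurs twice. It then shows two ways to get counts that do accumulate. np.add.at is an unbuffered in-place operation that was added in NumPy 1.8, after this message was written. np.bincount on a raveled (flattened) index is an alternative technique that gives the same counts.

```python
import numpy as np

# Hypothetical setup: 3 histograms (columns, indexed by j) with 4 bins each
# (rows, indexed by i). The pair (1, 0) appears twice.
nbins, nhist = 4, 3
i = np.array([1, 1, 2, 3])   # bin index per sample
j = np.array([0, 0, 1, 2])   # histogram each sample belongs to

# Fancy-index "+=": the repeated (1, 0) pair is counted only once.
hist = np.zeros((nbins, nhist), dtype=int)
hist[i, j] += 1
print(hist[1, 0])  # 1, not 2

# The same thing spelled out as the three separate statements:
hist2 = np.zeros((nbins, nhist), dtype=int)
tmp = hist2.__getitem__((i, j))   # a copy: four zeros
tmp = tmp.__iadd__(1)             # four ones
hist2.__setitem__((i, j), tmp)    # later writes overwrite earlier ones
assert (hist2 == hist).all()

# Unbuffered accumulation (NumPy >= 1.8): repeated pairs add up.
hist3 = np.zeros((nbins, nhist), dtype=int)
np.add.at(hist3, (i, j), 1)
print(hist3[1, 0])  # 2

# Alternative: count flattened cell indices with bincount, then reshape.
flat = np.ravel_multi_index((i, j), (nbins, nhist))
hist4 = np.bincount(flat, minlength=nbins * nhist).reshape(nbins, nhist)
assert (hist4 == hist3).all()
```

For large inputs the bincount route is usually the faster of the two, because it makes a single counting pass over a flat integer array.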