On 3 June 2011 08:07, Keir Rice <keirr...@gmail.com> wrote: > Hi All, > > The following function was showing up in my profiles as a large bottle > neck: > > # Slow version > def RMSBand(self, histogram): > """Calculates the root-mean-squared value for the given colour > stream histogram.""" > intermediateResult = map(lambda (i, h): h*(i**2), zip(histogram, > range(255))) > totalSum = sum(intermediateResult) > > # calculate rms > return math.sqrt(totalSum / self.Area()) > > So after a bit of trial and error I came up with the following function > which is a lot faster: > > # Fast version > def RMSBand(self, histogram): > """Calculates the root-mean-squared value for the given colour > stream histogram.""" > totalSum = 0 > for i, h in enumerate(histogram): > totalSum += h*(i**2) > > # calculate rms > return math.sqrt(totalSum / self.Area()) > > My question is why is the second method so much faster? > Is it the removal of the extra function calls? > Is it skipping the creation of a list? > Do the basic arithmetic operators get pushed up into C code? > > Any insights into the whats really happening behind the scenes would be > appreciated. >
First of all, do you have the parameters to the lambda the wrong way around in the map() version? zip(histogram, range(255)) will return (histogram value, index), but enumerate(histogram) will return (index, histogram value). But the parameters haven't been swapped, resulting in the histogram value being squared in the map version, but the index being squared in the manual summing version. Depending on the values, this could result in a large performance increase in the second version (if the total value exceeds the maximum size of your platform's "long" type). Ignoring that (though I've set the parameters below to how I think they *should* be) ... Probably the biggest savings are list creating and jumping between C- and Python-functions during the map call. The lambda is a Python function, which are notoriously slow to use from within map() in comparison to keeping it all as C-level. Creating intermediate lists is totally unnecessary and obviously a waste of time. You're actually creating 3 intermediate lists - one with map(), one with zip() and one with range() (the third only if this is Python 2.x code, not 3.x). The best way to find out exactly where the savings are would be to eliminate one piece of it at a time and see where the savings are (using the timeit module). For example, you can use a list comprehension to get the same effect as map() without jumping in and out of python code: # Use list comprehension instead of map() def RMSBand(self, histogram): """Calculates the root-mean-squared value for the given colour stream histogram.""" intermediateResult = [h*(i**2) for (h, i) in zip(histogram, range(255))] totalSum = sum(intermediateResult) # calculate rms return math.sqrt(totalSum / self.Area()) # Use itertools.izip() instead of zip() def RMSBand(self, histogram): """Calculates the root-mean-squared value for the given colour stream histogram.""" intermediateResult = map(lambda (h, i): h*(i**2), itertools.izip(histogram, range(255))) totalSum = sum(intermediateResult) # calculate rms return math.sqrt(totalSum / self.Area()) # Use xrange() instead of range() def RMSBand(self, histogram): """Calculates the root-mean-squared value for the given colour stream histogram.""" intermediateResult = map(lambda (h, i): h*(i**2), zip(histogram, xrange(255))) totalSum = sum(intermediateResult) # calculate rms return math.sqrt(totalSum / self.Area()) # Use enumerate() instead of zip(range()) def RMSBand(self, histogram): """Calculates the root-mean-squared value for the given colour stream histogram.""" intermediateResult = map(lambda (h, i): h*(i**2), enumerate(histogram)) totalSum = sum(intermediateResult) # calculate rms return math.sqrt(totalSum / self.Area()) BTW, the following might (or might not) be faster again: # Pass a generator expression to sum() instead of manually summing def RMSBand(self, histogram): """Calculates the root-mean-squared value for the given colour stream histogram.""" totalSum = sum(h*(i**2) for (i, h) in enumerate(histogram)) # calculate rms return math.sqrt(totalSum / self.Area()) Tim Delaney
-- http://mail.python.org/mailman/listinfo/python-list