On Sun, 12 Aug 2007 02:26:59 -0700, Erik Max Francis wrote:

> For a file hashing system (finding similar files, rather than identical
> ones), I need to be able to efficiently and quickly sum the ordinals of
> the bytes of a file and their squares. Because of the nature of the
> application, it's a requirement that I do it in Python, or only with
> standard library modules (if such facilities exist) that might assist.
>
> So far the fastest way I've found is using the `sum` builtin and
> generators::
>
>     ordinalSum = sum(ord(x) for x in data)
>     ordinalSumSquared = sum(ord(x)**2 for x in data)
>
> This is about twice as fast as an explicit loop, but since it's going to
> be processing massive amounts of data, the faster the better. Are there
> any tricks I'm not thinking of, or perhaps helper functions in other
> modules that I'm not thinking of?
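One standard-library trick worth trying (my own suggestion, not something from the thread): there are only 256 possible byte values, so you can tally occurrences once with `collections.Counter` and then compute both sums from at most 256 multiplications, independent of file size. A minimal sketch in modern Python 3, where iterating over `bytes` yields integers directly (the original post is Python 2, hence its `ord(x)` calls):

```python
from collections import Counter

def ordinal_sums(data: bytes) -> tuple[int, int]:
    """Return (sum of byte values, sum of squared byte values).

    Counting first means each square is computed at most 256 times,
    no matter how large the input is.
    """
    counts = Counter(data)  # maps byte value (0-255) -> occurrence count
    total = sum(value * n for value, n in counts.items())
    total_sq = sum(value * value * n for value, n in counts.items())
    return total, total_sq
```

For example, `ordinal_sums(b"abc")` gives `(294, 28814)`, matching `sum(ord(x) for x in "abc")` and `sum(ord(x)**2 for x in "abc")` from the quoted code.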
I see a lot of messages attacking the CPU optimization, but what about the I/O optimization, which, admittedly, the question seems to sidestep?

You might experiment with using mmap() instead of read()... If it helps, it may help big, because the I/O time is likely to dominate the CPU time.

-- 
http://mail.python.org/mailman/listinfo/python-list
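The mmap suggestion above can be sketched as follows (a minimal illustration of my own, not code from the thread): map the file read-only and sum over a `memoryview` of the mapping, so the kernel pages data in on demand instead of Python building an intermediate string with `read()`.

```python
import mmap
import os

def file_ordinal_sums(path: str) -> tuple[int, int]:
    """Return (sum of byte values, sum of squared byte values) for a file,
    using mmap() rather than read()."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0  # mmap of a zero-length file is an error
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # A memoryview over the mapping yields ints without copying
            # the whole file into a bytes object.
            with memoryview(mm) as mv:
                total = sum(mv)
                total_sq = sum(b * b for b in mv)
    return total, total_sq
```

Whether this beats plain `read()` depends on file size and the OS page cache, so it is worth benchmarking on representative data rather than assuming a win.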