On Tuesday, 16 August 2011 15:27:30 Armin Rigo wrote: > Hi David, > > On Mon, Aug 15, 2011 at 6:20 PM, David Naylor <naylor.b.da...@gmail.com> wrote: > > For me the performance of datetime object's hashing is sufficient but I > > think the python code could use some performance improvements. Is my > > approach using a direct computation to type long acceptable (in > > principle). If so I can refine it and submit a patch. > > Yes, replacing the hash with a faster-to-compute one is fine. It's > best performance-wise if you can avoid using Python longs. As far as > I know it just needs some random-looking xor-ing and shifting of the > fields. Note, of course, that you must carefully satisfy the property > that for any objects x and y, if "x == y" then "hash(x) == hash(y)".
Below is the patch, and results, for my proposed hash methods for datetime.datetime (and easily adaptable to include tzinfo and the other datetime objects). I tried to make the hash safe for both 32bit and 64bit systems, and beyond. The results are: # python datetest.py (datetime.py) hash_unity: 35.83 seconds hash_unity: 44.60 seconds hash_datetime: 65.58 seconds hash_datetime: 53.95 seconds # python datetest.py hash_unity: 5.70 seconds hash_unity: 5.69 seconds hash_datetime: 4.88 seconds hash_datetime: 4.90 seconds # pypy datetest.py hash_unity: 0.74 seconds hash_unity: 0.63 seconds hash_datetime: 11.74 seconds hash_datetime: 11.47 seconds # pypy datetest.py (patched datetime.py) hash_unity: 0.73 seconds hash_unity: 0.62 seconds hash_datetime: 0.76 seconds hash_datetime: 0.64 seconds So, based on my patch there is a 7.7x improvement over python and a 17.9x improvement over the previous pypy implementation. If the above approach is acceptable I will complete the patch? Regards
--- /usr/local/pypy-1.6/lib_pypy/datetime.py 2011-08-11 20:39:53.000000000 +0200 +++ datetime.py 2011-08-25 21:31:56.000000000 +0200 @@ -17,11 +17,14 @@ """ import time as _time +import sys as _sys import math as _math MINYEAR = 1 MAXYEAR = 9999 +_BITS = int(_math.log(_sys.maxint, 2) + 1) + # Utility functions, adapted from Python's Demo/classes/Dates.py, which # also assumes the current Gregorian calendar indefinitely extended in # both directions. Difference: Dates.py calls January 1 of year 0 day @@ -1796,12 +1799,14 @@ return base + timedelta(minutes = otoff-myoff) def __hash__(self): - tzoff = self._utcoffset() - if tzoff is None: - return hash(self.__getstate()[0]) - days = _ymd2ord(self.year, self.month, self.day) - seconds = self.hour * 3600 + (self.minute - tzoff) * 60 + self.second - return hash(timedelta(days, seconds, self.microsecond)) + if _BITS >= 64: + return (self.microsecond ^ (self.second << 20) ^ + (self.minute << 26) ^ (self.hour << 32) ^ (self.day << 37) ^ + (self.month << 42) ^ (self.year << 46)) + else: + return ((self.microsecond ^ (self.hour << 20) ^ (self.day << 25)) ^ + (self.second ^ (self.minute << 6) ^ (self.month << 12) ^ + (self.year << 19))) # Pickle support.
import datetime import random import time dates = [ (random.randint(datetime.MINYEAR, datetime.MAXYEAR), random.randint(1, 12), random.randint(1, 28), random.randint(1, 23), random.randint(0, 59), random.randint(0, 59), random.randint(0, 999999)) for i in range(10000000)] def bench(fn): def inner(*args, **kwds): start = time.clock() res = fn(*args, **kwds) end = time.clock() try: name = fn.__name__ except AttributeError: name = repr(fn) # print '%s: %.2f seconds' % (name, end-start) return res return inner class unity(datetime.datetime): def __hash__(self): return 0 @bench def hash_unity(): result = 0 for date in dates: result ^= hash(unity(*date)) return result @bench def hash_datetime(): result = 0 for date in dates: result ^= hash(datetime.datetime(*date)) return result hash_unity() hash_unity() hash_datetime() hash_datetime()
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev