On Tuesday, 16 August 2011 15:27:30 Armin Rigo wrote: > Hi David, > > On Mon, Aug 15, 2011 at 6:20 PM, David Naylor <[email protected]> wrote: > > For me the performance of datetime object's hashing is sufficient but I > > think the python code could use some performance improvements. Is my > > approach using a direct computation to type long acceptable (in > > principle). If so I can refine it and submit a patch. > > Yes, replacing the hash with a faster-to-compute one is fine. It's > best performance-wise if you can avoid using Python longs. As far as > I know it just needs some random-looking xor-ing and shifting of the > fields. Note, of course, that you must carefully satisfy the property > that for any objects x and y, if "x == y" then "hash(x) == hash(y)".
Below is the patch, and results, for my proposed hash methods for datetime.datetime (and easily adaptable to include tzinfo and the other datetime objects). I tried to make the hash safe for both 32bit and 64bit systems, and beyond. The results are: # python datetest.py (datetime.py) hash_unity: 35.83 seconds hash_unity: 44.60 seconds hash_datetime: 65.58 seconds hash_datetime: 53.95 seconds # python datetest.py hash_unity: 5.70 seconds hash_unity: 5.69 seconds hash_datetime: 4.88 seconds hash_datetime: 4.90 seconds # pypy datetest.py hash_unity: 0.74 seconds hash_unity: 0.63 seconds hash_datetime: 11.74 seconds hash_datetime: 11.47 seconds # pypy datetest.py (patched datetime.py) hash_unity: 0.73 seconds hash_unity: 0.62 seconds hash_datetime: 0.76 seconds hash_datetime: 0.64 seconds So, based on my patch there is a 7.7x improvement over python and a 17.9x improvement over the previous pypy implementation. If the above approach is acceptable I will complete the patch? Regards
--- /usr/local/pypy-1.6/lib_pypy/datetime.py 2011-08-11 20:39:53.000000000 +0200
+++ datetime.py 2011-08-25 21:31:56.000000000 +0200
@@ -17,11 +17,14 @@
"""
import time as _time
+import sys as _sys
import math as _math
MINYEAR = 1
MAXYEAR = 9999
+_BITS = int(_math.log(_sys.maxint, 2) + 1)
+
# Utility functions, adapted from Python's Demo/classes/Dates.py, which
# also assumes the current Gregorian calendar indefinitely extended in
# both directions. Difference: Dates.py calls January 1 of year 0 day
@@ -1796,12 +1799,14 @@
return base + timedelta(minutes = otoff-myoff)
def __hash__(self):
- tzoff = self._utcoffset()
- if tzoff is None:
- return hash(self.__getstate()[0])
- days = _ymd2ord(self.year, self.month, self.day)
- seconds = self.hour * 3600 + (self.minute - tzoff) * 60 + self.second
- return hash(timedelta(days, seconds, self.microsecond))
+ if _BITS >= 64:
+ return (self.microsecond ^ (self.second << 20) ^
+ (self.minute << 26) ^ (self.hour << 32) ^ (self.day << 37) ^
+ (self.month << 42) ^ (self.year << 46))
+ else:
+ return ((self.microsecond ^ (self.hour << 20) ^ (self.day << 25)) ^
+ (self.second ^ (self.minute << 6) ^ (self.month << 12) ^
+ (self.year << 19)))
# Pickle support.
import datetime
import random
import time
dates = [
(random.randint(datetime.MINYEAR, datetime.MAXYEAR),
random.randint(1, 12),
random.randint(1, 28),
random.randint(1, 23),
random.randint(0, 59),
random.randint(0, 59),
random.randint(0, 999999)) for i in range(10000000)]
def bench(fn):
def inner(*args, **kwds):
start = time.clock()
res = fn(*args, **kwds)
end = time.clock()
try:
name = fn.__name__
except AttributeError:
name = repr(fn)
#
print '%s: %.2f seconds' % (name, end-start)
return res
return inner
class unity(datetime.datetime):
def __hash__(self):
return 0
@bench
def hash_unity():
result = 0
for date in dates:
result ^= hash(unity(*date))
return result
@bench
def hash_datetime():
result = 0
for date in dates:
result ^= hash(datetime.datetime(*date))
return result
hash_unity()
hash_unity()
hash_datetime()
hash_datetime()
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ pypy-dev mailing list [email protected] http://mail.python.org/mailman/listinfo/pypy-dev
