On Tuesday, 16 August 2011 15:27:30 Armin Rigo wrote:
> Hi David,
> 
> On Mon, Aug 15, 2011 at 6:20 PM, David Naylor <naylor.b.da...@gmail.com> 
wrote:
> > For me the performance of datetime object's hashing is sufficient but I
> > think the python code could use some performance improvements.  Is my
> > approach using a direct computation to type long acceptable (in
> > principle).  If so I can refine it and submit a patch.
> 
> Yes, replacing the hash with a faster-to-compute one is fine.  It's
> best performance-wise if you can avoid using Python longs.  As far as
> I know it just needs some random-looking xor-ing and shifting of the
> fields.  Note, of course, that you must carefully satisfy the property
> that for any objects x and y, if "x == y" then "hash(x) == hash(y)".

Below is the patch, and results, for my proposed hash methods for 
datetime.datetime (and easily adaptable to include tzinfo and the other 
datetime objects).  I tried to make the hash safe for both 32bit and 64bit 
systems, and beyond.  

The results are:
# python datetest.py (datetime.py)
hash_unity: 35.83 seconds
hash_unity: 44.60 seconds
hash_datetime: 65.58 seconds
hash_datetime: 53.95 seconds

# python datetest.py
hash_unity: 5.70 seconds
hash_unity: 5.69 seconds
hash_datetime: 4.88 seconds
hash_datetime: 4.90 seconds

# pypy datetest.py
hash_unity: 0.74 seconds
hash_unity: 0.63 seconds
hash_datetime: 11.74 seconds
hash_datetime: 11.47 seconds

# pypy datetest.py (patched datetime.py)
hash_unity: 0.73 seconds
hash_unity: 0.62 seconds
hash_datetime: 0.76 seconds
hash_datetime: 0.64 seconds

So, based on my patch there is a 7.7x improvement over python and a 17.9x 
improvement over the previous pypy implementation.  

If the above approach is acceptable I will complete the patch?  

Regards
--- /usr/local/pypy-1.6/lib_pypy/datetime.py	2011-08-11 20:39:53.000000000 +0200
+++ datetime.py	2011-08-25 21:31:56.000000000 +0200
@@ -17,11 +17,14 @@
 """
 
 import time as _time
+import sys as _sys
 import math as _math
 
 MINYEAR = 1
 MAXYEAR = 9999
 
+_BITS = int(_math.log(_sys.maxint, 2) + 1)
+
 # Utility functions, adapted from Python's Demo/classes/Dates.py, which
 # also assumes the current Gregorian calendar indefinitely extended in
 # both directions.  Difference:  Dates.py calls January 1 of year 0 day
@@ -1796,12 +1799,14 @@
         return base + timedelta(minutes = otoff-myoff)
 
     def __hash__(self):
-        tzoff = self._utcoffset()
-        if tzoff is None:
-            return hash(self.__getstate()[0])
-        days = _ymd2ord(self.year, self.month, self.day)
-        seconds = self.hour * 3600 + (self.minute - tzoff) * 60 + self.second
-        return hash(timedelta(days, seconds, self.microsecond))
+        if _BITS >= 64:
+            return (self.microsecond ^ (self.second << 20) ^
+                    (self.minute << 26) ^ (self.hour << 32) ^ (self.day << 37) ^
+                    (self.month << 42) ^ (self.year << 46))
+        else:
+            return ((self.microsecond ^ (self.hour << 20) ^ (self.day << 25)) ^
+                    (self.second ^ (self.minute << 6) ^ (self.month << 12) ^
+                     (self.year << 19)))
 
     # Pickle support.
 
import datetime
import random
import time

dates = [
    (random.randint(datetime.MINYEAR, datetime.MAXYEAR),
     random.randint(1, 12),
     random.randint(1, 28),
     random.randint(1, 23),
     random.randint(0, 59),
     random.randint(0, 59),
     random.randint(0, 999999)) for i in range(10000000)]

def bench(fn):
    def inner(*args, **kwds):
        start = time.clock()
        res = fn(*args, **kwds)
        end = time.clock()
        try:
            name = fn.__name__
        except AttributeError:
            name = repr(fn)
        #
        print '%s: %.2f seconds' % (name, end-start)
        return res
    return inner

class unity(datetime.datetime):
    def __hash__(self):
        return 0

@bench
def hash_unity():
    result = 0
    for date in dates:
        result ^= hash(unity(*date))
    return result

@bench
def hash_datetime():
    result = 0
    for date in dates:
        result ^= hash(datetime.datetime(*date))
    return result

hash_unity()
hash_unity()
hash_datetime()
hash_datetime()

Attachment: signature.asc
Description: This is a digitally signed message part.

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev

Reply via email to