Mark Dickinson
added the comment:
Here's a revised patch that touches only longobject.c and does the minimum
necessary to take long.__hash__ from being *almost* completely predictable
to being completely predictable. To summarize the effects of this patch:
- the current long.__hash__ is predictable except for a very small number of
(unpredictable) inputs; this patch makes it predictable for all inputs,
thereby making it possible to define an efficient Decimal.__hash__. To be
precise, the patched long.__hash__ is periodic with period ULONG_MAX on the
ranges [1, infinity) and (-infinity, -1].
- the value of hash(n) is unchanged for any n that's small enough to be
representable as an int, and also unchanged for the vast majority of long
integers n of reasonable size. (For integers up to 450 digits long, the new
hash value differs from the old for around 1 in every 2500 integers on a
32-bit system; on a 64-bit system the figure is 1 in every 10000 billion.)
- On my test system (Dual Xeon 2.8GHz/SuSE Linux 10.2/gcc 4.1), using
timeit.timeit('hash(n)') there's no measurable slowdown for small integers.
Unfortunately there *is* a slowdown for larger integers: around 20% extra
time for 100-digit integers and around 70% for 1000-digit integers, though
in all cases the times are fractions of a microsecond. It doesn't help that
gcc doesn't optimize the inner loop perfectly: with the -O3 flag, the
addition and subsequent correction are compiled to (with x in eax and
v->ob_digit[i] in edx):
addl %eax, %edx
cmpl %eax, %edx
adcl $0, %edx
and the cmpl here is entirely redundant, since the addl has already set the
carry flag that the adcl consumes. Maybe there's some way of writing the C
code that makes it easier for gcc to pick up on this optimization?
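To make the periodicity claim concrete, here is a small Python model of what the patched hash computes for a positive long (a sketch, not the actual C code; ULONG_MAX here assumes a 64-bit unsigned long):

```python
# Model of the patched long.__hash__ for positive n, assuming a 64-bit
# unsigned long.  The patched loop produces a value congruent to n modulo
# ULONG_MAX that is nonzero whenever n is, so the zero residue is
# represented by ULONG_MAX itself.
ULONG_MAX = 2**64 - 1

def model_hash_abs(n):
    """Return the (model) hash of a positive integer n."""
    r = n % ULONG_MAX
    return r if r else ULONG_MAX
```

With this, model_hash_abs(n) == model_hash_abs(n + ULONG_MAX) for every n >= 1, which is exactly the periodicity that lets Decimal.__hash__ reduce a coefficient times a power of ten modulo ULONG_MAX instead of expanding the full integer.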
_____________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1772851>
_____________________________________
Index: Objects/longobject.c
===================================================================
--- Objects/longobject.c (revision 58158)
+++ Objects/longobject.c (working copy)
@@ -1937,10 +1937,18 @@
i = -(i);
}
#define LONG_BIT_SHIFT (8*sizeof(long) - SHIFT)
+ /* The following loop produces a C long x such that (unsigned long)x
+ is congruent to the absolute value of v modulo ULONG_MAX. The
+ resulting x is nonzero if and only if v is. */
while (--i >= 0) {
/* Force a native long #-bits (32 or 64) circular shift */
x = ((x << SHIFT) & ~MASK) | ((x >> LONG_BIT_SHIFT) & MASK);
x += v->ob_digit[i];
+ /* If the addition above overflowed (thinking of x as
+ unsigned), we compensate by incrementing. This preserves
+ the value modulo ULONG_MAX. */
+ if ((unsigned long)x < v->ob_digit[i])
+ x++;
}
#undef LONG_BIT_SHIFT
x = x * sign;
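For reference, the inner loop above (circular shift, add, end-around carry) can be modeled digit-by-digit in Python on a fixed word size — a sketch only, with the 64-bit word size, 15-bit SHIFT, and the function name being assumptions, not the actual CPython definitions:

```python
# Model of the patched inner loop on an explicit 64-bit word.  Rotating
# left by SHIFT multiplies by 2**SHIFT modulo 2**BITS - 1, and adding
# with end-around carry is addition modulo 2**BITS - 1, so the result is
# congruent to the absolute value modulo ULONG_MAX (= 2**BITS - 1).
BITS = 64
SHIFT = 15          # bits per digit in CPython's long representation
MASK = 2**BITS - 1  # word mask (x is kept in [0, 2**BITS - 1])

def fold_digits(digits):
    """Fold a little-endian list of 15-bit digits into one 64-bit word."""
    x = 0
    for d in reversed(digits):
        # circular left shift of x by SHIFT bits within a BITS-bit word
        x = ((x << SHIFT) | (x >> (BITS - SHIFT))) & MASK
        x = x + d
        # end-around carry: if the addition overflowed the word, the
        # wrapped result is incremented, preserving the value mod 2**BITS - 1
        if x > MASK:
            x = (x & MASK) + 1
    return x
```

For inputs small enough to fit in one word the fold is exact: fold_digits([1, 2, 3]) recovers 1 + 2*2**15 + 3*2**30, and for larger inputs the result stays congruent to the value modulo 2**64 - 1, matching the comment in the patch.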
Index: Lib/test/test_hash.py
===================================================================
--- Lib/test/test_hash.py (revision 58158)
+++ Lib/test/test_hash.py (working copy)
@@ -18,10 +18,21 @@
def test_numeric_literals(self):
self.same_hash(1, 1L, 1.0, 1.0+0.0j)
+ self.same_hash(0, 0L, 0.0, 0.0+0.0j)
+ self.same_hash(-1, -1L, -1.0, -1.0+0.0j)
+ self.same_hash(-2, -2L, -2.0, -2.0+0.0j)
def test_coerced_integers(self):
self.same_hash(int(1), long(1), float(1), complex(1),
int('1'), float('1.0'))
+ self.same_hash(int(-2**31), long(-2**31), float(-2**31))
+ self.same_hash(int(1-2**31), long(1-2**31), float(1-2**31))
+ self.same_hash(int(2**31-1), long(2**31-1), float(2**31-1))
+ # for 64-bit platforms
+ self.same_hash(int(2**31), long(2**31), float(2**31))
+ self.same_hash(int(-2**63), long(-2**63), float(-2**63))
+ self.same_hash(int(1-2**63), long(1-2**63))
+ self.same_hash(int(2**63-1), long(2**63-1))
def test_coerced_floats(self):
self.same_hash(long(1.23e300), float(1.23e300))
_______________________________________________
Python-bugs-list mailing list