Mark Dickinson
added the comment:
Here's a revised patch that touches only longobject.c and does the minimum
necessary to take long.__hash__ from being *almost* completely predictable
to being completely predictable. To summarize the effects of this patch:
- the current long.__hash__ is predictable except for a very small number of
(unpredictable) inputs; this patch makes it predictable for all inputs,
thereby making it possible to define an efficient Decimal.__hash__. To be
precise, the patched long.__hash__ is periodic with period ULONG_MAX on the
ranges [1, infinity) and (-infinity, -1].
- the value of hash(n) is unchanged for any n that's small enough to be
representable as an int, and also unchanged for the vast majority of long
integers n of reasonable size. (For integers up to 450 digits long, the new
hash value differs from the old for around 1 in every 2500 integers on a
32-bit system; on a 64-bit system the figure is 1 in every 10000 billion.)
- On my test system (Dual Xeon 2.8GHz/SuSE Linux 10.2/gcc 4.1), using
timeit.timeit('hash(n)') there's no measurable slowdown for small integers.
Unfortunately there *is* a slowdown for larger integers: around 20% extra
time for 100-digit integers and around 70% for 1000-digit integers, though
in all cases the times are fractions of a microsecond. It doesn't help that
gcc doesn't optimize the inner loop perfectly: with the -O3 flag, the
addition and subsequent correction are compiled to (with x in eax and
v->ob_digit[i] in edx):
addl %eax, %edx
cmpl %eax, %edx
adcl $0, %edx
and the cmpl here is entirely redundant, since the addl has already set the
carry flag that the adcl consumes. Maybe there's some way of writing the C
code that makes it easier for gcc to pick up on this optimization?
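To make the periodicity claim concrete, here is a small Python model of what the patched hash computes for a positive long (a sketch, not the actual C code; ULONG_MAX here assumes a 64-bit unsigned long):

```python
# Model of the patched long.__hash__ for positive n, assuming a 64-bit
# unsigned long.  The patched loop produces a value congruent to n modulo
# ULONG_MAX that is nonzero whenever n is, so the zero residue is
# represented by ULONG_MAX itself.
ULONG_MAX = 2**64 - 1

def model_hash_abs(n):
    """Return the (model) hash of a positive integer n."""
    r = n % ULONG_MAX
    return r if r else ULONG_MAX
```

With this, model_hash_abs(n) == model_hash_abs(n + ULONG_MAX) for every n >= 1, which is exactly the periodicity that lets Decimal.__hash__ reduce a coefficient times a power of ten modulo ULONG_MAX instead of expanding the full integer.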
_____________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1772851>
_____________________________________
Index: Objects/longobject.c
===================================================================
--- Objects/longobject.c (revision 58158)
+++ Objects/longobject.c (working copy)
@@ -1937,10 +1937,18 @@
i = -(i);
}
#define LONG_BIT_SHIFT (8*sizeof(long) - SHIFT)
+ /* The following loop produces a C long x such that (unsigned long)x
+ is congruent to the absolute value of v modulo ULONG_MAX. The
+ resulting x is nonzero if and only if v is. */
while (--i >= 0) {
/* Force a native long #-bits (32 or 64) circular shift */
x = ((x << SHIFT) & ~MASK) | ((x >> LONG_BIT_SHIFT) & MASK);
x += v->ob_digit[i];
+ /* If the addition above overflowed (thinking of x as
+ unsigned), we compensate by incrementing. This preserves
+ the value modulo ULONG_MAX. */
+ if ((unsigned long)x < v->ob_digit[i])
+ x++;
}
#undef LONG_BIT_SHIFT
x = x * sign;
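For reference, the inner loop above (circular shift, add, end-around carry) can be modeled digit-by-digit in Python on a fixed word size — a sketch only, with the 64-bit word size, 15-bit SHIFT, and the function name being assumptions, not the actual CPython definitions:

```python
# Model of the patched inner loop on an explicit 64-bit word.  Rotating
# left by SHIFT multiplies by 2**SHIFT modulo 2**BITS - 1, and adding
# with end-around carry is addition modulo 2**BITS - 1, so the result is
# congruent to the absolute value modulo ULONG_MAX (= 2**BITS - 1).
BITS = 64
SHIFT = 15          # bits per digit in CPython's long representation
MASK = 2**BITS - 1  # word mask (x is kept in [0, 2**BITS - 1])

def fold_digits(digits):
    """Fold a little-endian list of 15-bit digits into one 64-bit word."""
    x = 0
    for d in reversed(digits):
        # circular left shift of x by SHIFT bits within a BITS-bit word
        x = ((x << SHIFT) | (x >> (BITS - SHIFT))) & MASK
        x = x + d
        # end-around carry: if the addition overflowed the word, the
        # wrapped result is incremented, preserving the value mod 2**BITS - 1
        if x > MASK:
            x = (x & MASK) + 1
    return x
```

For inputs small enough to fit in one word the fold is exact: fold_digits([1, 2, 3]) recovers 1 + 2*2**15 + 3*2**30, and for larger inputs the result stays congruent to the value modulo 2**64 - 1, matching the comment in the patch.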
Index: Lib/test/test_hash.py
===================================================================
--- Lib/test/test_hash.py (revision 58158)
+++ Lib/test/test_hash.py (working copy)
@@ -18,10 +18,21 @@
def test_numeric_literals(self):
self.same_hash(1, 1L, 1.0, 1.0+0.0j)
+ self.same_hash(0, 0L, 0.0, 0.0+0.0j)
+ self.same_hash(-1, -1L, -1.0, -1.0+0.0j)
+ self.same_hash(-2, -2L, -2.0, -2.0+0.0j)
def test_coerced_integers(self):
self.same_hash(int(1), long(1), float(1), complex(1),
int('1'), float('1.0'))
+ self.same_hash(int(-2**31), long(-2**31), float(-2**31))
+ self.same_hash(int(1-2**31), long(1-2**31), float(1-2**31))
+ self.same_hash(int(2**31-1), long(2**31-1), float(2**31-1))
+ # for 64-bit platforms
+ self.same_hash(int(2**31), long(2**31), float(2**31))
+ self.same_hash(int(-2**63), long(-2**63), float(-2**63))
+ self.same_hash(int(1-2**63), long(1-2**63))
+ self.same_hash(int(2**63-1), long(2**63-1))
def test_coerced_floats(self):
self.same_hash(long(1.23e300), float(1.23e300))
_______________________________________________
Python-bugs-list mailing list