On Sun, 10 Oct 2010 13:49:09 +0000, kj wrote: >>> to give only two of an infinite number of counter-examples. > > (Infinite???)
I was mistaken. Given Arnaud's specification that we look only at the Python 2.x ints, I believe that there is only one counter-example, namely -1. However, in Python 3.x I would be correct. The reasoning is a simple application of the pigeon-hole principle. Since hash(n) is limited to returning a finite number of distinct results, but there are an infinite number of Python 3 ints, it follows that there must be an infinite number of collisions. >>And, in fact, >>(-1) is the only int such that hash(x) != x. > > Arnaud, how did you determine that -1 is the only such int? I can't > imagine any other approach other than a brute-force check of all ints... Reading the source code is also a good approach. I don't read C very well, but even I can work out what this is doing: static long int_hash(PyIntObject *v) { /* XXX If this is changed, you also need to change the way Python's long, float and complex types are hashed. */ long x = v -> ob_ival; if (x == -1) x = -2; return x; } (from intobject.c in Python 2.6.1). It takes a Python int object, extracts the value of it as a C long, returns -2 if the value is -1 otherwise returns the value. > When I tried this I ran into unforeseen limitations in xrange, etc. > It's all very doable, but still, it would take at least about 3 hours on > my laptop. What are you doing? This takes less than 20 seconds to test up to 2**24: import time def test(n): t = time.time() assert hash(0) == 0 assert hash(1) == 1 assert hash(-1) == -2 for i in xrange(2, n): assert hash(i) == i, "failed %d" % i assert hash(-i) == -i, "failed %d" % -i t = time.time() - t return t >>> test(2**24) 18.076412916183472 Since 2**31 is 2**7 = 128 times bigger, I estimate that testing everything up to 2**31 should take less than 45 minutes. And my PC is not exactly the fastest machine on the block. -- Steven -- http://mail.python.org/mailman/listinfo/python-list