Chris Angelico <ros...@gmail.com>: > On Fri, Aug 11, 2017 at 7:17 AM, Marko Rauhamaa <ma...@pacujo.net> wrote: >> That's interesting, but suggests there's something weird (~ suboptimal) >> going on with CPython's scrambling algorithm. Also, your performance >> test might yield completely different results on other Python >> implementations. >> >> Apart from uniqueness, there's no particular reason to prefer one >> __hash__() value over another as long as the interesting bits are inside >> the CPU's simple integer range. > > Not true. Every time you probe a new location [1], you have to fetch > more data from RAM. That delays you significantly. An ideal hashing > system is going to give a high probability of giving you an empty > bucket on the first try, to minimize the number of main memory > accesses required. > > CPython's scrambling algorithm means that, even when its first try > doesn't succeed, there's a good chance that its second will succeed. > But that doesn't change the fact that you want the first one to > succeed as often as possible.
What does all that have to do with where the unique bits are in the hash value? 0x10000000 0x20000000 0x30000000 0x40000000 should be no worse as __hash__() values than 0x1000 0x2000 0x3000 0x4000 or 1 2 3 4 Of course, some algorithms can (and, we have learned, do) prefer some bits over others, but that's inside the implementation black box. I would think every bit should carry an approximately equal weight. Marko -- https://mail.python.org/mailman/listinfo/python-list