Christian Heimes added the comment:
I've modified unicodeobject's unicode_hash() function. On my box, V8's algorithm is roughly 75% slower than Python's for an 800 MB ASCII string (164 msec vs. 94.1 msec in the timings below).
Python's current hash algorithm for bytes and unicode:
    while (--len >= 0)
        x = (_PyHASH_MULTIPLIER * x) ^ (Py_uhash_t) *P++;
$ ./python -m timeit -s "t = 'abcdefgh' * int(1E8)" "hash(t)"
10 loops, best of 3: 94.1 msec per loop
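For reference, here is a minimal pure-Python sketch of that multiplicative loop. This is my simplification, not CPython's actual C code: the prefix seeding and final length XOR are omitted, and the C unsigned wraparound is emulated with a 64-bit mask.

```python
# Simplified model of CPython's multiplicative string hash.
# _PyHASH_MULTIPLIER is 1000003 in CPython; the 64-bit mask stands in
# for Py_uhash_t overflow semantics.
MASK = (1 << 64) - 1
MULTIPLIER = 1000003

def multiplicative_hash(data: bytes) -> int:
    x = len(data)  # simplified seed; the real code seeds from the first byte
    for byte in data:
        x = ((MULTIPLIER * x) ^ byte) & MASK
    return x
```

Each step does one multiply and one XOR per byte, which is why this loop is so cheap on modern CPUs.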
V8's algorithm:
    while (--len >= 0) {
        x += (Py_uhash_t) *P++;
        x += ((x + (Py_uhash_t)len) << 10);
        x ^= (x >> 6);
    }
$ ./python -m timeit -s "t = 'abcdefgh' * int(1E8)" "hash(t)"
10 loops, best of 3: 164 msec per loop
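The V8-style loop can be sketched the same way. Again this is my illustrative model, not the benchmarked C code; `--len` decrements before the body, so the first iteration mixes in `len - 1`, and the mask emulates Py_uhash_t wraparound.

```python
# Simplified model of the V8-style shift-add-xor loop above.
MASK = (1 << 64) - 1  # stands in for Py_uhash_t overflow

def v8_style_hash(data: bytes) -> int:
    x = 0
    remaining = len(data)
    for byte in data:
        remaining -= 1            # mirrors the C "--len" pre-decrement
        x = (x + byte) & MASK
        x = (x + ((x + remaining) << 10)) & MASK
        x ^= (x >> 6)
    return x
```

The extra add, shift, and dependent XOR per byte explain the slowdown relative to the single multiply-and-XOR of the current algorithm.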
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14621>
_______________________________________