Just some comments.

> the first time time with a bit shift of 7

Double "time".

> with a 128bit seed and 64-bit output

Inconsistent hyphenation. The same issue appears in other places.

> bytes_hash provides the tp_hash slot function for unicode.

Typo. Should be "unicode_hash".

> len = PyUnicode_GET_LENGTH(self);
> switch (PyUnicode_KIND(self)) {
> case PyUnicode_1BYTE_KIND: {
>     const Py_UCS1 *c = PyUnicode_1BYTE_DATA(self);
>     x = _PyHash_Func->hashfunc(c, len * sizeof(Py_UCS1));
>     break;
> }
> case PyUnicode_2BYTE_KIND: {
...

This could be simplified to a single call:

x = _PyHash_Func->hashfunc(PyUnicode_DATA(self), PyUnicode_GET_LENGTH(self) * PyUnicode_KIND(self));
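
The equivalence behind this one-liner can be checked outside CPython. The sketch below is a self-contained stand-in, not CPython code: toy_str, toy_hash, hash_switch and hash_direct are hypothetical names, and FNV-1a takes the place of _PyHash_Func->hashfunc. It assumes that the kind value equals the element size in bytes (1, 2 or 4), as is the case for the PyUnicode_*_KIND constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a compact string: kind is the element size. */
typedef struct {
    int kind;          /* 1, 2 or 4 bytes per code point */
    size_t length;     /* number of code points */
    const void *data;  /* contiguous buffer of length * kind bytes */
} toy_str;

/* FNV-1a over raw bytes, standing in for _PyHash_Func->hashfunc. */
uint64_t toy_hash(const void *buf, size_t nbytes)
{
    const unsigned char *p = buf;
    uint64_t h = UINT64_C(0xcbf29ce484222325);
    for (size_t i = 0; i < nbytes; i++) {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Per-kind dispatch, mirroring the shape of the quoted switch. */
uint64_t hash_switch(const toy_str *s)
{
    switch (s->kind) {
    case 1:
        return toy_hash((const uint8_t *)s->data, s->length * sizeof(uint8_t));
    case 2:
        return toy_hash((const uint16_t *)s->data, s->length * sizeof(uint16_t));
    case 4:
        return toy_hash((const uint32_t *)s->data, s->length * sizeof(uint32_t));
    default:
        assert(!"unexpected kind");
        return 0;
    }
}

/* Single call: the byte count is simply length * kind. */
uint64_t hash_direct(const toy_str *s)
{
    assert(s->kind == 1 || s->kind == 2 || s->kind == 4);
    return toy_hash(s->data, s->length * (size_t)s->kind);
}
```

Both functions pass the same pointer and the same byte count to the hash function, so the switch adds nothing except the typed local variable.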

> Equal hash values result in a hash collision and therefore cause a minor speed penalty for dicts and sets with mixed keys. The cause of the collision could be removed

I doubt this. If one collects bytes and strings in one dictionary, this equality will at most double the number of collisions (a DoS attack needs to increase it by a factor of thousands or millions). So it doesn't matter. On the other hand, if one deliberately uses bytes and str subclasses with overridden equality, the same hash for ASCII bytes and strings may be needed.

> For very short strings the setup costs for SipHash dominates its speed but it is still in the same order of magnitude as the current FNV code.

We could use a different algorithm for very short strings if it matters.
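
A length-based dispatch could look roughly like the sketch below. Everything in it is an assumption for illustration: the cutoff SMALL_CUTOFF, the seed parameter, the names small_hash, strong_hash and hash_bytes, and the use of FNV-1a as a placeholder where a keyed function such as SipHash would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SMALL_CUTOFF 8  /* hypothetical threshold, would need benchmarking */

/* Cheap multiply-and-add loop for very short inputs (no setup cost). */
uint64_t small_hash(const unsigned char *p, size_t len, uint64_t seed)
{
    uint64_t h = seed;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + p[i];
    return h ^ (uint64_t)len;
}

/* Placeholder for the expensive keyed hash; FNV-1a is only a stand-in. */
uint64_t strong_hash(const unsigned char *p, size_t len, uint64_t seed)
{
    uint64_t h = UINT64_C(0xcbf29ce484222325) ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3);
    }
    return h;
}

/* Pick the algorithm by input length. */
uint64_t hash_bytes(const void *buf, size_t len, uint64_t seed)
{
    const unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    if (len < SMALL_CUTOFF)
        return small_hash(p, len, seed);
    return strong_hash(p, len, seed);
}
```

Whether such a split pays off depends on how the short-string branch affects collision resistance and on measured timings, so the cutoff here should be read as a placeholder only.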

> The summarized total runtime of the benchmark is within 1% of the runtime of an unmodified Python 3.4 binary.

What about deviations of individual tests?


_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev