Just some comments.
> the first time time with a bit shift of 7
Double "time".
> with a 128bit seed and 64-bit output
Inconsistent hyphenation ("128bit" vs. "64-bit"). The same issue occurs
in other places.
> bytes_hash provides the tp_hash slot function for unicode.
Typo. Should be "unicode_hash".
> len = PyUnicode_GET_LENGTH(self);
> switch (PyUnicode_KIND(self)) {
> case PyUnicode_1BYTE_KIND: {
> const Py_UCS1 *c = PyUnicode_1BYTE_DATA(self);
> x = _PyHash_Func->hashfunc(c, len * sizeof(Py_UCS1));
> break;
> }
> case PyUnicode_2BYTE_KIND: {
...
The whole switch could be replaced with a single call:
x = _PyHash_Func->hashfunc(PyUnicode_DATA(self),
                           PyUnicode_GET_LENGTH(self) * PyUnicode_KIND(self));
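To illustrate why the per-kind switch looks redundant, here is a rough
Python model. It assumes the PEP 393 kind values (1, 2, 4) are equal to
the width in bytes of one code unit. The names kind_of, nbytes_switch and
nbytes_direct are made up for this sketch and are not CPython APIs.

```python
import ctypes

# Assumption: each kind value equals sizeof() of the matching Py_UCSn type.
KIND_TO_CTYPE = {1: ctypes.c_uint8, 2: ctypes.c_uint16, 4: ctypes.c_uint32}


def kind_of(s):
    """Model of PyUnicode_KIND: smallest unit width that fits every code point."""
    top = max(map(ord, s), default=0)
    if top < 0x100:
        return 1
    if top < 0x10000:
        return 2
    return 4


def nbytes_switch(s):
    """Byte count as the switch computes it: len * sizeof(Py_UCSn)."""
    return len(s) * ctypes.sizeof(KIND_TO_CTYPE[kind_of(s)])


def nbytes_direct(s):
    """Byte count as the single call computes it: len * kind."""
    return len(s) * kind_of(s)
```

Both functions give the same byte count for any string, which is all the
switch appears to add.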
> Equal hash values result in a hash collision and therefore cause a
> minor speed penalty for dicts and sets with mixed keys. The cause of the
> collision could be removed
I doubt this. If one collects bytes and strings in one dictionary,
this equality will only double the number of collisions (for a DoS attack
one needs to increase it by a factor of thousands or millions), so it
doesn't matter. On the other hand, if one deliberately uses bytes and str
subclasses with overridden equality, the same hash for ASCII bytes and
strings may be needed.
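A quick way to check the "only double" claim, as a sketch. It does not
assume that hash('abc') == hash(b'abc') holds on a given build; it only
measures the largest group of mixed keys that share one hash value. The
helper name largest_hash_group is hypothetical.

```python
def largest_hash_group(words):
    """Size of the largest group of str/bytes keys sharing one hash value."""
    groups = {}
    for w in words:
        # Each ASCII word contributes a str key and a bytes key.
        for key in (w, w.encode("ascii")):
            groups.setdefault(hash(key), []).append(key)
    return max((len(g) for g in groups.values()), default=0)


words = ["key%d" % i for i in range(100)]

# A dict with mixed keys still keeps both forms apart, because
# "key0" != b"key0" even when their hashes are equal.
mixed = {}
for w in words:
    mixed[w] = "str"
    mixed[w.encode("ascii")] = "bytes"
```

If the str and bytes hashes coincide, each group holds two keys, so the
collision count doubles at worst and does not grow with the number of keys.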
> For very short strings the setup costs for SipHash dominates its
> speed but it is still in the same order of magnitude as the current FNV
> code.
We could use another algorithm for very short strings if it matters.
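A minimal sketch of what such a length-based dispatch might look like.
Everything here is an assumption for illustration: the CUTOFF value is
arbitrary, standard FNV-1a stands in for the cheap short-string path (it
is not CPython's modified FNV), and keyed BLAKE2b from hashlib stands in
for SipHash, which the standard library does not expose.

```python
import hashlib

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1
CUTOFF = 7  # hypothetical threshold, not taken from the PEP


def fnv1a_64(data):
    """Standard 64-bit FNV-1a: cheap, no setup cost, unkeyed."""
    h = FNV64_OFFSET
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def keyed_hash_64(data, key):
    """Stand-in for the keyed hash: BLAKE2b truncated to 64 bits."""
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def hash_bytes(data, key):
    """Use the cheap path below CUTOFF bytes, the keyed hash otherwise."""
    if len(data) < CUTOFF:
        # Note: in this sketch the short path ignores the key.
        return fnv1a_64(data)
    return keyed_hash_64(data, key)
```

For example, hash_bytes(b"abc", key) takes the FNV path, while a 32-byte
input goes through the keyed hash.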
> The summarized total runtime of the benchmark is within 1% of the
> runtime of an unmodified Python 3.4 binary.
What about deviations of individual tests?