[issue29410] Moving to SipHash-1-3

2021-10-10 Thread Inada Naoki
Change by Inada Naoki : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___ ___

[issue29410] Moving to SipHash-1-3

2021-10-10 Thread Inada Naoki
Inada Naoki added the comment: New changeset ad970e8623523a8656e8c1ff4e1dff3423498a5a by Inada Naoki in branch 'main': bpo-29410: Change the default hash algorithm to SipHash13. (GH-28752) https://github.com/python/cpython/commit/ad970e8623523a8656e8c1ff4e1dff3423498a5a --

[issue29410] Moving to SipHash-1-3

2021-10-09 Thread Christian Heimes
Christian Heimes added the comment: > But do they use them as dict keys? AFAIK strings aren't hashed until hash() > is called on them. That's correct. The hash of str and bytes is calculated on demand and then cached. Frozensets also cache their hash values while tuples don't have a cache.

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Guido van Rossum
Guido van Rossum added the comment: Victor: > I expect even more interesting speedup with bytes string longer than 6k > bytes. And I'm quite sure that it's common that people manipulates long > strings in Python :-) But do they use them as dict keys? AFAIK strings aren't hashed until hash()

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Do Rust and Ruby cache the hash of the string in the string object? If they do not then the hashing speed is more important to them. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: I suggest that you write a new PEP that discusses possible improvement for str and bytes hashing. This BPO is just about SipHash-1-3. As the author and implementer of PEP 456, I only approve SipHash-1-3 as new default algorithm. All other change proposals

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.10.2021 15:29, Christian Heimes wrote: > > Christian Heimes added the comment: > > JP got back to me > > On 07/10/2021 14.34, Jean-Philippe Aumasson wrote: >> xxHash is much faster indeed, but collisions seem trivial to find, which >> might

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: JP got back to me On 07/10/2021 14.34, Jean-Philippe Aumasson wrote: > xxHash is much faster indeed, but collisions seem trivial to find, which > might allow hash-flood DoS again (see for example > https://github.com/Cyan4973/xxHash/issues/180 >

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: SipHash is a cryptographically secure PRF, not a cryptographic hashing algorithm, https://www.python.org/dev/peps/pep-0456/#siphash I'm strongly against using a different algorithm when the rest of the world is using either SipHash-2-4 or SipHash-1-3. A

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.10.2021 12:48, Christian Heimes wrote: > >> I don't quite follow. Why is it fine that you discuss DoS, but it's not > fine when others discuss DoS ? > > But this BPO is not about discussing mitigations against DoS attacks in > general. It's about

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: BTW: We already use (a slight variant of) xxHash for tuples: https://bugs.python.org/issue34751 The issues is an interesting read, in particular on how xxHash was eventually chosen, with a whole set of other hash algorithms in between. --

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: I have contacted JP Aumasson to get his feedback on this proposal. -- ___ Python tracker ___

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: > I don't quite follow. Why is it fine that you discuss DoS, but it's not fine when others discuss DoS ? But this BPO is not about discussing mitigations against DoS attacks in general. It's about adding SipHash1-3- and following the example of Rust and

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.10.2021 12:16, Christian Heimes wrote: > >> That's certainly true, but at the same time, just focusing on string > hashes only doesn't really help either, e.g. it is very easy to > create a DoS with numeric keys or other objects which use trivial >

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: > I am not sure its worth enough. Adding algorithm increase some maintenance > cost... Is it really make someone happy? siphash13 is a short function. I wrote and designed PEP 456 so we can easily add new hashing functions. IMHO the maintenance cost is

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: > That's certainly true, but at the same time, just focusing on string hashes only doesn't really help either, e.g. it is very easy to create a DoS with numeric keys or other objects which use trivial hashing algorithms. Marc-Andre, Victor, your postings

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 07.10.2021 11:49, Inada Naoki wrote: > Hash DoS is not only for HTTP headers. Everywhere creating dict from > untrusted source can be attack vector. > For example, many API servers receive JSON as HTTP request body. Limiting > HTTP header don't

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Inada Naoki
Inada Naoki added the comment: > I recommend that you add SipHash-1-3 as an additional algorithm and make it > the default. The removal of --with-hash-algorithm=siphash24 should go through > regular deprecation cycle of two Python versions. I am not sure its worth enough. Adding algorithm

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Since the days this was discussed, a lot of new and faster hash algorithms have been developed. It may be worthwhile looking at those instead. E.g. xxHash is a lot more performant than siphash: https://github.com/Cyan4973/xxHash (the link also has a

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Inada Naoki
Inada Naoki added the comment: > I know that it's not a popular opinion, but I don't think that this denial of > service (DoS) is important. IMO there are enough other ways to crash a > server. Moreover, the initial attack vector was a HTTP request with tons of > header lines. In the

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: I recommend that you add SipHash-1-3 as an additional algorithm and make it the default. The removal of --with-hash-algorithm=siphash24 should go through regular deprecation cycle of two Python versions. -- ___

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Christian Heimes
Christian Heimes added the comment: I support this change, too. SipHash-1-3 is more than good enough for our use case. It provides sufficient diffusion of str and bytes hashes and has sufficient reliance against timing attacks on the PRF key. Also 64bit platforms are less affected less by

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread STINNER Victor
STINNER Victor added the comment: "1.67 us +- 0.03 us: 1.78x faster" with a bytes string of 6k bytes sounds worth it to me. When we talk about "security" here, we are talking about a denial of service attack on the dict worst case performance:

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Mark Shannon
Mark Shannon added the comment: Yes, this is worth doing, IMO. It adds no more code and probably reduces maintenance costs as any improvements/bug-fixes to the rust/ruby versions can be easily ported. Even if the benefit is small, the cost is basically zero. -- nosy: +Mark.Shannon

[issue29410] Moving to SipHash-1-3

2021-10-07 Thread Inada Naoki
Inada Naoki added the comment: I am not sure this is worth doing. Microbenchmarks: ## import time ``` $ main/opt/bin/pyperf command main/opt/bin/python3 -c 'import typing,asyncio' . command: Mean +- std dev: 49.6 ms +- 0.1 ms $ siphash13/opt/bin/pyperf command

[issue29410] Moving to SipHash-1-3

2021-10-06 Thread Guido van Rossum
Guido van Rossum added the comment: Worth revisiting? -- ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue29410] Moving to SipHash-1-3

2021-10-06 Thread Guido van Rossum
Change by Guido van Rossum : -- nosy: +gvanrossum ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue29410] Moving to SipHash-1-3

2021-10-05 Thread Inada Naoki
Change by Inada Naoki : -- pull_requests: +27097 stage: -> patch review pull_request: https://github.com/python/cpython/pull/28752 ___ Python tracker ___

[issue29410] Moving to SipHash-1-3

2021-10-05 Thread Erlend E. Aasland
Change by Erlend E. Aasland : -- nosy: +erlendaasland ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue29410] Moving to SipHash-1-3

2017-03-06 Thread INADA Naoki
INADA Naoki added the comment: microbench result: https://gist.github.com/methane/33c7b01c45ce23b67246f5ddaff9c9e7 Not significant ~ very little difference for small data. For 4KB: Median +- std dev: [python.default] 3.26 us +- 0.03 us -> [python.siphash13] 2.03 us +- 0.02 us: 1.60x faster

[issue29410] Moving to SipHash-1-3

2017-03-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: And needed microbenchmarks for different sizes, because the result should be relied on fitting the data in caches of different levels. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-03-01 Thread STINNER Victor
STINNER Victor added the comment: This issue is an optimization, but I still don't see any significant speedup :-) Would it be possible to write at least one microbenchmark showing a speedup? Maybe hash()? -- nosy: +haypo ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-02-13 Thread Christian Heimes
Christian Heimes added the comment: The old repos has a different name, https://github.com/tiran/cpython-mirror I'm busy with work at the moment. I'll get to the patch later. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-02-13 Thread INADA Naoki
INADA Naoki added the comment: Christian Heimes: Could you create pull request? https://github.com/tiran/cpython/tree/siphash13 is 404 now. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: OK, let's forget Py_HASH_CUTOFF for now and focus on SipHash-13. -- ___ Python tracker ___

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Christian Heimes
Christian Heimes added the comment: Unless somebody is able to show a real-world example with a considerable performance boost (>5%), I'm strictly against any kind of special casing for short strings. I don't want to trade security for performance. You might be able to persuade me to go for a

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/ Murmur may be safe for <16 byte too. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: Py_HASH_CUTOFF uses different secret from siphash already. The problem is the secret doesn't affects to collision at all. Attacker can produce large number of collision, without knowing the secret. BTW, we have FNV already. Let's survey about FNV-1 short string

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Christian Heimes
Christian Heimes added the comment: We can and should still use hash randomization for short strings, but a different key than for hash randomization for longer strings. -- ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 01.02.2017 13:03, INADA Naoki wrote: > Maybe, we should remove Py_HASH_CUTOFF completely? I think we ought to look for a better hash algorithm for short strings, e.g. a CRC based one. Some interesting investigations on this:

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Christian Heimes
Christian Heimes added the comment: Py_HASH_CUTOFF was an experimental feature for low-performance platforms. On modern hardware the performance increase is marginal to non-existing. I planned to deprecate the feature in 3.6 but forgot. We can deprecate it now and remove it in either 3.7 or

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: Current Py_HASH_CUTOFF implementation seems weak. ``` switch(len) { /* ((hash << 5) + hash) + *p == hash * 33 + *p */ case 7: hash = ((hash << 5) + hash) + *p++; /* fallthrough */ ... case 1: hash = ((hash << 5) + hash) +

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 01.02.2017 11:07, Serhiy Storchaka wrote: > > The performance of hash algorithm shouldn't affect general benchmarks since > hash value is cached inside string object. Almost all dict lookups in > critical parts are lookups with interned strings. But in

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 01.02.2017 10:50, INADA Naoki wrote: > >> it seems as if it would make sense to not use a fixed >> hash algorithm for all strings lengths, but instead a >> hybrid one to increase performance for short strings >> (which are used a lot in Python). >> >> Is

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: The performance of hash algorithm shouldn't affect general benchmarks since hash value is cached inside string object. Almost all dict lookups in critical parts are lookups with interned strings. But in corner cases the difference can be measurable. We

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: > it seems as if it would make sense to not use a fixed > hash algorithm for all strings lengths, but instead a > hybrid one to increase performance for short strings > (which are used a lot in Python). > > Is there a good hash algorithm with provides better >

Re: [issue29410] Moving to SipHash-1-3

2017-02-01 Thread M.-A. Lemburg
On 01.02.2017 10:14, Christian Heimes wrote: > > PEP 456 defines an API to add more hashing algorithms and make the selection > of hash algorithm a compile time option. We can easily add SipHash-1-3 and > make it the default algorithm. Vendors then can select between FNV2, > SipHash-1-3 and

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Christian Heimes
Christian Heimes added the comment: I added SipHash13 as additional hash algorithm in https://github.com/tiran/cpython/tree/siphash13 . Still need to verify the finalizer. For hashlib I'd need to move to a different implementation of SipHash. The implementation in pyhash.c is optimized for

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: > PEP 456 defines an API to add more hashing algorithms and make the selection > of hash algorithm a compile time option. We can easily add SipHash-1-3 and > make it the default algorithm. Vendors then can select between FNV2, > SipHash-1-3 and SipHash-2-4. OK,

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: $ ../python.default -m perf compare_to default.json patched4.json -G Slower (2): - scimark_sor: 479 ms +- 8 ms -> 485 ms +- 9 ms: 1.01x slower (+1%) - genshi_xml: 196 ms +- 2 ms -> 196 ms +- 2 ms: 1.00x slower (+0%) Faster (19): - json_loads: 62.7 us +- 0.5 us ->

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread Christian Heimes
Christian Heimes added the comment: PEP 456 defines an API to add more hashing algorithms and make the selection of hash algorithm a compile time option. We can easily add SipHash-1-3 and make it the default algorithm. Vendors then can select between FNV2, SipHash-1-3 and SipHash-2-4. On

[issue29410] Moving to SipHash-1-3

2017-02-01 Thread INADA Naoki
INADA Naoki added the comment: I'm running pyperformance. BTW, hashlib doesn't provide siphash24. So can we simply replace siphash24 with siphash13, like ruby? https://github.com/ruby/ruby/commit/04c94f95d1a1c6a12f5412228a2bcdc00f5de3b2 -- ___

[issue29410] Moving to SipHash-1-3

2017-01-31 Thread Raymond Hettinger
Raymond Hettinger added the comment: +1 The discussions on the Rust and Ruby lists seem persuasive. -- assignee: -> christian.heimes nosy: +christian.heimes, rhettinger ___ Python tracker

[issue29410] Moving to SipHash-1-3

2017-01-31 Thread INADA Naoki
Changes by INADA Naoki : -- components: +Interpreter Core type: -> performance ___ Python tracker ___

[issue29410] Moving to SipHash-1-3

2017-01-31 Thread INADA Naoki
New submission from INADA Naoki: Rust and Ruby moved from SipHash-2-4 to SipHash-1-3. https://github.com/rust-lang/rust/issues/29754 https://bugs.ruby-lang.org/issues/13017 SipHash-1-3 seems faster than 2-4 and secure enough. -- messages: 286590 nosy: inada.naoki priority: normal