[issue43475] Worst-case behaviour of hash collision with float NaN

2021-11-24 Thread Christoph Groth
Christoph Groth added the comment: > What concrete action would you propose that the Python core devs take at this > point? Nothing for now. I stumbled across this issue through https://gitlab.kwant-project.org/kwant/tinyarray/-/issues/20 and had the impression that the aspect that I

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-11-24 Thread Mark Dickinson
Mark Dickinson added the comment: Just for fun: I gave a somewhat ranty 10-minute talk on this topic at a (virtual) conference a few months ago: https://www.youtube.com/watch?v=01oeosRVwgY -- ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-11-24 Thread Mark Dickinson
Mark Dickinson added the comment: @cwg: Yep, we're aware of this. There are no good solutions here - only a mass of constraints, compromises and trade-offs. I think we're already somewhere on the Pareto boundary of the "best we can do" given the constraints. Moving to another point on the

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-11-24 Thread Christoph Groth
Christoph Groth added the comment: Hello. I would like to point out a possible problem that this change to CPython has introduced. This change looks innocent, but it breaks the unwritten rule that the hash value of a number (or actually any built-in immutable type!) in Python depends only

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-16 Thread Mark Dickinson
Mark Dickinson added the comment: I think this can be closed now. -- stage: patch review -> resolved status: open -> closed ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-15 Thread Mark Dickinson
Mark Dickinson added the comment: New changeset 8d0b2ca493e236fcad8709a622c1ac8ad29c372d by Mark Dickinson in branch '3.10': [3.10] bpo-43475: Add what's new entry for NaN hash changes (GH-26725) (GH-26743) https://github.com/python/cpython/commit/8d0b2ca493e236fcad8709a622c1ac8ad29c372d

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-15 Thread Mark Dickinson
Change by Mark Dickinson : -- pull_requests: +25328 pull_request: https://github.com/python/cpython/pull/26743 ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-15 Thread Mark Dickinson
Mark Dickinson added the comment: New changeset 1d10bf0bb9409a406c56b0de8870df998637fd0f by Mark Dickinson in branch 'main': bpo-43475: Add what's new entry for NaN hash changes (GH-26725) https://github.com/python/cpython/commit/1d10bf0bb9409a406c56b0de8870df998637fd0f --

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-14 Thread Mark Dickinson
Change by Mark Dickinson : -- pull_requests: +25315 pull_request: https://github.com/python/cpython/pull/26725 ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-14 Thread Mark Dickinson
Mark Dickinson added the comment: > Does this change need to be mentioned in What's New? Yes, I think so, given that it's a change to documented behavior. It's also something that third party packages (like NumPy) potentially need to be aware of. --

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread Raymond Hettinger
Raymond Hettinger added the comment: > If one wants to have all NaNs in one equivalency class > (e.g. if used as a key-value for example in pandas) it > is almost impossible to do so in a consistent way > without taking a performance hit. ISTM the performance of the equivalent class case is

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread realead
realead added the comment: @mark.dickinson > ...my expectation was that there would be few cases of breakage, and that for > those few cases it shouldn't be difficult to fix the breakage. This expectation is probably correct. My issue is somewhat only partly on-topic here: If one wants to

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Does this change need to be mentioned in What's New? -- ___ Python tracker ___ ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread miss-islington
miss-islington added the comment: New changeset 128899d8b8d65d86bd9bbea6801e9f36e6f409f2 by Miss Islington (bot) in branch '3.10': bpo-43475: Fix the Python implementation of hash of Decimal NaN (GH-26679) https://github.com/python/cpython/commit/128899d8b8d65d86bd9bbea6801e9f36e6f409f2

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: > Maybe a new comparator is called for (like PY_EQ_FOR_HASH_COLLECTION), which > would yield float("nan") equivalent to float("nan") and which should be used > in hash-set/dict? It is difficult to do from technical point of view because all rich

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread miss-islington
Change by miss-islington : -- nosy: +miss-islington nosy_count: 6.0 -> 7.0 pull_requests: +25292 pull_request: https://github.com/python/cpython/pull/26706 ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread Mark Dickinson
Mark Dickinson added the comment: @Serhiy: can this be closed again? Does GH-26679 need to be backported to the 3.10 branch? -- ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread Mark Dickinson
Mark Dickinson added the comment: @realead > This change makes life harder for people trying to get sane behavior with > sets [...] Yes, code that assumes that all NaNs have the same hash value will need to change. But that doesn't seem unreasonable for objects that are already considered

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-12 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 9f1c5f6e8af6ba3f659b2aea1e221ac9695828ba by Serhiy Storchaka in branch 'main': bpo-43475: Fix the Python implementation of hash of Decimal NaN (GH-26679) https://github.com/python/cpython/commit/9f1c5f6e8af6ba3f659b2aea1e221ac9695828ba

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-11 Thread Serhiy Storchaka
Change by Serhiy Storchaka : -- pull_requests: +25265 stage: -> patch review pull_request: https://github.com/python/cpython/pull/26679 ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: There is an error in the Python implementation of Decimal.__hash__. It calls super().__hash__(), but the C implementation calls object.__hash__(). Also, the documentation for floating point hash has the same error. -- stage: resolved -> status:

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-11 Thread realead
realead added the comment: This change makes life harder for people trying to get sane behavior with sets for float-, complex- and tuple-objects containing NaNs. With "sane" meaning, that set([nan, nan, nan]) => {nan} Until now, one has only to override the equal-comparison for these

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-22 Thread Mark Dickinson
Mark Dickinson added the comment: Opened https://github.com/numpy/numpy/issues/18833 -- ___ Python tracker ___ ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-22 Thread Mark Dickinson
Mark Dickinson added the comment: Thanks, Raymond. I'll open something on the NumPy bug tracker, too, since they may want to consider doing something similar for numpy.float64 and friends. -- ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-22 Thread Raymond Hettinger
Change by Raymond Hettinger : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-22 Thread Raymond Hettinger
Raymond Hettinger added the comment: New changeset a07da09ad5bd7d234ccd084a3a0933c290d1b592 by Raymond Hettinger in branch 'master': bpo-43475: Fix worst case collision behavior for NaN instances (GH-25493) https://github.com/python/cpython/commit/a07da09ad5bd7d234ccd084a3a0933c290d1b592

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-20 Thread Raymond Hettinger
Change by Raymond Hettinger : -- keywords: +patch pull_requests: +24216 stage: -> patch review pull_request: https://github.com/python/cpython/pull/25493 ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Raymond Hettinger
Raymond Hettinger added the comment: > I'm loathe to guarantee anything about this in the language itself. There aren't language any guarantees being proposed. Letting the hash depend on the object id just helps avoid quadratic behavior. Making float('NaN') a singleton is also perfectly

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Tim Peters
Tim Peters added the comment: I agree hashing a NaN acting like the generic object hash (return rotated address) is a fine workaround, although I'm not convinced it addresses a real-world problem ;-) But why not? It might. But that's for CPython. I'm loathe to guarantee anything about this

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Raymond Hettinger
Change by Raymond Hettinger : -- assignee: -> rhettinger ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Mark Dickinson
Mark Dickinson added the comment: > Why not have hash() return the id() like we do for instances of object? I think that's fine, and IIUC that's what Cong Ma was proposing. It seems like the least invasive potential fix. In principle I find the idea of making NaN a singleton rather

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-04-10 Thread Raymond Hettinger
Raymond Hettinger added the comment: Mark, is there any reason hash(float('NaN')) and hash(Decimal('NaN')) have to be a constant? Since NaNs don't compare equal, the hash homomorphism has no restrictions. Why not have hash() return the id() like we do for instances of object? I

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-15 Thread Raymond Hettinger
Raymond Hettinger added the comment: > And making float('nan') returning a singleton, > but 1e1000 * 0 returning different NaN would cause large confusion. Not really, it would be just be an implementation detail, no different than int and strings being sometimes unique and sometimes not.

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-15 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: What about Decimal NaN? Even if make float NaN a singleton, there will be other NaNs. And making float('nan') returning a singleton, but 1e1000 * 0 returning different NaN would cause large confusion. -- nosy: +serhiy.storchaka

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: > The performance thoughts were motivated by the idea of > making NaN a singleton: adding a check to > PyFloat_FromDouble would mean that almost every operation > that produced a float would have to pass through that check. It may suffice to move the

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-13 Thread Mark Dickinson
Mark Dickinson added the comment: @Cong Ma: Yes, I'm not worried about performance for the change you're proposing (though as you say, we should benchmark anyway). The performance thoughts were motivated by the idea of making NaN a singleton: adding a check to PyFloat_FromDouble would mean

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-13 Thread Cong Ma
Cong Ma added the comment: > Idea: We could make this problem go away by making NaN a singleton. Apart from the fact that NaN is not a specific value/object (as pointed out in the previous message by @mark.dickinson), currently each builtin singleton (None, True, False, etc.) in Python

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-13 Thread Mark Dickinson
Mark Dickinson added the comment: > signalling NaNs are quietly silenced on my machine I just tested this assertion, and it appears that I was lying: this doesn't seem to be true. I'm almost *sure* it used to be true, and I'm not sure what changed, and when. (FWIW: Python 3.9.2 from

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-13 Thread Mark Dickinson
Mark Dickinson added the comment: > We could make this problem go away by making NaN a singleton. That could work, though we'd annoy anyone who cared about preserving NaN payloads and signs in Python. I don't know if such people exist. Currently the sign is accessible through things like

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: Idea: We could make this problem go away by making NaN a singleton. -- ___ Python tracker ___

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Mark Dickinson
Mark Dickinson added the comment: > I also wonder if there's security implication for servers that process > user-submitted input Yes, the "malicious actor" scenario is another one to consider. But unlike the string hashing attack, I'm not seeing a realistic way for the nan hash collisions

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Cong Ma
Cong Ma added the comment: Sorry, please ignore my rambling about "float() returning aliased object" -- in that case the problem with hashing doesn't arise. -- ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Cong Ma
Cong Ma added the comment: Thank you @mark.dickinson for the detailed analysis. In addition to your hypothetical usage examples, I am also trying to understand the implications for user code. If judging by the issues people open on GitHub like this:

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Mark Dickinson
Mark Dickinson added the comment: On third thoughts, of course it *would* help, because the Counter is keeping references to the realised NaN values. I think I'll go away now and come back when my brain is working. -- ___ Python tracker

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Mark Dickinson
Mark Dickinson added the comment: Hmm. On second thoughts, the proposed solution wouldn't actually *help* with the situation I gave: the elements (lazily) realised from the NumPy array are highly likely to all end up with the same address in RAM. :-( >>> x = np.full(10, np.nan) >>> for v in

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Mark Dickinson
Mark Dickinson added the comment: Sigh. When I'm undisputed ruler of the multiverse, I'm going to make "NaN == NaN" return True, IEEE 754 be damned. NaN != NaN is fine(ish) at the level of numerics; the problems start when the consequences of that choice leak into the other parts of the

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Mark Dickinson
Change by Mark Dickinson : -- nosy: +mark.dickinson ___ Python tracker ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue43475] Worst-case behaviour of hash collision with float NaN

2021-03-11 Thread Cong Ma
New submission from Cong Ma : Summary: CPython hash all NaN values to 0. This guarantees worst-case behaviour for dict if numerous existing keys are NaN. I think by hashing NaN using the generic object (or "pointer") hash instead, the worst-case situation can be alleviated without changing