[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-13 Thread realead


realead  added the comment:

@mark.dickinson

> ...my expectation was that there would be few cases of breakage, and that for 
> those few cases it shouldn't be difficult to fix the breakage.

This expectation is probably correct.

My issue is only partly on-topic here: if one wants to have all NaNs 
in one equivalence class (e.g. when used as a key, for example in pandas), it 
is almost impossible to do so in a consistent way without taking a performance 
hit. It seems to be possible for builtin types (even if frozenset won't be 
pretty), but already something like


class A:
    def __init__(self, a):
        self.a = a
    def __hash__(self):
        return hash(self.a)
    def __eq__(self, other):
        return self.a == other.a

is not easy to handle.
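
To illustrate why such a class is hard to handle, here is a small sketch. It assumes only the class A from above and a single NaN object. A bare NaN is deduplicated by the set because containers check identity before equality, but the wrapper's __eq__ compares the floats directly, so that shortcut no longer applies.

```python
class A:
    def __init__(self, a):
        self.a = a

    def __hash__(self):
        return hash(self.a)

    def __eq__(self, other):
        # Plain float comparison: nan == nan is False, even for the same object.
        return self.a == other.a


nan = float("nan")

# The same NaN object twice: the set's identity check treats it as one element.
plain = {nan, nan}

# Two wrappers around the same NaN object: the hashes match, but __eq__
# returns False, so the set keeps both.
wrapped = {A(nan), A(nan)}

print(len(plain))    # 1
print(len(wrapped))  # 2
```

The wrapper ends up with two entries even though both instances hold the very same NaN object.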

A special comparator for containers would be the ultimate solution, but I see 
how this could be too much hassle for a corner case.

--

___
Python tracker 
<https://bugs.python.org/issue43475>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43475] Worst-case behaviour of hash collision with float NaN

2021-06-11 Thread realead


realead  added the comment:

This change makes life harder for people trying to get sane behavior with sets 
for float, complex and tuple objects containing NaNs, with "sane" meaning 
that 

set([nan, nan, nan]) => {nan}

Until now, one only had to override the equality comparison for these types, 
but now the hash function also needs to be overridden (which is more 
complicated, e.g. for tuple).
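
One possible workaround that avoids overriding anything is to canonicalize NaNs before they reach the set. This is a different technique from the overrides described above, shown only as a rough sketch: the names CANONICAL_NAN and canonicalize are made up for this example, and it relies on built-in containers treating identical objects as equal. Complex numbers and other container types are not handled.

```python
import math

# Hypothetical singleton: every NaN is replaced by this one object.
CANONICAL_NAN = float("nan")


def canonicalize(obj):
    """Replace float NaNs (also nested inside tuples) by CANONICAL_NAN."""
    if isinstance(obj, float) and math.isnan(obj):
        return CANONICAL_NAN
    if isinstance(obj, tuple):
        return tuple(canonicalize(item) for item in obj)
    return obj


values = [float("nan"), float("nan"), (1, float("nan")), (1, float("nan"))]

# Distinct NaN objects never compare equal, so nothing is merged.
raw = set(values)

# After canonicalization the identity shortcut in set and tuple
# comparison merges the duplicates.
normalized = {canonicalize(v) for v in values}

print(len(raw))         # 4
print(len(normalized))  # 2
```

The cost is an extra pass over every value before insertion and lookup, which is the kind of performance hit that is hard to avoid here.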



On a more abstract level: there are multiple possible equivalence relations. 

The one used right now for floats is Py_EQ and results in 
float("nan") != float("nan") due to IEEE 754.

In hash-set/dict we would like to use an equivalence relation for which all 
NaNs would be in the same equivalence class. Maybe a new comparator is called 
for (like PY_EQ_FOR_HASH_COLLECTION), which would yield float("nan") equivalent 
to float("nan") and which should be used in hash-set/dict?
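
No such comparator exists today, so the following is only a pure-Python approximation of the idea, not a proposal for the C API. The class name NanEqualKey is hypothetical. It wraps a key and supplies a hash/equality pair under which all float NaNs fall into one equivalence class.

```python
import math


class NanEqualKey:
    """Hypothetical key wrapper: all float NaNs hash and compare as equal."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    @staticmethod
    def _is_nan(value):
        return isinstance(value, float) and math.isnan(value)

    def __hash__(self):
        # One fixed hash for every NaN, the regular hash otherwise.
        if self._is_nan(self.value):
            return 0
        return hash(self.value)

    def __eq__(self, other):
        if not isinstance(other, NanEqualKey):
            return NotImplemented
        if self._is_nan(self.value) and self._is_nan(other.value):
            return True
        return self.value == other.value


keys = {NanEqualKey(float("nan")), NanEqualKey(float("nan")), NanEqualKey(1.0)}
print(len(keys))  # 2
```

Every lookup now goes through Python-level __hash__ and __eq__, and nested values such as tuples containing NaNs would need additional handling, which is why a comparator inside the container itself would be more attractive.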

--
nosy: +realead
