[issue43475] Worst-case behaviour of hash collision with float NaN

Raymond Hettinger Sun, 13 Jun 2021 21:40:55 -0700


Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:


> If one wants to have all NaNs in one equivalency class
> (e.g. if used as a key-value for example in pandas) it
> is almost impossible to do so in a consistent way 
> without taking a performance hit.

ISTM the performance of the equivalence-class case is far less important than 
the case we were trying to solve.  Given a choice, we should prefer helping 
normal unadorned instances rather than giving preference to a subclass that 
redefines the usual behaviors.

In CPython, it is a fact of life that overriding builtin behaviors with pure 
Python code always incurs a performance hit.  Also, in your example, the 
subclass isn't technically correct because it relies on non-guaranteed 
implementation details.  It likely isn't even the fastest approach.

The only guaranteed behaviors are that math.isnan(x) reliably detects a NaN and 
that x!=x when x is a NaN.  Those are the only assured tools in the uphill 
battle to fight the weird intrinsic nature of NaNs.
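
As a minimal sketch of those two guarantees (the helper name is_nan and the 
sample values are illustrative assumptions, not part of the original report), 
both checks agree on ordinary floats:

```python
import math

def is_nan(x):
    """Hypothetical helper: detect a NaN using only the x != x guarantee."""
    return x != x

# Illustrative sample data, assumed for this sketch.
values = [1.0, float("nan"), float("inf"), 0.0]

# Guaranteed tool 1: math.isnan()
flags_math = [math.isnan(v) for v in values]

# Guaranteed tool 2: a NaN never compares equal to itself
flags_self = [is_nan(v) for v in values]
```

Neither check depends on object identity or on how NaN is hashed.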

So one possible solution is to replace all the NaNs with a canonical 
placeholder value that doesn't have undesired properties:

    from math import isnan
    {None if isnan(x) else x for x in arr}

That relies on guaranteed behaviors and is reasonably fast.  IMO that beats 
trying to reprogram float('NaN') to behave the opposite of how it was designed.
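
A rough sketch of how the placeholder idea might be applied, assuming that 
None does not otherwise occur in the data (the helper name canonical and the 
sample list are hypothetical):

```python
from collections import Counter
from math import isnan

def canonical(x):
    """Hypothetical helper: map every NaN to one placeholder (None here)."""
    return None if isnan(x) else x

# Illustrative input containing two distinct NaN objects.
arr = [1.0, float("nan"), 2.0, float("nan"), 1.0]

# All NaNs collapse into the single placeholder key.
unique = {canonical(x) for x in arr}

# The same mapping works for dict-like grouping or counting.
counts = Counter(canonical(x) for x in arr)
```

Here unique holds 1.0, 2.0 and None, and counts[None] is 2.  If None is a 
legitimate value in the data, a dedicated sentinel object would be needed 
instead.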

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43475>
_______________________________________