New submission from Brandt Bucher <[email protected]>:
Sorting sequences containing NaN values produces an incompletely sorted result.
Further, because of the complexity of the timsort, this incomplete sort often
silently produces unintuitive, unstable-seeming results that are extremely
sensitive to the ordering of the inputs:
>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]
The patch I have provided addresses these issues, including for lists
containing nested lists/tuples with NaN values. Specifically, it stably sorts
NaNs to the end of the list with no changes to the timsort itself (just the
element-wise comparison functions):
>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]
It also includes a new regression test for this behavior.
Some other benefits to this patch:
* These changes generally result in a sorting performance improvement across
data types. The largest increases here are for nested lists, since we add a new
unsafe_list_compare function. Other speed increases are due to
safe_object_compare's delegation to unsafe comparison functions for objects of
the same type. Specifically, the speed impact (positive is faster, negative is
slower) is between:
* -3% and +3% (10 elements, no PGO)
* 0% and +4% (10 elements, PGO)
* 0% and +9% (1000 elements, no PGO)
* -1% and +9% (1000 elements, PGO)
* The current weird NaN-sorting behavior is not documented, so this is not a
breaking change.
* IEEE754 compliance is maintained. The result is still a stable (arguably,
more stable), nondecreasing ordering of the original list.
----------
components: Interpreter Core
messages: 336401
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Better NaN sorting.
type: behavior
versions: Python 3.8
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue36095>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com