New submission from Brandt Bucher <brandtbuc...@gmail.com>:
Sorting sequences containing NaN values produces an incompletely sorted result. Further, because of the complexity of the timsort, this incomplete sort often silently produces unintuitive, unstable-seeming results that are extremely sensitive to the ordering of the inputs: >>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0]) [1, 2, 2.0, 2.0, 3, nan, 2] >>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0])) [1, 2.0, 2, 2.0, nan, 2, 3] The patch I have provided addresses these issues, including for lists containing nested lists/tuples with NaN values. Specifically, it stably sorts NaNs to the end of the list with no changes to the timsort itself (just the element-wise comparison functions): >>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0]) [1, 2, 2.0, 2, 2.0, 3, nan] >>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]]) [[1], [2], [2.0], [2], [2.0], [3], [nan]] It also includes a new regression test for this behavior. Some other benefits to this patch: * These changes generally result in a sorting performance improvement across data types. The largest increases here are for nested lists, since we add a new unsafe_list_compare function. Other speed increases are due to safe_object_compare's delegation to unsafe comparison functions for objects of the same type. Specifically, the speed impact (positive is faster, negative is slower) is between: * -3% and +3% (10 elements, no PGO) * 0% and +4% (10 elements, PGO) * 0% and +9% (1000 elements, no PGO) * -1% and +9% (1000 elements, PGO) * The current weird NaN-sorting behavior is not documented, so this is not a breaking change. * IEEE754 compliance is maintained. The result is still a stable (arguably, more stable), nondecreasing ordering of the original list. ---------- components: Interpreter Core messages: 336401 nosy: brandtbucher priority: normal severity: normal status: open title: Better NaN sorting. type: behavior versions: Python 3.8 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue36095> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com