[ https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217337#comment-17217337 ]
Yibo Cai commented on ARROW-10345:
----------------------------------

NumPy uses a special compare function that treats NaN as the largest floating-point number.
https://github.com/numpy/numpy/blob/578f4e7dca4701637284c782d8c74c0d5b688341/numpy/core/src/npysort/npysort_common.h#L123

Maybe it's better to partition NaNs to the end of the array before sorting, just like how nulls are handled currently.

> [C++] NaN breaks sorting
> ------------------------
>
>                 Key: ARROW-10345
>                 URL: https://issues.apache.org/jira/browse/ARROW-10345
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {code:python}
> >>> import numpy as np
> >>> import pyarrow.compute as pc
> >>> pc.sort_indices([3.0, 4.0, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f78368a0c90>
> [
>   2,
>   3,
>   0,
>   1,
>   4
> ]
> >>> pc.sort_indices([3.0, 4.0, np.nan, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f783684bf30>
> [
>   0,
>   1,
>   2,
>   3,
>   4,
>   5
> ]
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
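
For illustration only, the partition-before-sort idea from the comment can be sketched in plain Python. This is not Arrow's C++ implementation. The helper name `sort_indices_nan_last` is hypothetical, and placing NaNs before nulls at the tail is an assumption made for this sketch.

```python
import math


def sort_indices_nan_last(values):
    """Return indices that sort `values` in ascending order.

    Hypothetical helper, not part of pyarrow. NaNs are partitioned out
    before sorting so they never reach the comparison step; nulls (None)
    are placed last. NaN-before-null ordering is an assumption.
    """
    ordinary, nans, nulls = [], [], []
    for i, v in enumerate(values):
        if v is None:
            nulls.append(i)
        elif math.isnan(v):
            nans.append(i)
        else:
            ordinary.append(i)

    # Only comparable values are sorted, so `<` is a valid strict weak
    # ordering here. list.sort is stable, so ties keep their input order.
    ordinary.sort(key=lambda i: values[i])
    return ordinary + nans + nulls


print(sort_indices_nan_last([3.0, 4.0, 1.0, 2.0, None]))
print(sort_indices_nan_last([3.0, 4.0, float("nan"), 1.0, 2.0, None]))
```

The point of partitioning first is that every comparison involving NaN evaluates to false, so a plain `<` comparator handed NaNs no longer defines a consistent ordering. That is consistent with the unsorted output in the report.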