[ 
https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217337#comment-17217337
 ] 

Yibo Cai commented on ARROW-10345:
----------------------------------

NumPy uses a special comparison function that treats NaN as the largest 
floating-point number.
https://github.com/numpy/numpy/blob/578f4e7dca4701637284c782d8c74c0d5b688341/numpy/core/src/npysort/npysort_common.h#L123
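
To illustrate the idea, here is a minimal pure-Python sketch of a NaN-aware "less than" that orders NaN after every other float. The names nan_aware_lt and nan_aware_cmp are hypothetical and exist only for this example; this is not NumPy's or Arrow's actual code, only the same ordering rule written out.

```python
import math
from functools import cmp_to_key


def nan_aware_lt(a, b):
    # True if a sorts strictly before b, treating NaN as the largest value:
    # either the ordinary comparison holds, or b is NaN and a is not.
    return a < b or (math.isnan(b) and not math.isnan(a))


def nan_aware_cmp(a, b):
    # Three-way comparison built on nan_aware_lt; two NaNs compare as equal.
    if nan_aware_lt(a, b):
        return -1
    if nan_aware_lt(b, a):
        return 1
    return 0


values = [3.0, 4.0, float("nan"), 1.0, 2.0]

# Sort indices rather than values, in the spirit of sort_indices.
indices = sorted(
    range(len(values)),
    key=cmp_to_key(lambda i, j: nan_aware_cmp(values[i], values[j])),
)
print(indices)  # [3, 4, 0, 1, 2]
```

Unlike the plain `<` operator, this comparison is a consistent ordering even when NaN is present, so the sort always has a well-defined result.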

It may be better to partition NaNs to the end of the array before sorting, 
just as nulls are handled currently.
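
A rough Python sketch of the partitioning approach, assuming NaNs should come after all ordinary values and before nulls (that relative order is an assumption made for this example, not something decided in this issue). The function sort_indices_nan_last is hypothetical and only illustrates the idea; the real change would be in the C++ kernel.

```python
import math


def sort_indices_nan_last(values):
    # Partition indices into ordinary values, NaNs, and nulls (None).
    regular, nans, nulls = [], [], []
    for i, v in enumerate(values):
        if v is None:
            nulls.append(i)
        elif math.isnan(v):
            nans.append(i)
        else:
            regular.append(i)

    # Only the NaN-free partition is sorted, so the plain comparison is safe.
    regular.sort(key=lambda i: values[i])

    # Assumed order: sorted values, then NaNs, then nulls.
    return regular + nans + nulls


result = sort_indices_nan_last([3.0, 4.0, float("nan"), 1.0, 2.0, None])
print(result)  # [3, 4, 0, 1, 2, 5]
```

With this approach the sort itself never compares against NaN, so no special comparison function is needed in the inner loop.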

> [C++] NaN breaks sorting
> ------------------------
>
>                 Key: ARROW-10345
>                 URL: https://issues.apache.org/jira/browse/ARROW-10345
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {code:python}
> >>> import numpy as np
> >>> import pyarrow.compute as pc
> >>> pc.sort_indices([3.0, 4.0, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f78368a0c90>
> [
>   2,
>   3,
>   0,
>   1,
>   4
> ]
> >>> pc.sort_indices([3.0, 4.0, np.nan, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f783684bf30>
> [
>   0,
>   1,
>   2,
>   3,
>   4,
>   5
> ]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
