cyb70289 commented on pull request #8524: URL: https://github.com/apache/arrow/pull/8524#issuecomment-724403787
> Is "sort_indices" actually stable? The [documentation](https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions) says "non-stable". In current code, "sort_indices" is stable, it uses "std::stable_sort". "partition_nth_indices" is not stable, it uses "std::nth_element". [numpy.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) has a parameter to select sorting algorithm, maybe arrow can also implement unstable quicksort (std::sort) and provide a kernel option to select. ``` # comparison of numpy stable and unstable sort # quicksort gives better performance In [1]: import numpy as np In [2]: a = np.random.randint(0, 100, 12345678) In [3]: time i1 = np.argsort(a, kind='quicksort') CPU times: user 909 ms, sys: 4.67 ms, total: 914 ms Wall time: 913 ms In [4]: time i2 = np.argsort(a, kind='stable') CPU times: user 998 ms, sys: 20.8 ms, total: 1.02 s Wall time: 1.02 s # stable and quick sort gives different result In [5]: (i1 == i2).all() Out[5]: False ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org