[ https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17314850#comment-17314850 ]
Kirill Lykov commented on ARROW-10899: -------------------------------------- I added a repository to put there some experiments i've done for the earlier added plots: [https://github.com/KirillLykov/int-sort-bmk] Unfortunately, I couldn't achieve a fast progress on this ticket and since it is not my main activity I decided to freeze it on my side. By fast progress I mean delivering a stable non-comparison-based sorting algorithm which is faster than std::stable_sort. Naiveradix sort which is implemented there is much slower on int64_t as one might find by looking into the plots in scripts/imgs. The last thing that I was trying to do is to modify boost's integer_sort to make it stable (as unstable version is really fast). To simplify experiments with integer_sort I've extracted it in one separate file called [https://github.com/KirillLykov/int-sort-bmk/blob/master/src/boost_spread_sort.h] One can find that integer_sort sometimes relies on pdqsort. This can be replaced with stable_sort. A more interesting part of the code which I think makes integer_sort unstable is [https://github.com/KirillLykov/int-sort-bmk/blob/master/src/boost_spread_sort.h#L210] I think in-place version can be replaced with non-in-place. > [C++] Investigate radix sort for integer arrays > ----------------------------------------------- > > Key: ARROW-10899 > URL: https://issues.apache.org/jira/browse/ARROW-10899 > Project: Apache Arrow > Issue Type: Wish > Components: C++ > Reporter: Antoine Pitrou > Assignee: Kirill Lykov > Priority: Major > Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot > 2021-02-10 at 10.58.23.png > > > For integer arrays with a non-tiny range of values, we currently use a stable > sort. It may be faster to use a radix sort instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)