[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341526#comment-17341526
 ] 

Kirill Lykov edited comment on ARROW-10899 at 5/9/21, 2:56 PM:
---------------------------------------------------------------

Do we have some information about distribution of integers? And also are they 
signed/unsigned?

I've did some more benchmarking and plots 
[https://github.com/KirillLykov/int-sort-bmk.|https://github.com/KirillLykov/int-sort-bmk]
 In short, radix sort is almost always performs better than std::stable_sort.
 But there are some cases when stable_sort is better – sorted sequence, all 
elements are equal.

Another observation is that it  looks like LSD implementation is faster than 
MSD implementation. 
 Yet it is possible to have almost the same performance using MSD/LSD hybrid 
algorithm if we think that MSD is better for memory bandwidth.

The question is how to measure memory bandwidth consumption, I employed 
LLC-cache-misses events. 


was (Author: klykov):
Do we have some information about distribution of integers? And also are they 
signed/unsigned?



I've did some more benchmarking and plots [int-sort-bmks 
repo|[https://github.com/KirillLykov/int-sort-bmk].]
In short, radix sort is almost always performs better than std::stable_sort.
But there are some cases when stable_sort is better – sorted sequence, all 
elements are equal.

Another observation is that it  looks like LSD implementation is faster than 
MSD implementation. 
Yet it is possible to have almost the same performance using MSD/LSD hybrid 
algorithm if we think that MSD is better for memory bandwidth.

The question is how to measure memory bandwidth consumption, I employed 
LLC-cache-misses events. 

> [C++] Investigate radix sort for integer arrays
> -----------------------------------------------
>
>                 Key: ARROW-10899
>                 URL: https://issues.apache.org/jira/browse/ARROW-10899
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++
>            Reporter: Antoine Pitrou
>            Assignee: Kirill Lykov
>            Priority: Major
>         Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to