[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-06-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360162#comment-17360162
 ] 

Kirill Lykov edited comment on ARROW-10899 at 6/9/21, 3:39 PM:
---

Well, it looks like it is called "wolf sort" 
(https://github.com/scandum/wolfsort).
I compared the implementation of Igor with Travis lsd and stable sort on 
uniformly distributed uint64_t from the range 0..1e9, see   

!uniform_1B_wolf.png! 

I don't know if his implementation is good/bad. 
I just took it and made it a bit more typed to be compiled by C++ compiler, see 
https://github.com/KirillLykov/int-sort-bmk/blob/wolfSortCheck/ branch if 
really interested


was (Author: klykov):
Well, it looks like it is called "wolf sort" 
(https://github.com/scandum/wolfsort).
I compared the implementation of Igor with Travis lsd and stable sort, see   

!uniform_1B_wolf.png! 

I don't know if his implementation is good/bad. I just took it and made it a 
bit more typed to be compiled by C++ compiler, see 
https://github.com/KirillLykov/int-sort-bmk/blob/wolfSortCheck/ branch if 
really interested

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png, differentdistrib.png, 
> uniform_1B_wolf.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-06-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17360115#comment-17360115
 ] 

Kirill Lykov edited comment on ARROW-10899 at 6/9/21, 2:56 PM:
---

LSD sort is stable.
Do you want to try with Travis' LSD sort implementation?
If yes, we can discuss what shall be done. Like I can change the style of the 
code, make it templated, support negative, etc. 

Making perfect adaptive stable generic radix looks like too time consuming task 
for me, but adapting LSD is simpler.

Yet about bandwidth. 
Travis wrote me that he originally wanted to have two posts -- one about LSD 
and another about MSD. And he prefers MSD due to lower bandwidth consumption.
It sounds logical by looking to algorithms themselfs -- LSD will visit all the 
radixes anyway while MSD will build a tree of recursive calls which generally 
speaking is not balanced so makes less memory operations.
I tried to estimate bandwidth by using LLC-misses in perf, results are in the 
table there https://github.com/KirillLykov/int-sort-bmk#msd-vs-lsd-radix-sort
Yet MSD is more complicated to implement and less performant, so if doing 
MSD-like approach I would go with adaptive MSD like one from boost
What do you think about this?



was (Author: klykov):
LSD sort is stable.
Do you want to try with Travis' LSD sort implementation?

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png, differentdistrib.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-06-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359990#comment-17359990
 ] 

Kirill Lykov edited comment on ARROW-10899 at 6/9/21, 11:31 AM:


I have to drop this ticket but if someone will take it, I will be glad to 
assist.
Below is a summary of the research I did.

First of all I couldn't find ready to use stable radix sort implementation 
which is fast and generic.
boost::spread_sort is the best available implementation of in-place radix sort 
because it designed to perform well on all the inputs.
On the contrary, vanilla radix sort will perform poorly on some types of 
distributions as shown on the plot below.
spread_sort performs better for two reasons:
* it detect some cases when radix sort will perform poorly and relies on 
boost's pdqsort sorting algorithm
* it uses adaptive size of the radix rather than fixed

What makes spreadsort unstable is that it sometimes relies on pdqsort which is 
unstable and that it is in-place sort.
Hence, it is possible to make it stable by replacing calls to pdqsort to 
stable_sort and by using additional buffer instead of doing in-place sorting.
Another argument for starting from spread sort code is that it is tested on 
different platforms (so less risk for surprises), it works with different 
datatypes.

!differentdistrib.png|width=350,height=350!

Beside of developing a new efficient implementation of adaptive radix sort, 
there is also work need to be done to integrate this sort. Since right now a 
custom comparator is used with std::stable_sort.

Having in mind also extensive testing/bmk (different distributions, different 
data types) and profiling, I would estimate this task for 80-120h


was (Author: klykov):
I have to drop this ticket but if someone will take it, I will be glad to 
assist.
Below is a summary of the research I did.

First of all I couldn't find ready to use stable radix sort implementation 
which is fast and generic.
boost::spread_sort is the best available implementation of in-place radix sort 
because it designed to perform well on all the inputs.
On the contrary, vanilla radix sort will perform poorly on some types of 
distributions as shown on the plot below.
spread_sort performs better for two reasons:
* it detect some cases when radix sort will perform poorly and relies on 
boost's pdqsort sorting algorithm
* it uses adaptive size of the radix rather than fixed

What makes spreadsort unstable is that it sometimes relies on pdqsort which is 
unstable and that it is in-place sort.
Hence, it is possible to make it stable by replacing calls to pdqsort to 
stable_sort and by using additional buffer instead of doing in-place sorting.

!differentdistrib.png|width=350,height=350!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png, differentdistrib.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-06-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359990#comment-17359990
 ] 

Kirill Lykov edited comment on ARROW-10899 at 6/9/21, 11:24 AM:


I have to drop this ticket but if someone will take it, I will be glad to 
assist.
Below is a summary of the research I did.

First of all I couldn't find ready to use stable radix sort implementation 
which is fast and generic.
boost::spread_sort is the best available implementation of in-place radix sort 
because it designed to perform well on all the inputs.
On the contrary, vanilla radix sort will perform poorly on some types of 
distributions as shown on the plot below.
spread_sort performs better for two reasons:
* it detect some cases when radix sort will perform poorly and relies on 
boost's pdqsort sorting algorithm
* it uses adaptive size of the radix rather than fixed

What makes spreadsort unstable is that it sometimes relies on pdqsort which is 
unstable and that it is in-place sort.
Hence, it is possible to make it stable by replacing calls to pdqsort to 
stable_sort and by using additional buffer instead of doing in-place sorting.

!differentdistrib.png|width=350,height=350!


was (Author: klykov):
I have to drop this ticket but if someone will take it, I will be glad to 
assist.
Below is a summary of the research I did.

First of all I couldn't find ready to use stable radix sort implementation 
which is fast and generic.
boost::spread_sort is the best available implementation of in-place radix sort 
because it designed to perform well on all the inputs.
On the contrary, vanilla radix sort will perform poorly on some types of 
distributions as shown on the plot below.
spread_sort performs better for two reasons:
* it detect some cases when radix sort will perform poorly and relies on 
boost's pdqsort sorting algorithm
* it uses adaptive size of the radix rather than fixed

What makes spreadsort unstable is that it sometimes relies on pdqsort which is 
unstable and that it is in-place sort.
Hence, it is possible to make it stable by replacing calls to pdqsort to 
stable_sort and by using additional buffer instead of doing in-place sorting.

!differentdistrib.png!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png, differentdistrib.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-05-10 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341925#comment-17341925
 ] 

Kirill Lykov edited comment on ARROW-10899 at 5/10/21, 2:08 PM:


I believe that the LSD implementation by Travis is stable by construction, I 
will recheck it.
 Regarding stable_sort – i'm trying to find a way to rely on stable_sort 
instead of radix sort for particular types of distribution.
These checks are done in boost::spread_sort but I haven't got the math behind 
yet.
In particular, they detect is sequence is sorted and find min/max values.
>From that they can conclude if the sequence is bucket sorted or it is not 
>sorted but better to be handled by another algorithm.


was (Author: klykov):
I believe that the LSD implementation by Travis is stable by construction, I 
will recheck it.
Regarding stable_sort – i'm trying to find a way to rely on stable_sort instead 
of radix sort for particular types of distributions.
These checks are done in boost::spread_sort but I haven't got the math behind 
yet.

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-05-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17341526#comment-17341526
 ] 

Kirill Lykov edited comment on ARROW-10899 at 5/9/21, 2:56 PM:
---

Do we have some information about distribution of integers? And also are they 
signed/unsigned?

I've did some more benchmarking and plots 
[https://github.com/KirillLykov/int-sort-bmk.|https://github.com/KirillLykov/int-sort-bmk]
 In short, radix sort is almost always performs better than std::stable_sort.
 But there are some cases when stable_sort is better – sorted sequence, all 
elements are equal.

Another observation is that it  looks like LSD implementation is faster than 
MSD implementation. 
 Yet it is possible to have almost the same performance using MSD/LSD hybrid 
algorithm if we think that MSD is better for memory bandwidth.

The question is how to measure memory bandwidth consumption, I employed 
LLC-cache-misses events. 


was (Author: klykov):
Do we have some information about distribution of integers? And also are they 
signed/unsigned?



I've did some more benchmarking and plots [int-sort-bmks 
repo|[https://github.com/KirillLykov/int-sort-bmk].]
In short, radix sort is almost always performs better than std::stable_sort.
But there are some cases when stable_sort is better – sorted sequence, all 
elements are equal.

Another observation is that it  looks like LSD implementation is faster than 
MSD implementation. 
Yet it is possible to have almost the same performance using MSD/LSD hybrid 
algorithm if we think that MSD is better for memory bandwidth.

The question is how to measure memory bandwidth consumption, I employed 
LLC-cache-misses events. 

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-26 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332655#comment-17332655
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/26/21, 6:02 PM:


Actually, I don't see significant effect of prefetch in my benchmarks. 
Probably, because I use clang. Now wondering how it is compiling. So it might 
be considered as radix_sort6.


was (Author: klykov):
Actually, I don't see significant effect of prefetch in my benchmarks. 
Probably, because I use clang. Now wondering how it is compiling 

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-26 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332609#comment-17332609
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/26/21, 5:20 PM:


I'm continuing experiments with MSD sort.
I'm using gist as log for optimizations
 [https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141]

Please ignore code style sine it is for experiments.
Each comment in this gist is an iteration of optimization.
Please let me know someone has ideas to try. I was planing try to get rid of 
recursion by using stack explicitly. Don't see any other big things to do with 
it.



!all_random_wholeRange.png|width=350,height=350!

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).


was (Author: klykov):
I'm continuing experiments with MSD sort. Please ignore code style sine it is 
for experiments. I'm using gist as log for optimizations
https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141

Each comment in this gist is an iteration of optimization. Please let me know 
someone has ideas to try. I was planing try to get rid of recursion by using 
stack explicitly. Don't see any other big things to do with it.

 !all_random_wholeRange.png|height=350,width=350!

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).


> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-26 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332609#comment-17332609
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/26/21, 5:18 PM:


I'm continuing experiments with MSD sort. Please ignore code style sine it is 
for experiments. I'm using gist as log for optimizations
https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141

Each comment in this gist is an iteration of optimization. Please let me know 
someone has ideas to try. I was planing try to get rid of recursion by using 
stack explicitly. Don't see any other big things to do with it.

 !all_random_wholeRange.png|height=350,width=350!

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).



was (Author: klykov):
I'm continuing experiments with MSD sort. Please ignore code style sine it is 
for experiments. I'm using gist as log for optimizations
https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141

Each comment in this gist is an iteration of optimization. Please let me know 
someone has ideas to try. I was planing try to get rid of recursion by using 
stack explicitly. Don't see any other big things to do with it.

 !all_random_wholeRange.png!height=350,width=350!

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).


> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-26 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332609#comment-17332609
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/26/21, 5:17 PM:


I'm continuing experiments with MSD sort. Please ignore code style sine it is 
for experiments. I'm using gist as log for optimizations
https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141

Each comment in this gist is an iteration of optimization. Please let me know 
someone has ideas to try. I was planing try to get rid of recursion by using 
stack explicitly. Don't see any other big things to do with it.

 !all_random_wholeRange.png!height=350,width=350!

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).



was (Author: klykov):
I'm continuing experiments with MSD sort. Please ignore code style sine it is 
for experiments. I'm using gist as log for optimizations
https://gist.github.com/KirillLykov/c641e63adfd68591bafbdb342f75d141

Each comment in this gist is an iteration of optimization. Please let me know 
someone has ideas to try. I was planing try to get rid of recursion by using 
stack explicitly. Don't see any other big things to do with it.

 !all_random_wholeRange.png! 

Idea by [~michalno] to use another sorting algorithm if the size of the 
subarray to sort is smaller than some level surprisingly doesn't work well on 
my array (for now only uniform distribution).


> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png, 
> all_random_wholeRange.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-19 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324878#comment-17324878
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/19/21, 8:55 AM:


Updates, I've contacted Travis and he wrote that he has a lack of time to work 
on it yet it seems interesting to him. He mentioned also that MSD 
implementation might be better for bandwidth. 

I'm thinking about the way to proceed. One option might be
1. Travis will create a PR just with his radix sort implementation
2. I will add tests and benchmarks to his PR

One thing which kind of worries me is that sorting requires quite intense 
ad-hoc testing and benchmarking, are we fine with adding all these to the Arrow 
repository?


was (Author: klykov):
Updates, I've contacted Travis and he wrote that he has a luck of time to work 
on it yet it seems interesting to him. He mentioned also that MSD 
implementation might be better for bandwidth. 

I'm thinking about the way to proceed. One option might be
1. Travis will create a PR just with his radix sort implementation
2. I will add tests and benchmarks to his PR

One thing which kind of worries me is that sorting requires quite intense 
ad-hoc testing and benchmarking, are we fine with adding all these to the Arrow 
repository?

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-12 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319494#comment-17319494
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/12/21, 3:04 PM:


Right, I agree that it is stable. 
>From performance prospective, the main difference with naive implementation of 
>Cormen's pseudo code:
1) Skip unnecessary iterations (if all the numbers have current bits (8 bits in 
Travis code) set to the same value). Gives ~36%
2) Instead of using offset array use array of pointers which takes into account 
these offsets. So instead of `out[offset[(value>>shift)&bitmask]]=123 `do 
`pOut[(value>>shift)&bitmask] = 123` where pOut is precomputed to be `&out[0] + 
offset[]`. Gives ~24%

He also uses `__builtin_prefetch` from gcc library, I'm not sure if it is 
portable. It might give 12% performance on some large datasets.

I will write to Travis directly. Probably, he will like to contribute a version 
of this sort and I will do plumbing


was (Author: klykov):
Right, I agree that it is stable. 
>From performance prospective, the main difference with naive implementation of 
>Cormen's pseudo code:
1) Skip unnecessary iterations (if all the numbers have current bits (8 bits in 
Travis code) set to the same value). Gives ~36%
2) Instead of using offset array use array of pointers which take into account 
these offsets. So instead of `out[offset[(value>>shift)&bitmask]]=123 `do 
`pOut[(value>>shift)&bitmask] = 123` where pOut is precomputed to be `&out[0] + 
offset[]`. Gives ~24%

He also uses `__builtin_prefetch` from gcc library, I'm not sure if it is 
portable. It might give 12% of performance on some large datasets.

I will write to Travis directly. Probably, he will like to contribute a version 
of this sort and I will do plumbing

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:58 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 

Yet not sure if this implementation is stable because the order is from less 
significant bits. But it seems to be easy to change

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:47 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 

Yet not sure if this implementation is stable because the order is from less 
significant bits. But it seems to be easy to change

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 
Yet not sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! yet no sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 4:40 PM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! 
Yet not sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks! yet no sure if this implementation is stable.

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It rocks!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license at this point, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license, see 
https://github.com/travisdowns/sort-bench/issues/1

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:41 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow. What do you think? 

I've added issue to his repo to add license, see 
https://github.com/travisdowns/sort-bench/issues/1


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow, which will require some more benchmarking, testing and 
code polishing. What do you think?

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:33 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks (at least with uniform distributed data)!

!all_random_wholeRange.png|height=350,width=350!

There is no license file in his repo, so I cannot share my experiments.

There might be several ways to proceed. It looks it would be good to ask Travis 
to contribute to Arrow, which will require some more benchmarking, testing and 
code polishing. What do you think?


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=350,width=350!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:25 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=350,width=350!


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=250,width=250!!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-04-11 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17318707#comment-17318707
 ] 

Kirill Lykov edited comment on ARROW-10899 at 4/11/21, 9:24 AM:


Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png|height=250,width=250!!


was (Author: klykov):
Thanks for the reference to the blog, I read all of his posts. 

I've checked with my benchmarks Travis' final radix_sort7 version, see below.
It kind of rocks!

!all_random_wholeRange.png!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png, all_random_wholeRange.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-10 Thread Michal Nowakiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282886#comment-17282886
 ] 

Michal Nowakiewicz edited comment on ARROW-10899 at 2/11/21, 7:10 AM:
--

I don't have any good advice on open source radix sort, but let me share some 
general thoughts about radix sort implementation:
 # Regarding SIMD. It seems to me that unstable radix sort should be relatively 
simple to implement using AVX-512 instruction set (AVX2 is a bit more 
challenging, since it is missing for instance scatter instruction for memory 
stores). Stable radix sort is a bit harder with SIMD but still doable. Is it 
worth it? I believe that the majority of the cost in radix sort is in the data 
movement (memory stores to non-consecutive addresses), especially if the data 
does not fit into CPU cache. SIMD will not help with that. But it will cut the 
cost of everything else. I would expect visible improvement but nothing close 
to 2x speedup. It's probably something better left for later.
 # Regarding papers. [http://www.cs.columbia.edu/~orestis/sigmod14I.pdf] - This 
paper may be a bit too much - it focuses on SIMD, parallel processing, 
optimizing for multiple NUMA nodes and it goes into a lot of depth and touches 
some other topics as well - but at the time it was published it looked to me 
like a pretty good implementation. There may be something interesting in their 
results and conclusions, even though they tested their solution on much bigger 
data sets.
 # Regarding MSB (most significant bits first) vs LSB (least significant bits 
first) radix sort. MSB should be faster on larger data (compared to the size of 
CPU cache), more cache-friendly, and can be implemented as a stable sort.
 # Regarding hybrid sorting. I imagine that a good solution would be to run MSB 
radix sort at first, producing a set of data partitions based on the highest 
bits of the sort key. Once the partitions get small enough (e.g. fit L1 cache) 
I would guess that radix sort is no longer the best solution for sorting data 
(it should get more advantage over comparison based sorting when the input set 
is large). At that point a different sorting algorithm could be better suited, 
so any sort_of_choice can be called on the data within radix partition. If 
radix sort is faster on small arrays (thousands of elements) - great - but if 
it isn't, then using another sort method for small partitions seems like a good 
idea.
 # Regarding data distribution. I think that data distribution is pretty 
important in designing and testing radix sort. For instance, if we sort 1 
million 64-bit uniformly distributed integers, even though the space of 
potential values is huge, there are at most 1 million unique values. If I had 
to sort 1M of 64-bit integers I would consider doing it using range 
partitioning: a) use a small sample of data (e.g. every Nth value for a double 
digit N) to compute a histogram (sort the sample, using any kind of sort, 
divide into K equal size buckets after sorting for range partitioning later 
into K ranges); b) use range partitioning based on the histogram to split data 
into multiple ranges; c) continue the two previous steps recursively until the 
partition size is small and then call a sort_of_choice function on that 
partition. Histogram based range partitioning should take care of data skew and 
create approximately equal sized partitions. That means that for 1M values, in 
order to get down to 1K elements partitions, about 1K partitions need to be 
generated. That can be done for instance as a single pass of range partitioning 
with 1024 partitions or two passes with 32 partitions each. Even though a 
single round of range partitioning (bucket sort) is obviously more expensive 
than pure radix sort pass (extra histogram and finding a range for each value 
given range boundaries) it seems to me like a good candidate, since it lowers 
the total number of partitioning steps that are needed (compared to pure radix 
sort of 64-bit integers that would use something like 8 passes of 8-bit radix 
partitioning phases). And SIMD can definitely help performance-wise in finding 
the right bucket for each value using range boundaries.


was (Author: michalno):
I don't have any good advice on open source radix sort, but let me share some 
general thoughts about radix sort implementation:
 # Regarding SIMD. It seems to me that unstable radix sort should be relatively 
simple to implement using AVX-512 instruction set (AVX2 is a bit more 
challenging, since it is missing for instance scatter instruction for memory 
stores). Stable radix sort is a bit harder with SIMD but still doable. Is it 
worth it? I believe that the majority of the cost in radix sort is in the data 
movement (memory stores to non-consecutive addresses), especially if the data 
does not fit into CPU cache. SIMD will not he

[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-10 Thread Michal Nowakiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282886#comment-17282886
 ] 

Michal Nowakiewicz edited comment on ARROW-10899 at 2/11/21, 7:10 AM:
--

I don't have any good advice on open source radix sort, but let me share some 
general thoughts about radix sort implementation:
 # Regarding SIMD. It seems to me that unstable radix sort should be relatively 
simple to implement using AVX-512 instruction set (AVX2 is a bit more 
challenging, since it is missing for instance scatter instruction for memory 
stores). Stable radix sort is a bit harder with SIMD but still doable. Is it 
worth it? I believe that the majority of the cost in radix sort is in the data 
movement (memory stores to non-consecutive addresses), especially if the data 
does not fit into CPU cache. SIMD will not help with that. But it will cut the 
cost of everything else. I would expect visible improvement but nothing close 
to 2x speedup. It's probably something better left for later.
 # Regarding papers. [http://www.cs.columbia.edu/~orestis/sigmod14I.pdf] - This 
paper may be a bit too much - it focuses on SIMD, parallel processing, 
optimizing for multiple NUMA nodes and it goes into a lot of depth and touches 
some other topics as well - but at the time it was published it looked to me 
like a pretty good implementation. There may be something interesting in their 
results and conclusions, even though they tested their solution on much bigger 
data sets.


 # Regarding MSB (most significant bits first) vs LSB (least significant bits 
first) radix sort. MSB should be faster on larger data (compared to the size of 
CPU cache), more cache-friendly, and can be implemented as a stable sort. 


 # Regarding hybrid sorting. I imagine that a good solution would be to run MSB 
radix sort at first, producing a set of data partitions based on the highest 
bits of the sort key. Once the partitions get small enough (e.g. fit L1 cache) 
I would guess that radix sort is no longer the best solution for sorting data 
(it should get more advantage over comparison based sorting when the input set 
is large). At that point a different sorting algorithm could be better suited, 
so any sort_of_choice can be called on the data within radix partition. If 
radix sort is faster on small arrays (thousands of elements) - great - but if 
it isn't, then using another sort method for small partitions seems like a good 
idea.


 # Regarding data distribution. I think that data distribution is pretty 
important in designing and testing radix sort. For instance, if we sort 1 
million 64-bit uniformly distributed integers, even though the space of 
potential values is huge, there are at most 1 million unique values. If I had 
to sort 1M of 64-bit integers I would consider doing it using range 
partitioning: a) use a small sample of data (e.g. every Nth value for a double 
digit N) to compute a histogram (sort the sample, using any kind of sort, 
divide into K equal size buckets after sorting for range partitioning later 
into K ranges); b) use range partitioning based on the histogram to split data 
into multiple ranges; c) continue the two previous steps recursively until the 
partition size is small and then call a sort_of_choice function on that 
partition. Histogram based range partitioning should take care of data skew and 
create approximately equal sized partitions. That means that for 1M values, in 
order to get down to 1K elements partitions, about 1K partitions need to be 
generated. That can be done for instance as a single pass of range partitioning 
with 1024 partitions or two passes with 32 partitions each. Even though a 
single round of range partitioning (bucket sort) is obviously more expensive 
than pure radix sort pass (extra histogram and finding a range for each value 
given range boundaries) it seems to me like a good candidate, since it lowers 
the total number of partitioning steps that are needed (compared to pure radix 
sort of 64-bit integers that would use something like 8 passes of 8-bit radix 
partitioning phases). And SIMD can definitely help performance-wise in finding 
the right bucket for each value using range boundaries.


was (Author: michalno):
I don't have any good advice on open source radix sort, but let me share some 
general thoughts about radix sort implementation:
 # Regarding SIMD. It seems to me that unstable radix sort should be relatively 
simple to implement using AVX-512 instruction set (AVX2 is a bit more 
challenging, since it is missing for instance scatter instruction for memory 
stores). Stable radix sort is a bit harder with SIMD but still doable. Is it 
worth it? I believe that the majority of the cost in radix sort is in the data 
movement (memory stores to non-consecutive addresses), especially if the data 
does not fit into CPU cache. SIMD will

[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-10 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282355#comment-17282355
 ] 

Antoine Pitrou edited comment on ARROW-10899 at 2/10/21, 10:14 AM:
---

Even a radix sort is not necessarily stable. You have to be careful when 
implementing it: see https://en.wikipedia.org/wiki/Radix_sort

Also, I assume your benchmarks are on random data? Data is very often not 
randomly distributed. On real-world data, a int32 or int64 column doesn't 
necessarily span the whole int32 or int64 range, for example.

Perhaps heuristics would need to be devised, based on the input range and the 
input size you would either use a radix sort (which is O\(n\)) or a 
comparison-based sort such as spinsort (which is O\(n log n\)).


was (Author: pitrou):
Even a radix sort is not necessarily stable. You have to be careful when 
implementing it: see https://en.wikipedia.org/wiki/Radix_sort

Also, I assume your benchmarks are on random data? Data is very often not 
randomly distributed. On real-world data, a int32 or int64 column doesn't 
necessarily span the whole int32 or int64 range, for example.

Perhaps heuristics would need to be devised, based on the input range and the 
input size you would either use a radix sort (which is {{O(n)}}) or a 
comparison-based sort such as spinsort (which is {{O(n log n)}}).

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-10 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282355#comment-17282355
 ] 

Antoine Pitrou edited comment on ARROW-10899 at 2/10/21, 10:13 AM:
---

Even a radix sort is not necessarily stable. You have to be careful when 
implementing it: see https://en.wikipedia.org/wiki/Radix_sort

Also, I assume your benchmarks are on random data? Data is very often not 
randomly distributed. On real-world data, a int32 or int64 column doesn't 
necessarily span the whole int32 or int64 range, for example.

Perhaps heuristics would need to be devised, based on the input range and the 
input size you would either use a radix sort (which is {{O(n)}}) or a 
comparison-based sort such as spinsort (which is {{O(n log n)}}).


was (Author: pitrou):
Even a radix sort is not necessarily stable. You have to be careful when 
implementing it: see https://en.wikipedia.org/wiki/Radix_sort

Also, I assume your benchmarks are on random data? Data is very often not 
randomly distributed. On real-world data, a int32 or int64 column doesn't 
necessarily span the whole int32 or int64 range, for example.

Perhaps heuristics would need to be devised, based on the input range and the 
input size you would either use a radix sort (which is O(n)) or a 
comparison-based sort such as spinsort (which is O(n log n)).

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-10 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282030#comment-17282030
 ] 

Kirill Lykov edited comment on ARROW-10899 at 2/10/21, 10:08 AM:
-

Right, I will check also spinsort from boost. I would add why spreadsort is not 
stable – if array is small or integer doesn't fit to boost::uintmax_t, it uses 
pbsort which is unstable. If everything is fine it is using radix sort, which 
is what we want.
 One solution is to implement stable_spinsort which will use something else 
instead of pbsort. 

Below is the plot for these sorting algorithms for int64_t (one above is for 
int32_t).

!Screen Shot 2021-02-10 at 10.58.23.png!

Looks like stable spinsort is not better than std::stable_sort and this is 
expected. 
 I believe that if we want to use primarily keys shorter or equal to 64 bits, 
it makes sense to look into a-la spreadsort implementation. For that, it is 
possible to reuse some code from boost::sort::spreadsort::details bu[t this is 
might 
be|https://github.com/boostorg/sort/blob/develop/include/boost/sort/spreadsort/detail/integer_sort.hpp]
 a bad idea since it is not part of the public interface.


was (Author: klykov):
Right, I will check also spinsort from boost

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png, Screen Shot 
> 2021-02-10 at 10.58.23.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282023#comment-17282023
 ] 

Kirill Lykov edited comment on ARROW-10899 at 2/9/21, 8:38 PM:
---

Your intuition is correct, at least for int32_t key. I benchmarked separately 
from arrow std::sort, std::stable_sort, boost::spreadsort, myNaiveRadixSort. 
And surprisingly to me even naive radix sort is faster than std sorts. But 
boost's spread sort is better. Y-axis is kMilliseconds, X-axis is number of 
elements in the array. It suggests to use boost::spreadsort

 !Screen Shot 2021-02-09 at 17.48.13.png!


was (Author: klykov):
Your intuition is correct, at least for int32_t key. I benchmarked separately 
from arrow std::sort, std::stable_sort, boost::spreadsort, myNaiveRadixSort. 
And surprisingly to me even naive radix sort is faster than std sorts. But 
boost's spread sort is better. Y-axis is kMilliseconds, X-axis is number of 
elements in the array.
!Screen Shot 2021-02-09 at 17.48.13.png!

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
> Attachments: Screen Shot 2021-02-09 at 17.48.13.png
>
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281864#comment-17281864
 ] 

Kirill Lykov edited comment on ARROW-10899 at 2/9/21, 4:24 PM:
---

1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious 
SIMD Sort" by Satish et al:
 They compare, beside other things, performance of radix sort and merge sort 
using intrinsics.
 Both implementations are highly optimized for simd architecture and radix has 
many not obvious improvements to make it more cache-friendly.
 Radix sort performance depends on the key size because the longer key we have 
the more passes we need to do. Radix is bandwidth-bound starting at 48-bits 
keys.
 Further, Satish et al report that merge sort is 1.7X slower than radix sort on 
32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 
1.3X better on 128-bit keys.
 Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, 
should, therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost 
called spread sort, looks promising. Alternatively one could implement Satish 
paper but this code will be difficult to support 


was (Author: klykov):
1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious 
SIMD Sort" by Satish et al:
 They compare, beside other things, performance of radix sort and merge sort 
using intrinsics.
 Both implementations are highly optimized for simd architecture and radix has 
many not obvious improvements to make it more cache-friendly.
 Radix sort performance depends on the key size because the longer key we have 
the more passes we need to do. Radix is bandwidth-bound starting at 48-bits 
keys.
 Further, Satish et al report that merge sort is 1.7X slower than radix sort on 
32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 
1.3X better on 128-bit keys.
 Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, 
should, therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost 
called spread sort but looks promising. 

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (ARROW-10899) [C++] Investigate radix sort for integer arrays

2021-02-09 Thread Kirill Lykov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-10899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281864#comment-17281864
 ] 

Kirill Lykov edited comment on ARROW-10899 at 2/9/21, 4:23 PM:
---

1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious 
SIMD Sort" by Satish et al:
 They compare, beside other things, performance of radix sort and merge sort 
using intrinsics.
 Both implementations are highly optimized for simd architecture and radix has 
many not obvious improvements to make it more cache-friendly.
 Radix sort performance depends on the key size because the longer key we have 
the more passes we need to do. Radix is bandwidth-bound starting at 48-bits 
keys.
 Further, Satish et al report that merge sort is 1.7X slower than radix sort on 
32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 
1.3X better on 128-bit keys.
 Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, 
should, therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost 
called spread sort but looks promising. 


was (Author: klykov):
1. From the paper "Fast Sort on CPUs and GPUs: A Case for Bandwidth Oblivious 
SIMD Sort" by Satish et al:
They compare, beside other things, performance of radix sort and merge sort 
using intrinsics.
Both implementations are highly optimized for simd architecture and radix has 
many not obvious improvements to make it more cache-friendly.
Radix sort performance depends on the key size because the longer key we have 
the more passes we need to do. Radix is bandwidth-bound starting at 48-bits 
keys.
Further, Satish et al report that merge sort is 1.7X slower than radix sort on 
32- bit keys, but becomes comparable in performance to radix on 64-bit keys and 
1.3X better on 128-bit keys.
Their conclusion is that "bandwidth-oblivious SIMD friendly merge sort, should, 
therefore, be the sorting method of choice for future databases".

2. I haven't found a good opensource radix sort for CPUs. There is one in boost 
called spread sort but it is not actually a radix sort and according to my 
benchmarks for integers it demonstrates exactly the same performance as gcc's 
stable_sort. From description, It might be designed for strings.

> [C++] Investigate radix sort for integer arrays
> ---
>
> Key: ARROW-10899
> URL: https://issues.apache.org/jira/browse/ARROW-10899
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++
>Reporter: Antoine Pitrou
>Assignee: Kirill Lykov
>Priority: Major
>
> For integer arrays with a non-tiny range of values, we currently use a stable 
> sort. It may be faster to use a radix sort instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)