GitHub user greghogan opened a pull request:
https://github.com/apache/flink/pull/2628
[FLINK-3722] [runtime] Don't / and % when sorting
Replace division and modulus with addition and subtraction.
The timing chart below has three columns for two Gelly algorithms with
increasing scale. The first is timings for the master branch. The second column
replaces division and modulus with shift-and-bitmask by storing each element in
a power-of-2 chunk of memory. The third column is for this PR which modifies
the sort algorithm to track the buffer number and offset for each index. This
code has the advantage that there is no wasted memory.
In this comparison, for both int and long, we look to be already operating
on power-of-2 size elements. I would expect this PR to perform relatively
better with the FixedLengthRecordSorter instrumented from FLINK-4705 where
shift-and-bitmask would be padding elements.
This initial PR does not modify `HeapSort`. I wanted to first get
acceptance for the current set of modifications. There are further
optimizations which can be benchmarked in follow-on tickets.
```
HITS, scale=16, INT : runtime= 1.618 (8) 1.524 (8) ( -5.82%)
1.546 (8) ( -4.47%)
HITS, scale=16, LONG : runtime= 1.579 (8) 1.552 (8) ( -1.76%)
1.541 (8) ( -2.43%)
HITS, scale=16, STRING: runtime= 1.774 (8) 1.739 (8) ( -1.99%)
1.701 (8) ( -4.11%)
HITS, scale=17, INT : runtime= 2.032 (8) 1.950 (8) ( -4.06%)
1.937 (8) ( -4.71%)
HITS, scale=17, LONG : runtime= 1.986 (8) 1.923 (8) ( -3.15%)
1.926 (8) ( -3.01%)
HITS, scale=17, STRING: runtime= 2.377 (8) 2.275 (8) ( -4.30%)
2.289 (8) ( -3.68%)
HITS, scale=18, INT : runtime= 2.894 (8) 2.761 (8) ( -4.59%)
2.762 (8) ( -4.54%)
HITS, scale=18, LONG : runtime= 2.863 (8) 2.754 (8) ( -3.81%)
2.735 (8) ( -4.48%)
HITS, scale=18, STRING: runtime= 3.683 (8) 3.455 (8) ( -6.18%)
3.461 (8) ( -6.02%)
HITS, scale=19, INT : runtime= 4.914 (8) 4.631 (8) ( -5.76%)
4.595 (8) ( -6.49%)
HITS, scale=19, LONG : runtime= 4.792 (8) 4.578 (8) ( -4.45%)
4.538 (8) ( -5.28%)
HITS, scale=19, STRING: runtime= 6.484 (8) 6.149 (8) ( -5.18%)
6.118 (8) ( -5.64%)
HITS, scale=20, INT : runtime= 8.662 (8) 8.086 (8) ( -6.64%)
8.091 (8) ( -6.59%)
HITS, scale=20, LONG : runtime= 8.407 (8) 7.993 (8) ( -4.93%)
7.918 (8) ( -5.82%)
HITS, scale=20, STRING: runtime= 11.946 (8) 11.327 (8) ( -5.18%)
11.272 (8) ( -5.64%)
HITS, scale=21, INT : runtime= 16.248 (8) 15.137 (8) ( -6.84%)
15.207 (8) ( -6.41%)
HITS, scale=21, LONG : runtime= 15.907 (8) 14.750 (8) ( -7.28%)
14.724 (8) ( -7.44%)
HITS, scale=21, STRING: runtime= 23.634 (8) 22.596 (8) ( -4.39%)
22.546 (8) ( -4.61%)
HITS, scale=22, INT : runtime= 31.964 (8) 29.954 (8) ( -6.29%)
29.808 (8) ( -6.74%)
HITS, scale=22, LONG : runtime= 31.244 (8) 29.241 (8) ( -6.41%)
28.829 (8) ( -7.73%)
HITS, scale=22, STRING: runtime= 47.870 (6) 45.942 (6) ( -4.03%)
45.987 (8) ( -3.93%)
HITS, scale=23, INT : runtime= 65.146 (4) 60.245 (4) ( -7.52%)
60.008 (4) ( -7.89%)
HITS, scale=23, LONG : runtime= 64.770 (4) 59.828 (4) ( -7.63%)
59.334 (4) ( -8.39%)
HITS, scale=23, STRING: runtime= 103.690 (2) 99.335 (2) ( -4.20%)
98.919 (3) ( -4.60%)
HITS, scale=24, INT : runtime= 156.567 (2)138.923 (2) (-11.27%)
141.185 (2) ( -9.82%)
HITS, scale=24, LONG : runtime= 141.904 (2)132.292 (2) ( -6.77%)
129.677 (2) ( -8.62%)
HITS, scale=24, STRING: runtime= 221.081 (1)217.128 (1) ( -1.79%)
213.822 (1) ( -3.28%)
HITS, scale=25, INT : runtime=
280.720 (1)
HITS, scale=25, LONG : runtime= 321.418 (1)297.764 (1) ( -7.36%)
293.982 (1) ( -8.54%)
JaccardIndex, scale=12, INT : runtime= 0.413 (8) 0.424 (8) (
2.57%) 0.408 (8) ( -1.30%)
JaccardIndex, scale=12, LONG : runtime= 0.440 (8) 0.414 (8) (
-5.85%) 0.424 (8) ( -3.64%)
JaccardIndex, scale=12, STRING: runtime= 0.667 (8) 0.647 (8) (
-2.91%) 0.644 (8) ( -3.38%)
JaccardIndex, scale=13, INT : runtime= 0.811 (8) 0.743 (8) (
-8.45%) 0.764 (8) ( -5.75%)
JaccardIndex, scale=13, LONG : runtime= 0.885 (8) 0.822 (8) (
-7.12%) 0.805 (8) ( -9.01%)
JaccardIndex, scale=13, STRING: runtime= 1.599 (8) 1.489 (8) (
-6.86%) 1.486 (8) ( -7.04%)
JaccardIndex, scale=14, INT : runtime= 1.809 (8) 1.621 (8)
(-10.43%) 1.649 (8) ( -8.87%)
JaccardIndex, scale=14, LONG : runtime= 2.003 (8) 1.834 (8) (
-8.43%) 1.800 (8) (-10.13%)
JaccardIndex, scale=14, STRING: runtime= 3.855 (8) 3.666 (8) (
-4.91%) 3.667 (8) ( -4.86%)