wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8:
```
----------------------------------------------------------------------------------------------------------
Benchmark                       Time          CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
BuildDictionary           2625315 ns   2625247 ns        271 null_percent=0.88889 1.4865GB/s
BuildStringDictionary     3475855 ns   3475854 ns        200 86.8577MB/s
UniqueInt64/0             9842842 ns   9842834 ns         71 null_percent=0 num_unique=1024 3.1749GB/s
UniqueInt64/1            10617685 ns  10617360 ns         66 null_percent=0.1 num_unique=1024 2.94329GB/s
UniqueInt64/2            12648447 ns  12648430 ns         59 null_percent=1 num_unique=1024 2.47066GB/s
UniqueInt64/3            15365202 ns  15365113 ns         43 null_percent=10 num_unique=1024 2.03383GB/s
UniqueInt64/4             5126936 ns   5126851 ns        128 null_percent=99 num_unique=1024 6.09536GB/s
UniqueInt64/5             1763829 ns   1763809 ns        400 null_percent=100 num_unique=1024 17.7173GB/s
UniqueInt64/6            10545960 ns  10545841 ns         67 null_percent=0 num_unique=10.24k 2.96325GB/s
UniqueInt64/7            11478529 ns  11478403 ns         61 null_percent=0.1 num_unique=10.24k 2.7225GB/s
UniqueInt64/8            12792912 ns  12792429 ns         54 null_percent=1 num_unique=10.24k 2.44285GB/s
UniqueInt64/9            16805938 ns  16805535 ns         44 null_percent=10 num_unique=10.24k 1.85951GB/s
UniqueInt64/10            5503266 ns   5503108 ns        114 null_percent=99 num_unique=10.24k 5.67861GB/s
UniqueInt64/11            1763742 ns   1763699 ns        392 null_percent=100 num_unique=10.24k 17.7184GB/s
UniqueString10bytes/0    44193582 ns  44191679 ns         16 null_percent=0 num_unique=1024 905.148MB/s
UniqueString10bytes/1    45022703 ns  45022263 ns         15 null_percent=0.1 num_unique=1024 888.449MB/s
UniqueString10bytes/2    47131705 ns  47130800 ns         15 null_percent=1 num_unique=1024 848.702MB/s
UniqueString10bytes/3    50106213 ns  50105455 ns         14 null_percent=10 num_unique=1024 798.316MB/s
UniqueString10bytes/4    15905586 ns  15905158 ns         43 null_percent=99 num_unique=1024 2.45596GB/s
UniqueString10bytes/5    12983446 ns  12983327 ns         55 null_percent=100 num_unique=1024 3.00867GB/s
UniqueString10bytes/6    62149404 ns  62148971 ns         11 null_percent=0 num_unique=10.24k 643.615MB/s
UniqueString10bytes/7    62707969 ns  62705282 ns         11 null_percent=0.1 num_unique=10.24k 637.905MB/s
UniqueString10bytes/8    65508665 ns  65508532 ns         10 null_percent=1 num_unique=10.24k 610.607MB/s
UniqueString10bytes/9    65766803 ns  65766094 ns         11 null_percent=10 num_unique=10.24k 608.216MB/s
UniqueString10bytes/10   16297990 ns  16298076 ns         43 null_percent=99 num_unique=10.24k 2.39676GB/s
UniqueString10bytes/11   13298987 ns  13298798 ns         54 null_percent=100 num_unique=10.24k 2.9373GB/s
UniqueString100bytes/0   94204048 ns  94200614 ns          7 null_percent=0 num_unique=1024 4.14674GB/s
UniqueString100bytes/1   95631478 ns  95630838 ns          7 null_percent=0.1 num_unique=1024 4.08472GB/s
UniqueString100bytes/2   96547756 ns  96546348 ns          7 null_percent=1 num_unique=1024 4.04598GB/s
UniqueString100bytes/3   91950796 ns  91949032 ns          8 null_percent=10 num_unique=1024 4.24828GB/s
UniqueString100bytes/4   17292562 ns  17291979 ns         42 null_percent=99 num_unique=1024 22.59GB/s
UniqueString100bytes/5   13096944 ns  13096809 ns         55 null_percent=100 num_unique=1024 29.826GB/s
UniqueString100bytes/6  196165738 ns 196161451 ns          4 null_percent=0 num_unique=10.24k 1.99134GB/s
UniqueString100bytes/7  198475556 ns 198475456 ns          4 null_percent=0.1 num_unique=10.24k 1.96813GB/s
UniqueString100bytes/8  199273625 ns 199270358 ns          3 null_percent=1 num_unique=10.24k 1.96028GB/s
UniqueString100bytes/9  189235180 ns 189232925 ns          4 null_percent=10 num_unique=10.24k 2.06425GB/s
UniqueString100bytes/10  18381309 ns  18381409 ns         36 null_percent=99 num_unique=10.24k 21.2511GB/s
UniqueString100bytes/11  13426102 ns  13426072 ns         51 null_percent=100 num_unique=10.24k 29.0945GB/s
UniqueUInt8/0             2239549 ns   2239561 ns        309 null_percent=0 num_unique=200 1.7442GB/s
UniqueUInt8/1             2687371 ns   2687349 ns        248 null_percent=0.1 num_unique=200 1.45357GB/s
UniqueUInt8/2             4244052 ns   4244058 ns        166 null_percent=1 num_unique=200 942.494MB/s
UniqueUInt8/3             7563076 ns   7563066 ns         94 null_percent=10 num_unique=200 528.886MB/s
UniqueUInt8/4             3313484 ns   3313447 ns        214 null_percent=99 num_unique=200 1.17891GB/s
UniqueUInt8/5             1711948 ns   1711947 ns        415 null_percent=100 num_unique=200 2.28176GB/s
```
Here is the % diff versus the baseline.
* Cases 1 and 7 are the mostly-not-null cases (0.1% nulls). For UniqueInt64
these show a 15-20% perf improvement.
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99% null cases.
* The "BuildDictionary" case at the bottom, with the perf regression, is one
of the "worst case scenarios": 89% of the values are null, so we almost
never observe an all-null or all-not-null block. The use of `BitUtil::GetBit`
instead of `BitmapReader` causes this slight regression, since nearly every
validity bit must be checked separately. I don't think it's worth optimizing
for this case, since the others are empirically more representative of
real-world data.
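To illustrate why mixed blocks are the slow path, here is a minimal, self-contained sketch of block-wise validity scanning. It is not the code from this PR: `ScanCounts`, `ScanValidity`, the 64-bit block size, and the local `GetBit` helper are all hypothetical names and choices made for illustration. It assumes an LSB-first validity bitmap where a set bit means "not null".

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>

// Hypothetical result type: how many slots were valid/null, and how many
// needed an individual bit test (the slow path).
struct ScanCounts {
  int64_t valid = 0;
  int64_t null = 0;
  int64_t per_bit_checks = 0;
};

// Illustrative stand-in for a single-bit lookup (LSB-first bit order).
inline bool GetBit(const uint8_t* bitmap, int64_t i) {
  return (bitmap[i / 8] >> (i % 8)) & 1;
}

// Scan `length` validity bits in blocks of 64 (an assumed block size).
ScanCounts ScanValidity(const uint8_t* bitmap, int64_t length) {
  ScanCounts out;
  int64_t i = 0;
  for (; i + 64 <= length; i += 64) {
    uint64_t word;
    std::memcpy(&word, bitmap + i / 8, sizeof(word));
    const auto set_bits = static_cast<int64_t>(std::bitset<64>(word).count());
    if (set_bits == 64) {
      out.valid += 64;  // all-not-null block: no per-bit work
    } else if (set_bits == 0) {
      out.null += 64;  // all-null block: no per-bit work
    } else {
      // Mixed block: every validity bit is tested separately.
      for (int64_t j = i; j < i + 64; ++j) {
        ++out.per_bit_checks;
        if (GetBit(bitmap, j)) {
          ++out.valid;
        } else {
          ++out.null;
        }
      }
    }
  }
  // Tail shorter than one block: also tested bit by bit.
  for (; i < length; ++i) {
    ++out.per_bit_checks;
    if (GetBit(bitmap, i)) {
      ++out.valid;
    } else {
      ++out.null;
    }
  }
  return out;
}
```

With randomly placed nulls at roughly 89%, almost every 64-bit block is mixed, so `per_bit_checks` approaches `length`. At 0%, 100%, or (mostly) 99% nulls, most blocks take one of the two fast paths.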
```
    benchmark                       baseline         contender  change %  regression
 8  UniqueString100bytes/5    40.668 GiB/sec   272.392 GiB/sec   569.787       False
37  UniqueString10bytes/5      4.064 GiB/sec    27.207 GiB/sec   569.456       False
33  UniqueString10bytes/11     4.065 GiB/sec    27.062 GiB/sec   565.751       False
12  UniqueString100bytes/11   40.578 GiB/sec   264.909 GiB/sec   552.841       False
 0  UniqueString10bytes/4      3.568 GiB/sec     9.051 GiB/sec   153.692       False
36  UniqueString100bytes/4    34.408 GiB/sec    83.010 GiB/sec   141.252       False
19  UniqueString10bytes/10     3.375 GiB/sec     7.891 GiB/sec   133.794       False
24  UniqueUInt8/1            677.981 MiB/sec     1.506 GiB/sec   127.435       False
 5  UniqueString100bytes/10   30.775 GiB/sec    63.206 GiB/sec   105.381       False
27  UniqueUInt8/5           1000.163 MiB/sec     1.729 GiB/sec    76.989       False
13  UniqueUInt8/2            650.819 MiB/sec   846.372 MiB/sec    30.047       False
29  UniqueInt64/11             2.703 GiB/sec     3.409 GiB/sec    26.126       False
 7  UniqueInt64/5              2.704 GiB/sec     3.404 GiB/sec    25.903       False
18  UniqueUInt8/4            932.926 MiB/sec     1.098 GiB/sec    20.535       False
23  UniqueInt64/1              1.681 GiB/sec     2.014 GiB/sec    19.840       False
21  UniqueInt64/7              1.628 GiB/sec     1.896 GiB/sec    16.476       False
31  UniqueInt64/2              1.658 GiB/sec     1.835 GiB/sec    10.651       False
20  UniqueString10bytes/7    612.647 MiB/sec   668.943 MiB/sec     9.189       False
16  UniqueInt64/3              1.386 GiB/sec     1.511 GiB/sec     9.053       False
38  UniqueString10bytes/8    601.259 MiB/sec   655.490 MiB/sec     9.019       False
 1  UniqueUInt8/0              1.808 GiB/sec     1.963 GiB/sec     8.588       False
41  UniqueInt64/9              1.355 GiB/sec     1.458 GiB/sec     7.562       False
14  UniqueString10bytes/1    830.614 MiB/sec   886.336 MiB/sec     6.709       False
 4  UniqueInt64/8              1.603 GiB/sec     1.703 GiB/sec     6.260       False
32  UniqueString10bytes/2    847.018 MiB/sec   884.017 MiB/sec     4.368       False
42  UniqueInt64/4              2.508 GiB/sec     2.600 GiB/sec     3.701       False
39  UniqueString10bytes/3    855.985 MiB/sec   886.914 MiB/sec     3.613       False
28  UniqueInt64/10             2.413 GiB/sec     2.494 GiB/sec     3.360       False
34  UniqueString100bytes/3     4.254 GiB/sec     4.369 GiB/sec     2.722       False
11  UniqueString100bytes/2     3.993 GiB/sec     4.094 GiB/sec     2.544       False
 9  UniqueString10bytes/9    654.257 MiB/sec   668.714 MiB/sec     2.210       False
35  UniqueString10bytes/6    662.915 MiB/sec   676.832 MiB/sec     2.099       False
 6  BuildStringDictionary     80.971 MiB/sec    81.753 MiB/sec     0.966       False
22  UniqueString100bytes/1     4.002 GiB/sec     4.033 GiB/sec     0.783       False
25  UniqueInt64/0              2.153 GiB/sec     2.168 GiB/sec     0.697       False
17  UniqueString10bytes/0    917.726 MiB/sec   918.783 MiB/sec     0.115       False
43  UniqueInt64/6              2.017 GiB/sec     2.016 GiB/sec    -0.071       False
40  UniqueString100bytes/0     4.091 GiB/sec     4.086 GiB/sec    -0.130       False
 3  UniqueString100bytes/7     1.938 GiB/sec     1.909 GiB/sec    -1.519       False
26  UniqueString100bytes/8     1.954 GiB/sec     1.897 GiB/sec    -2.935       False
 2  UniqueString100bytes/9     2.114 GiB/sec     2.034 GiB/sec    -3.782       False
30  UniqueString100bytes/6     2.008 GiB/sec     1.895 GiB/sec    -5.649        True
10  UniqueUInt8/3            474.468 MiB/sec   422.604 MiB/sec   -10.931        True
15  BuildDictionary            1.776 GiB/sec     1.212 GiB/sec   -31.742        True
```