wesm edited a comment on pull request #7521: URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8:

```
-------------------------------------------------------------------------------
Benchmark                     Time           CPU Iterations UserCounters...
-------------------------------------------------------------------------------
BuildDictionary            2625315 ns    2625247 ns   271 null_percent=0.88889 1.4865GB/s
BuildStringDictionary      3475855 ns    3475854 ns   200 86.8577MB/s
UniqueInt64/0              9842842 ns    9842834 ns    71 null_percent=0 num_unique=1024 3.1749GB/s
UniqueInt64/1             10617685 ns   10617360 ns    66 null_percent=0.1 num_unique=1024 2.94329GB/s
UniqueInt64/2             12648447 ns   12648430 ns    59 null_percent=1 num_unique=1024 2.47066GB/s
UniqueInt64/3             15365202 ns   15365113 ns    43 null_percent=10 num_unique=1024 2.03383GB/s
UniqueInt64/4              5126936 ns    5126851 ns   128 null_percent=99 num_unique=1024 6.09536GB/s
UniqueInt64/5              1763829 ns    1763809 ns   400 null_percent=100 num_unique=1024 17.7173GB/s
UniqueInt64/6             10545960 ns   10545841 ns    67 null_percent=0 num_unique=10.24k 2.96325GB/s
UniqueInt64/7             11478529 ns   11478403 ns    61 null_percent=0.1 num_unique=10.24k 2.7225GB/s
UniqueInt64/8             12792912 ns   12792429 ns    54 null_percent=1 num_unique=10.24k 2.44285GB/s
UniqueInt64/9             16805938 ns   16805535 ns    44 null_percent=10 num_unique=10.24k 1.85951GB/s
UniqueInt64/10             5503266 ns    5503108 ns   114 null_percent=99 num_unique=10.24k 5.67861GB/s
UniqueInt64/11             1763742 ns    1763699 ns   392 null_percent=100 num_unique=10.24k 17.7184GB/s
UniqueString10bytes/0     44193582 ns   44191679 ns    16 null_percent=0 num_unique=1024 905.148MB/s
UniqueString10bytes/1     45022703 ns   45022263 ns    15 null_percent=0.1 num_unique=1024 888.449MB/s
UniqueString10bytes/2     47131705 ns   47130800 ns    15 null_percent=1 num_unique=1024 848.702MB/s
UniqueString10bytes/3     50106213 ns   50105455 ns    14 null_percent=10 num_unique=1024 798.316MB/s
UniqueString10bytes/4     15905586 ns   15905158 ns    43 null_percent=99 num_unique=1024 2.45596GB/s
UniqueString10bytes/5     12983446 ns   12983327 ns    55 null_percent=100 num_unique=1024 3.00867GB/s
UniqueString10bytes/6     62149404 ns   62148971 ns    11 null_percent=0 num_unique=10.24k 643.615MB/s
UniqueString10bytes/7     62707969 ns   62705282 ns    11 null_percent=0.1 num_unique=10.24k 637.905MB/s
UniqueString10bytes/8     65508665 ns   65508532 ns    10 null_percent=1 num_unique=10.24k 610.607MB/s
UniqueString10bytes/9     65766803 ns   65766094 ns    11 null_percent=10 num_unique=10.24k 608.216MB/s
UniqueString10bytes/10    16297990 ns   16298076 ns    43 null_percent=99 num_unique=10.24k 2.39676GB/s
UniqueString10bytes/11    13298987 ns   13298798 ns    54 null_percent=100 num_unique=10.24k 2.9373GB/s
UniqueString100bytes/0    94204048 ns   94200614 ns     7 null_percent=0 num_unique=1024 4.14674GB/s
UniqueString100bytes/1    95631478 ns   95630838 ns     7 null_percent=0.1 num_unique=1024 4.08472GB/s
UniqueString100bytes/2    96547756 ns   96546348 ns     7 null_percent=1 num_unique=1024 4.04598GB/s
UniqueString100bytes/3    91950796 ns   91949032 ns     8 null_percent=10 num_unique=1024 4.24828GB/s
UniqueString100bytes/4    17292562 ns   17291979 ns    42 null_percent=99 num_unique=1024 22.59GB/s
UniqueString100bytes/5    13096944 ns   13096809 ns    55 null_percent=100 num_unique=1024 29.826GB/s
UniqueString100bytes/6   196165738 ns  196161451 ns     4 null_percent=0 num_unique=10.24k 1.99134GB/s
UniqueString100bytes/7   198475556 ns  198475456 ns     4 null_percent=0.1 num_unique=10.24k 1.96813GB/s
UniqueString100bytes/8   199273625 ns  199270358 ns     3 null_percent=1 num_unique=10.24k 1.96028GB/s
UniqueString100bytes/9   189235180 ns  189232925 ns     4 null_percent=10 num_unique=10.24k 2.06425GB/s
UniqueString100bytes/10   18381309 ns   18381409 ns    36 null_percent=99 num_unique=10.24k 21.2511GB/s
UniqueString100bytes/11   13426102 ns   13426072 ns    51 null_percent=100 num_unique=10.24k 29.0945GB/s
UniqueUInt8/0              2239549 ns    2239561 ns   309 null_percent=0 num_unique=200 1.7442GB/s
UniqueUInt8/1              2687371 ns    2687349 ns   248 null_percent=0.1 num_unique=200 1.45357GB/s
UniqueUInt8/2              4244052 ns    4244058 ns   166 null_percent=1 num_unique=200 942.494MB/s
UniqueUInt8/3              7563076 ns    7563066 ns    94 null_percent=10 num_unique=200 528.886MB/s
UniqueUInt8/4              3313484 ns    3313447 ns   214 null_percent=99 num_unique=200 1.17891GB/s
UniqueUInt8/5              1711948 ns    1711947 ns   415 null_percent=100 num_unique=200 2.28176GB/s
```

Here is the % diff versus the baseline.

* Cases 1 and 7 are the mostly-not-null cases (0.1% null). For UniqueInt64, these show a 15-20% perf improvement.
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99% null cases.
* The "BuildDictionary" case at the bottom with the perf regression is one of the "worst case scenarios". 89% of the values are null, so we almost never observe an all-null or all-not-null block. The use of `BitUtil::GetBit` over `BitmapReader` causes this slight regression, since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case, since the others are empirically more representative of real-world data.

```
                  benchmark          baseline        contender  change %  regression
8    UniqueString100bytes/5    40.668 GiB/sec  272.392 GiB/sec   569.787       False
37    UniqueString10bytes/5     4.064 GiB/sec   27.207 GiB/sec   569.456       False
33   UniqueString10bytes/11     4.065 GiB/sec   27.062 GiB/sec   565.751       False
12  UniqueString100bytes/11    40.578 GiB/sec  264.909 GiB/sec   552.841       False
0     UniqueString10bytes/4     3.568 GiB/sec    9.051 GiB/sec   153.692       False
36   UniqueString100bytes/4    34.408 GiB/sec   83.010 GiB/sec   141.252       False
19   UniqueString10bytes/10     3.375 GiB/sec    7.891 GiB/sec   133.794       False
24            UniqueUInt8/1   677.981 MiB/sec    1.506 GiB/sec   127.435       False
5   UniqueString100bytes/10    30.775 GiB/sec   63.206 GiB/sec   105.381       False
27            UniqueUInt8/5  1000.163 MiB/sec    1.729 GiB/sec    76.989       False
13            UniqueUInt8/2   650.819 MiB/sec  846.372 MiB/sec    30.047       False
29           UniqueInt64/11     2.703 GiB/sec    3.409 GiB/sec    26.126       False
7             UniqueInt64/5     2.704 GiB/sec    3.404 GiB/sec    25.903       False
18            UniqueUInt8/4   932.926 MiB/sec    1.098 GiB/sec    20.535       False
23            UniqueInt64/1     1.681 GiB/sec    2.014 GiB/sec    19.840       False
21            UniqueInt64/7     1.628 GiB/sec    1.896 GiB/sec    16.476       False
31            UniqueInt64/2     1.658 GiB/sec    1.835 GiB/sec    10.651       False
20    UniqueString10bytes/7   612.647 MiB/sec  668.943 MiB/sec     9.189       False
16            UniqueInt64/3     1.386 GiB/sec    1.511 GiB/sec     9.053       False
38    UniqueString10bytes/8   601.259 MiB/sec  655.490 MiB/sec     9.019       False
1             UniqueUInt8/0     1.808 GiB/sec    1.963 GiB/sec     8.588       False
41            UniqueInt64/9     1.355 GiB/sec    1.458 GiB/sec     7.562       False
14    UniqueString10bytes/1   830.614 MiB/sec  886.336 MiB/sec     6.709       False
4             UniqueInt64/8     1.603 GiB/sec    1.703 GiB/sec     6.260       False
32    UniqueString10bytes/2   847.018 MiB/sec  884.017 MiB/sec     4.368       False
42            UniqueInt64/4     2.508 GiB/sec    2.600 GiB/sec     3.701       False
39    UniqueString10bytes/3   855.985 MiB/sec  886.914 MiB/sec     3.613       False
28           UniqueInt64/10     2.413 GiB/sec    2.494 GiB/sec     3.360       False
34   UniqueString100bytes/3     4.254 GiB/sec    4.369 GiB/sec     2.722       False
11   UniqueString100bytes/2     3.993 GiB/sec    4.094 GiB/sec     2.544       False
9     UniqueString10bytes/9   654.257 MiB/sec  668.714 MiB/sec     2.210       False
35    UniqueString10bytes/6   662.915 MiB/sec  676.832 MiB/sec     2.099       False
6     BuildStringDictionary    80.971 MiB/sec   81.753 MiB/sec     0.966       False
22   UniqueString100bytes/1     4.002 GiB/sec    4.033 GiB/sec     0.783       False
25            UniqueInt64/0     2.153 GiB/sec    2.168 GiB/sec     0.697       False
17    UniqueString10bytes/0   917.726 MiB/sec  918.783 MiB/sec     0.115       False
43            UniqueInt64/6     2.017 GiB/sec    2.016 GiB/sec    -0.071       False
40   UniqueString100bytes/0     4.091 GiB/sec    4.086 GiB/sec    -0.130       False
3    UniqueString100bytes/7     1.938 GiB/sec    1.909 GiB/sec    -1.519       False
26   UniqueString100bytes/8     1.954 GiB/sec    1.897 GiB/sec    -2.935       False
2    UniqueString100bytes/9     2.114 GiB/sec    2.034 GiB/sec    -3.782       False
30   UniqueString100bytes/6     2.008 GiB/sec    1.895 GiB/sec    -5.649        True
10            UniqueUInt8/3   474.468 MiB/sec  422.604 MiB/sec   -10.931        True
15          BuildDictionary     1.776 GiB/sec    1.212 GiB/sec   -31.742        True
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org