cyb70289 commented on pull request #9604: URL: https://github.com/apache/arrow/pull/9604#issuecomment-787804449
Besides better accuracy, pairwise summation also improves performance significantly for normal cases.

- big improvement (~1x) for 0.01% and 0% null count
- moderate improvement for 1% null count
- big drop (~0.5x) for 10% and 50% null count (due to short continuous data blocks)

Benchmark results on Skylake with clang-9 (Int32 benchmarks removed, as they are not touched):

```
Before
---------------------------------------------------------------------------------------------
Benchmark                                Time         CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
VarianceKernelInt64/1048576/10000      201 us      201 us         3479 bytes_per_second=4.85858G/s null_percent=0.01 size=1048.58k
VarianceKernelInt64/1048576/100        235 us      235 us         2983 bytes_per_second=4.15903G/s null_percent=1 size=1048.58k
VarianceKernelInt64/1048576/10         515 us      515 us         1362 bytes_per_second=1.89616G/s null_percent=10 size=1048.58k
VarianceKernelInt64/1048576/2          826 us      825 us          848 bytes_per_second=1.18306G/s null_percent=50 size=1048.58k
VarianceKernelInt64/1048576/1        0.866 us    0.866 us       806079 bytes_per_second=1.10133T/s null_percent=100 size=1048.58k
VarianceKernelInt64/1048576/0          186 us      186 us         3766 bytes_per_second=5.23785G/s null_percent=0 size=1048.58k
VarianceKernelFloat/1048576/10000      549 us      549 us         1274 bytes_per_second=1.77922G/s null_percent=0.01 size=1048.58k
VarianceKernelFloat/1048576/100        579 us      579 us         1208 bytes_per_second=1.68558G/s null_percent=1 size=1048.58k
VarianceKernelFloat/1048576/10        1040 us     1040 us          673 bytes_per_second=961.64M/s null_percent=10 size=1048.58k
VarianceKernelFloat/1048576/2         1650 us     1650 us          424 bytes_per_second=606.079M/s null_percent=50 size=1048.58k
VarianceKernelFloat/1048576/1        0.876 us    0.876 us       775800 bytes_per_second=1115.15G/s null_percent=100 size=1048.58k
VarianceKernelFloat/1048576/0          541 us      541 us         1294 bytes_per_second=1.80506G/s null_percent=0 size=1048.58k
VarianceKernelDouble/1048576/10000     275 us      275 us         2543 bytes_per_second=3.54668G/s null_percent=0.01 size=1048.58k
VarianceKernelDouble/1048576/100       277 us      277 us         2530 bytes_per_second=3.52828G/s null_percent=1 size=1048.58k
VarianceKernelDouble/1048576/10        500 us      500 us         1397 bytes_per_second=1.95315G/s null_percent=10 size=1048.58k
VarianceKernelDouble/1048576/2         793 us      793 us          884 bytes_per_second=1.23201G/s null_percent=50 size=1048.58k
VarianceKernelDouble/1048576/1       0.894 us    0.894 us       765871 bytes_per_second=1092.23G/s null_percent=100 size=1048.58k
VarianceKernelDouble/1048576/0         271 us      271 us         2576 bytes_per_second=3.59923G/s null_percent=0 size=1048.58k

After
---------------------------------------------------------------------------------------------
Benchmark                                Time         CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
VarianceKernelInt64/1048576/10000      138 us      138 us         5082 bytes_per_second=7.07395G/s null_percent=0.01 size=1048.58k
VarianceKernelInt64/1048576/100        226 us      226 us         2858 bytes_per_second=4.32355G/s null_percent=1 size=1048.58k
VarianceKernelInt64/1048576/10         679 us      679 us         1032 bytes_per_second=1.4386G/s null_percent=10 size=1048.58k
VarianceKernelInt64/1048576/2         1121 us     1121 us          625 bytes_per_second=891.928M/s null_percent=50 size=1048.58k
VarianceKernelInt64/1048576/1        0.835 us    0.835 us       815343 bytes_per_second=1.14166T/s null_percent=100 size=1048.58k
VarianceKernelInt64/1048576/0          165 us      165 us         5278 bytes_per_second=5.91169G/s null_percent=0 size=1048.58k
VarianceKernelFloat/1048576/10000      333 us      333 us         2103 bytes_per_second=2.93102G/s null_percent=0.01 size=1048.58k
VarianceKernelFloat/1048576/100        550 us      550 us         1272 bytes_per_second=1.77448G/s null_percent=1 size=1048.58k
VarianceKernelFloat/1048576/10        1699 us     1699 us          412 bytes_per_second=588.556M/s null_percent=10 size=1048.58k
VarianceKernelFloat/1048576/2         2788 us     2788 us          251 bytes_per_second=358.73M/s null_percent=50 size=1048.58k
VarianceKernelFloat/1048576/1        0.878 us    0.878 us       831430 bytes_per_second=1112.44G/s null_percent=100 size=1048.58k
VarianceKernelFloat/1048576/0          327 us      327 us         2164 bytes_per_second=2.98677G/s null_percent=0 size=1048.58k
VarianceKernelDouble/1048576/10000     121 us      121 us         5785 bytes_per_second=8.05841G/s null_percent=0.01 size=1048.58k
VarianceKernelDouble/1048576/100       222 us      222 us         3145 bytes_per_second=4.40277G/s null_percent=1 size=1048.58k
VarianceKernelDouble/1048576/10        789 us      789 us          887 bytes_per_second=1.23705G/s null_percent=10 size=1048.58k
VarianceKernelDouble/1048576/2        1428 us     1428 us          490 bytes_per_second=700.099M/s null_percent=50 size=1048.58k
VarianceKernelDouble/1048576/1       0.853 us    0.853 us       827831 bytes_per_second=1.11828T/s null_percent=100 size=1048.58k
VarianceKernelDouble/1048576/0         140 us      140 us         5986 bytes_per_second=6.97951G/s null_percent=0 size=1048.58k
```