jianxind removed a comment on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-636677517
Benchmark data:

Before:
```
SumKernelFloat/32768/0     2.96 us  2.96 us  236912  bytes_per_second=10.3227G/s  null_percent=0   size=32.768k
SumKernelFloat/32768/1     4.88 us  4.88 us  143527  bytes_per_second=6.25439G/s  null_percent=1   size=32.768k
SumKernelFloat/32768/10    5.13 us  5.13 us  136839  bytes_per_second=5.95117G/s  null_percent=10  size=32.768k
SumKernelFloat/32768/50    7.82 us  7.81 us   87129  bytes_per_second=3.9054G/s   null_percent=50  size=32.768k
SumKernelDouble/32768/0    1.97 us  1.97 us  356786  bytes_per_second=15.4906G/s  null_percent=0   size=32.768k
SumKernelDouble/32768/1    2.11 us  2.11 us  331511  bytes_per_second=14.4975G/s  null_percent=1   size=32.768k
SumKernelDouble/32768/10   2.39 us  2.38 us  291292  bytes_per_second=12.7966G/s  null_percent=10  size=32.768k
SumKernelDouble/32768/50   2.60 us  2.60 us  268800  bytes_per_second=11.7462G/s  null_percent=50  size=32.768k
SumKernelInt8/32768/0      11.7 us  11.7 us   59926  bytes_per_second=2.61569G/s  null_percent=0   size=32.768k
SumKernelInt8/32768/1      11.0 us  10.9 us   63640  bytes_per_second=2.78831G/s  null_percent=1   size=32.768k
SumKernelInt8/32768/10     14.8 us  14.8 us   46573  bytes_per_second=2.05848G/s  null_percent=10  size=32.768k
SumKernelInt8/32768/50     14.6 us  14.6 us   47840  bytes_per_second=2.08905G/s  null_percent=50  size=32.768k
SumKernelInt16/32768/0     7.06 us  7.06 us   99354  bytes_per_second=4.3245G/s   null_percent=0   size=32.768k
SumKernelInt16/32768/1     4.76 us  4.75 us  147305  bytes_per_second=6.41928G/s  null_percent=1   size=32.768k
SumKernelInt16/32768/10    5.64 us  5.63 us  122737  bytes_per_second=5.42002G/s  null_percent=10  size=32.768k
SumKernelInt16/32768/50    6.71 us  6.70 us  104192  bytes_per_second=4.55206G/s  null_percent=50  size=32.768k
SumKernelInt32/32768/0     3.92 us  3.92 us  178798  bytes_per_second=7.79042G/s  null_percent=0   size=32.768k
SumKernelInt32/32768/1     3.27 us  3.27 us  214296  bytes_per_second=9.332G/s    null_percent=1   size=32.768k
SumKernelInt32/32768/10    3.41 us  3.40 us  204944  bytes_per_second=8.9683G/s   null_percent=10  size=32.768k
SumKernelInt32/32768/50    3.69 us  3.69 us  190248  bytes_per_second=8.27705G/s  null_percent=50  size=32.768k
SumKernelInt64/32768/0     1.92 us  1.91 us  368662  bytes_per_second=15.9508G/s  null_percent=0   size=32.768k
SumKernelInt64/32768/1     2.05 us  2.05 us  340168  bytes_per_second=14.8684G/s  null_percent=1   size=32.768k
SumKernelInt64/32768/10    2.16 us  2.16 us  323585  bytes_per_second=14.1164G/s  null_percent=10  size=32.768k
SumKernelInt64/32768/50    2.41 us  2.41 us  291073  bytes_per_second=12.6873G/s  null_percent=50  size=32.768k
```

After:
```
SumKernelFloat/32768/0     2.27 us  2.27 us  307928  bytes_per_second=13.438G/s   null_percent=0   size=32.768k
SumKernelFloat/32768/1     4.59 us  4.59 us  152827  bytes_per_second=6.6508G/s   null_percent=1   size=32.768k
SumKernelFloat/32768/10    5.30 us  5.29 us  132106  bytes_per_second=5.76658G/s  null_percent=10  size=32.768k
SumKernelFloat/32768/50    5.80 us  5.80 us  114378  bytes_per_second=5.26584G/s  null_percent=50  size=32.768k
SumKernelDouble/32768/0    1.42 us  1.42 us  494426  bytes_per_second=21.5265G/s  null_percent=0   size=32.768k
SumKernelDouble/32768/1    2.12 us  2.12 us  330890  bytes_per_second=14.4268G/s  null_percent=1   size=32.768k
SumKernelDouble/32768/10   2.44 us  2.43 us  286310  bytes_per_second=12.5441G/s  null_percent=10  size=32.768k
SumKernelDouble/32768/50   2.72 us  2.71 us  257105  bytes_per_second=11.2507G/s  null_percent=50  size=32.768k
SumKernelInt8/32768/0      5.35 us  5.34 us  130751  bytes_per_second=5.71315G/s  null_percent=0   size=32.768k
SumKernelInt8/32768/1      9.80 us  9.79 us   71384  bytes_per_second=3.11589G/s  null_percent=1   size=32.768k
SumKernelInt8/32768/10     13.9 us  13.9 us   49729  bytes_per_second=2.19116G/s  null_percent=10  size=32.768k
SumKernelInt8/32768/50     12.5 us  12.5 us   55929  bytes_per_second=2.43479G/s  null_percent=50  size=32.768k
SumKernelInt16/32768/0     3.20 us  3.19 us  218923  bytes_per_second=9.55594G/s  null_percent=0   size=32.768k
SumKernelInt16/32768/1     5.31 us  5.31 us  131394  bytes_per_second=5.75174G/s  null_percent=1   size=32.768k
SumKernelInt16/32768/10    6.20 us  6.19 us  113037  bytes_per_second=4.92965G/s  null_percent=10  size=32.768k
SumKernelInt16/32768/50    7.25 us  7.24 us   96604  bytes_per_second=4.21535G/s  null_percent=50  size=32.768k
SumKernelInt32/32768/0     2.18 us  2.18 us  321572  bytes_per_second=14.0037G/s  null_percent=0   size=32.768k
SumKernelInt32/32768/1     3.32 us  3.32 us  209911  bytes_per_second=9.18857G/s  null_percent=1   size=32.768k
SumKernelInt32/32768/10    3.59 us  3.58 us  195106  bytes_per_second=8.51472G/s  null_percent=10  size=32.768k
SumKernelInt32/32768/50    3.83 us  3.82 us  182739  bytes_per_second=7.98056G/s  null_percent=50  size=32.768k
SumKernelInt64/32768/0     1.37 us  1.37 us  514237  bytes_per_second=22.3564G/s  null_percent=0   size=32.768k
SumKernelInt64/32768/1     2.09 us  2.09 us  333678  bytes_per_second=14.5962G/s  null_percent=1   size=32.768k
SumKernelInt64/32768/10    2.18 us  2.18 us  320094  bytes_per_second=13.9904G/s  null_percent=10  size=32.768k
SumKernelInt64/32768/50    2.41 us  2.40 us  289766  bytes_per_second=12.6907G/s  null_percent=50  size=32.768k
```

The dense case (null_percent=0) improves for every data type; for example, Double goes from 15.4906G/s to 21.5265G/s. I will look into the sparse cases later, since they need additional work to remove the invalid values before the data is passed to the SIMD add operations: a shuffle op is needed to replace each invalid value with zero. The dense case can also be sped up further with AVX2/AVX512, which is left for later work as well.
