jianxind removed a comment on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-636677517


   Benchmark data:
   
   Before:
   ```
   SumKernelFloat/32768/0         2.96 us         2.96 us       236912 bytes_per_second=10.3227G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         4.88 us         4.88 us       143527 bytes_per_second=6.25439G/s null_percent=1 size=32.768k
   SumKernelFloat/32768/10        5.13 us         5.13 us       136839 bytes_per_second=5.95117G/s null_percent=10 size=32.768k
   SumKernelFloat/32768/50        7.82 us         7.81 us        87129 bytes_per_second=3.9054G/s null_percent=50 size=32.768k
   SumKernelDouble/32768/0        1.97 us         1.97 us       356786 bytes_per_second=15.4906G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        2.11 us         2.11 us       331511 bytes_per_second=14.4975G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/10       2.39 us         2.38 us       291292 bytes_per_second=12.7966G/s null_percent=10 size=32.768k
   SumKernelDouble/32768/50       2.60 us         2.60 us       268800 bytes_per_second=11.7462G/s null_percent=50 size=32.768k
   SumKernelInt8/32768/0          11.7 us         11.7 us        59926 bytes_per_second=2.61569G/s null_percent=0 size=32.768k
   SumKernelInt8/32768/1          11.0 us         10.9 us        63640 bytes_per_second=2.78831G/s null_percent=1 size=32.768k
   SumKernelInt8/32768/10         14.8 us         14.8 us        46573 bytes_per_second=2.05848G/s null_percent=10 size=32.768k
   SumKernelInt8/32768/50         14.6 us         14.6 us        47840 bytes_per_second=2.08905G/s null_percent=50 size=32.768k
   SumKernelInt16/32768/0         7.06 us         7.06 us        99354 bytes_per_second=4.3245G/s null_percent=0 size=32.768k
   SumKernelInt16/32768/1         4.76 us         4.75 us       147305 bytes_per_second=6.41928G/s null_percent=1 size=32.768k
   SumKernelInt16/32768/10        5.64 us         5.63 us       122737 bytes_per_second=5.42002G/s null_percent=10 size=32.768k
   SumKernelInt16/32768/50        6.71 us         6.70 us       104192 bytes_per_second=4.55206G/s null_percent=50 size=32.768k
   SumKernelInt32/32768/0         3.92 us         3.92 us       178798 bytes_per_second=7.79042G/s null_percent=0 size=32.768k
   SumKernelInt32/32768/1         3.27 us         3.27 us       214296 bytes_per_second=9.332G/s null_percent=1 size=32.768k
   SumKernelInt32/32768/10        3.41 us         3.40 us       204944 bytes_per_second=8.9683G/s null_percent=10 size=32.768k
   SumKernelInt32/32768/50        3.69 us         3.69 us       190248 bytes_per_second=8.27705G/s null_percent=50 size=32.768k
   SumKernelInt64/32768/0         1.92 us         1.91 us       368662 bytes_per_second=15.9508G/s null_percent=0 size=32.768k
   SumKernelInt64/32768/1         2.05 us         2.05 us       340168 bytes_per_second=14.8684G/s null_percent=1 size=32.768k
   SumKernelInt64/32768/10        2.16 us         2.16 us       323585 bytes_per_second=14.1164G/s null_percent=10 size=32.768k
   SumKernelInt64/32768/50        2.41 us         2.41 us       291073 bytes_per_second=12.6873G/s null_percent=50 size=32.768k
   ```
   
   After:
   ```
   SumKernelFloat/32768/0         2.27 us         2.27 us       307928 bytes_per_second=13.438G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         4.59 us         4.59 us       152827 bytes_per_second=6.6508G/s null_percent=1 size=32.768k
   SumKernelFloat/32768/10        5.30 us         5.29 us       132106 bytes_per_second=5.76658G/s null_percent=10 size=32.768k
   SumKernelFloat/32768/50        5.80 us         5.80 us       114378 bytes_per_second=5.26584G/s null_percent=50 size=32.768k
   SumKernelDouble/32768/0        1.42 us         1.42 us       494426 bytes_per_second=21.5265G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        2.12 us         2.12 us       330890 bytes_per_second=14.4268G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/10       2.44 us         2.43 us       286310 bytes_per_second=12.5441G/s null_percent=10 size=32.768k
   SumKernelDouble/32768/50       2.72 us         2.71 us       257105 bytes_per_second=11.2507G/s null_percent=50 size=32.768k
   SumKernelInt8/32768/0          5.35 us         5.34 us       130751 bytes_per_second=5.71315G/s null_percent=0 size=32.768k
   SumKernelInt8/32768/1          9.80 us         9.79 us        71384 bytes_per_second=3.11589G/s null_percent=1 size=32.768k
   SumKernelInt8/32768/10         13.9 us         13.9 us        49729 bytes_per_second=2.19116G/s null_percent=10 size=32.768k
   SumKernelInt8/32768/50         12.5 us         12.5 us        55929 bytes_per_second=2.43479G/s null_percent=50 size=32.768k
   SumKernelInt16/32768/0         3.20 us         3.19 us       218923 bytes_per_second=9.55594G/s null_percent=0 size=32.768k
   SumKernelInt16/32768/1         5.31 us         5.31 us       131394 bytes_per_second=5.75174G/s null_percent=1 size=32.768k
   SumKernelInt16/32768/10        6.20 us         6.19 us       113037 bytes_per_second=4.92965G/s null_percent=10 size=32.768k
   SumKernelInt16/32768/50        7.25 us         7.24 us        96604 bytes_per_second=4.21535G/s null_percent=50 size=32.768k
   SumKernelInt32/32768/0         2.18 us         2.18 us       321572 bytes_per_second=14.0037G/s null_percent=0 size=32.768k
   SumKernelInt32/32768/1         3.32 us         3.32 us       209911 bytes_per_second=9.18857G/s null_percent=1 size=32.768k
   SumKernelInt32/32768/10        3.59 us         3.58 us       195106 bytes_per_second=8.51472G/s null_percent=10 size=32.768k
   SumKernelInt32/32768/50        3.83 us         3.82 us       182739 bytes_per_second=7.98056G/s null_percent=50 size=32.768k
   SumKernelInt64/32768/0         1.37 us         1.37 us       514237 bytes_per_second=22.3564G/s null_percent=0 size=32.768k
   SumKernelInt64/32768/1         2.09 us         2.09 us       333678 bytes_per_second=14.5962G/s null_percent=1 size=32.768k
   SumKernelInt64/32768/10        2.18 us         2.18 us       320094 bytes_per_second=13.9904G/s null_percent=10 size=32.768k
   SumKernelInt64/32768/50        2.41 us         2.40 us       289766 bytes_per_second=12.6907G/s null_percent=50 size=32.768k
   ```
   
   Every data type improves in the dense case (null_percent=0); for example, 
   Double jumps from 15.4906G/s to 21.5265G/s.
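   To put the dense-case gains side by side, here is a small C++ sketch that recomputes the throughput ratio for each null_percent=0 row in the Before/After tables. The `DenseResult` struct, the `kDenseResults` table and the `Speedup` function are illustrative names only, not part of Arrow.

   ```cpp
   #include <cassert>
   #include <string>
   #include <vector>

   // One dense (null_percent=0) benchmark row; values are bytes_per_second in G/s.
   struct DenseResult {
     std::string kernel;
     double before_gbps;
     double after_gbps;
   };

   // null_percent=0 rows copied from the Before/After benchmark output.
   const std::vector<DenseResult> kDenseResults = {
       {"SumKernelFloat", 10.3227, 13.438},
       {"SumKernelDouble", 15.4906, 21.5265},
       {"SumKernelInt8", 2.61569, 5.71315},
       {"SumKernelInt16", 4.3245, 9.55594},
       {"SumKernelInt32", 7.79042, 14.0037},
       {"SumKernelInt64", 15.9508, 22.3564},
   };

   // Throughput ratio after/before; a value above 1.0 means the new kernel is faster.
   double Speedup(const DenseResult& r) {
     assert(r.before_gbps > 0.0);
     return r.after_gbps / r.before_gbps;
   }
   ```

   With these numbers the ratios come out at roughly 1.30x (Float), 1.39x (Double), 2.18x (Int8), 2.21x (Int16), 1.80x (Int32) and 1.40x (Int64).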
   
   I will look into the sparse cases later, as they need additional work to 
   remove the invalid (null) values before the data is passed to the SIMD add 
   operations: a shuffle operation is needed to replace the invalid values with 
   zero.
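   The zero-replacement step described above can be illustrated in scalar C++. This is a minimal sketch, not the PR's implementation: `MaskedSum` is a hypothetical helper, it assumes Arrow's LSB-ordered validity bitmap (slot i is bit i % 8 of byte i / 8, and a set bit means valid), and it uses a per-element select as a portable stand-in for the SIMD shuffle/blend that would zero out null lanes before the vector add.

   ```cpp
   #include <cassert>
   #include <cstdint>

   // Hypothetical helper (not Arrow's API): sums values[0, length) and treats
   // every slot whose validity bit is 0 (null) as zero.
   // Assumes LSB bit order: slot i maps to bit (i % 8) of byte (i / 8).
   template <typename Acc, typename T>
   Acc MaskedSum(const T* values, const uint8_t* validity, int64_t length) {
     assert(length == 0 || (values != nullptr && validity != nullptr));
     Acc sum = 0;
     for (int64_t i = 0; i < length; ++i) {
       const bool valid = (validity[i >> 3] >> (i & 7)) & 1;
       // Select rather than skip: a null slot contributes 0, mirroring the
       // "replace the invalid value with zero" step before a SIMD add.
       sum += valid ? static_cast<Acc>(values[i]) : Acc{0};
     }
     return sum;
   }
   ```

   For example, `MaskedSum<int64_t>(values, bitmap, n)` accumulates int32 values into an int64_t. The select keeps the loop body free of data-dependent branches; whether a given compiler turns it into vector blends depends on the compiler and target flags.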
   
   The dense path could also be sped up further with AVX2/AVX512, which is 
   likewise left for later work.

