jianxind commented on pull request #7607:
URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931


   @emkornfield This is the new version of the sum aggregate without intrinsics; 
could you help review it?
   
   The dense part scores nearly the same as the intrinsics version for AVX2 on 
clang; the gcc result is a little lower.
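
As a rough illustration of what "without intrinsics" can look like for the dense part, here is a minimal sketch, not the code in this PR. It assumes a null-free run of values and relies on the compiler to auto-vectorize a fixed-width inner loop. `SumDense` and `kLanes` are hypothetical names. The explicit partial sums matter mostly for floating point, where compilers will not reorder additions on their own unless fast-math style flags are set.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not Arrow's actual kernel: sums a dense (null-free)
// run of values using several independent accumulators so that the compiler
// is free to map the inner loop onto SIMD lanes.
template <typename T, typename SumT>
SumT SumDense(const T* values, std::size_t length) {
  constexpr std::size_t kLanes = 8;  // assumed unroll width, tune per target
  SumT partial[kLanes] = {};         // one accumulator per lane, zero-initialized

  // Main loop: process full groups of kLanes values.
  const std::size_t rounded = length - length % kLanes;
  for (std::size_t i = 0; i < rounded; i += kLanes) {
    for (std::size_t k = 0; k < kLanes; ++k) {
      partial[k] += static_cast<SumT>(values[i + k]);
    }
  }

  // Reduce the per-lane accumulators, then add the leftover tail values.
  SumT sum = 0;
  for (std::size_t k = 0; k < kLanes; ++k) {
    sum += partial[k];
  }
  for (std::size_t i = rounded; i < length; ++i) {
    sum += static_cast<SumT>(values[i]);
  }
  return sum;
}
```

Note that splitting a floating-point sum across accumulators changes the order of additions, so results can differ in the last bits from a strictly sequential sum.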
   
   Below are the benchmark results (null_percent 0 and 0.01%) on an AVX2 
(i7-8700) device.
   Before
   ```
   SumKernelFloat/1048576/10000        97.5 us         97.3 us         7150 bytes_per_second=10.0336G/s null_percent=0.01 size=1048.58k
   SumKernelFloat/1048576/0            62.1 us         62.0 us        11292 bytes_per_second=15.7443G/s null_percent=0 size=1048.58k
   SumKernelDouble/1048576/10000       35.4 us         35.4 us        19781 bytes_per_second=27.5977G/s null_percent=0.01 size=1048.58k
   SumKernelDouble/1048576/0           32.5 us         32.5 us        21534 bytes_per_second=30.0657G/s null_percent=0 size=1048.58k
   SumKernelInt8/1048576/10000          183 us          183 us         3832 bytes_per_second=5.34627G/s null_percent=0.01 size=1048.58k
   SumKernelInt8/1048576/0              133 us          132 us         5285 bytes_per_second=7.37317G/s null_percent=0 size=1048.58k
   SumKernelInt16/1048576/10000        93.3 us         93.2 us         7505 bytes_per_second=10.4762G/s null_percent=0.01 size=1048.58k
   SumKernelInt16/1048576/0            68.4 us         68.3 us        10249 bytes_per_second=14.2887G/s null_percent=0 size=1048.58k
   SumKernelInt32/1048576/10000        49.2 us         49.1 us        14255 bytes_per_second=19.874G/s null_percent=0.01 size=1048.58k
   SumKernelInt32/1048576/0            40.3 us         40.2 us        17654 bytes_per_second=24.2688G/s null_percent=0 size=1048.58k
   SumKernelInt64/1048576/10000        35.3 us         35.3 us        19870 bytes_per_second=27.6826G/s null_percent=0.01 size=1048.58k
   SumKernelInt64/1048576/0            32.4 us         32.3 us        21628 bytes_per_second=30.1902G/s null_percent=0 size=1048.58k
   ```
   
   After
   ```
   SumKernelFloat/1048576/10000        41.1 us         41.0 us        17004 bytes_per_second=23.7947G/s null_percent=0.01 size=1048.58k
   SumKernelFloat/1048576/0            25.1 us         25.0 us        27884 bytes_per_second=38.9922G/s null_percent=0 size=1048.58k
   SumKernelDouble/1048576/10000       24.6 us         24.5 us        28423 bytes_per_second=39.8205G/s null_percent=0.01 size=1048.58k
   SumKernelDouble/1048576/0           17.1 us         17.1 us        40881 bytes_per_second=57.1186G/s null_percent=0 size=1048.58k
   SumKernelInt8/1048576/10000          116 us          115 us         6073 bytes_per_second=8.46685G/s null_percent=0.01 size=1048.58k
   SumKernelInt8/1048576/0             61.0 us         60.9 us        11501 bytes_per_second=16.0293G/s null_percent=0 size=1048.58k
   SumKernelInt16/1048576/10000        62.2 us         62.2 us        11250 bytes_per_second=15.7108G/s null_percent=0.01 size=1048.58k
   SumKernelInt16/1048576/0            37.0 us         37.0 us        19883 bytes_per_second=26.4204G/s null_percent=0 size=1048.58k
   SumKernelInt32/1048576/10000        38.4 us         38.4 us        18217 bytes_per_second=25.4367G/s null_percent=0.01 size=1048.58k
   SumKernelInt32/1048576/0            24.6 us         24.5 us        28531 bytes_per_second=39.8216G/s null_percent=0 size=1048.58k
   SumKernelInt64/1048576/10000        24.1 us         24.1 us        29069 bytes_per_second=40.5531G/s null_percent=0.01 size=1048.58k
   SumKernelInt64/1048576/0            16.7 us         16.7 us        41887 bytes_per_second=58.3943G/s null_percent=0 size=1048.58k
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

