jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931
@emkornfield This is the new version of the sum aggregate without intrinsics; could you help review it? On clang, the dense part scores nearly the same as the intrinsic version for AVX2; the gcc result is a little lower. Below are the benchmark results (null_percent 0 and 0.01%) on an AVX2 (i7-8700) device.

Before
```
SumKernelFloat/1048576/10000 97.5 us 97.3 us 7150 bytes_per_second=10.0336G/s null_percent=0.01 size=1048.58k
SumKernelFloat/1048576/0 62.1 us 62.0 us 11292 bytes_per_second=15.7443G/s null_percent=0 size=1048.58k
SumKernelDouble/1048576/10000 35.4 us 35.4 us 19781 bytes_per_second=27.5977G/s null_percent=0.01 size=1048.58k
SumKernelDouble/1048576/0 32.5 us 32.5 us 21534 bytes_per_second=30.0657G/s null_percent=0 size=1048.58k
SumKernelInt8/1048576/10000 183 us 183 us 3832 bytes_per_second=5.34627G/s null_percent=0.01 size=1048.58k
SumKernelInt8/1048576/0 133 us 132 us 5285 bytes_per_second=7.37317G/s null_percent=0 size=1048.58k
SumKernelInt16/1048576/10000 93.3 us 93.2 us 7505 bytes_per_second=10.4762G/s null_percent=0.01 size=1048.58k
SumKernelInt16/1048576/0 68.4 us 68.3 us 10249 bytes_per_second=14.2887G/s null_percent=0 size=1048.58k
SumKernelInt32/1048576/10000 49.2 us 49.1 us 14255 bytes_per_second=19.874G/s null_percent=0.01 size=1048.58k
SumKernelInt32/1048576/0 40.3 us 40.2 us 17654 bytes_per_second=24.2688G/s null_percent=0 size=1048.58k
SumKernelInt64/1048576/10000 35.3 us 35.3 us 19870 bytes_per_second=27.6826G/s null_percent=0.01 size=1048.58k
SumKernelInt64/1048576/0 32.4 us 32.3 us 21628 bytes_per_second=30.1902G/s null_percent=0 size=1048.58k
```

After
```
SumKernelFloat/1048576/10000 41.1 us 41.0 us 17004 bytes_per_second=23.7947G/s null_percent=0.01 size=1048.58k
SumKernelFloat/1048576/0 25.1 us 25.0 us 27884 bytes_per_second=38.9922G/s null_percent=0 size=1048.58k
SumKernelDouble/1048576/10000 24.6 us 24.5 us 28423 bytes_per_second=39.8205G/s null_percent=0.01 size=1048.58k
SumKernelDouble/1048576/0 17.1 us 17.1 us 40881 bytes_per_second=57.1186G/s null_percent=0 size=1048.58k
SumKernelInt8/1048576/10000 116 us 115 us 6073 bytes_per_second=8.46685G/s null_percent=0.01 size=1048.58k
SumKernelInt8/1048576/0 61.0 us 60.9 us 11501 bytes_per_second=16.0293G/s null_percent=0 size=1048.58k
SumKernelInt16/1048576/10000 62.2 us 62.2 us 11250 bytes_per_second=15.7108G/s null_percent=0.01 size=1048.58k
SumKernelInt16/1048576/0 37.0 us 37.0 us 19883 bytes_per_second=26.4204G/s null_percent=0 size=1048.58k
SumKernelInt32/1048576/10000 38.4 us 38.4 us 18217 bytes_per_second=25.4367G/s null_percent=0.01 size=1048.58k
SumKernelInt32/1048576/0 24.6 us 24.5 us 28531 bytes_per_second=39.8216G/s null_percent=0 size=1048.58k
SumKernelInt64/1048576/10000 24.1 us 24.1 us 29069 bytes_per_second=40.5531G/s null_percent=0.01 size=1048.58k
SumKernelInt64/1048576/0 16.7 us 16.7 us 41887 bytes_per_second=58.3943G/s null_percent=0 size=1048.58k
```
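
For context, here is a minimal sketch of what a sum without intrinsics can look like. It is an illustration only, not the code from this PR: the names `SumDense`, `SumWithValidity`, and `kLanes` are hypothetical, and the lane count of 8 is an arbitrary assumption. The dense path keeps several independent partial sums so the compiler is free to map the inner loop onto SIMD registers. The null-aware path reads an Arrow-style validity bitmap (least-significant bit first) and adds zero for null slots instead of branching.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): sum a buffer that has no nulls.
// Independent partial sums remove the loop-carried dependency on a single
// accumulator, which gives the compiler room to auto-vectorize.
template <typename T, typename SumT>
SumT SumDense(const T* values, std::size_t length) {
  constexpr std::size_t kLanes = 8;  // assumed lane count, tune per target
  SumT partial[kLanes] = {};
  const std::size_t rounded = length - (length % kLanes);
  for (std::size_t i = 0; i < rounded; i += kLanes) {
    for (std::size_t j = 0; j < kLanes; ++j) {
      partial[j] += static_cast<SumT>(values[i + j]);
    }
  }
  SumT total = 0;
  for (std::size_t j = 0; j < kLanes; ++j) {
    total += partial[j];
  }
  // Scalar tail for the elements that do not fill a whole group of lanes.
  for (std::size_t i = rounded; i < length; ++i) {
    total += static_cast<SumT>(values[i]);
  }
  return total;
}

// Hypothetical helper (not from the PR): sum a buffer with a validity bitmap.
// Bit (i % 8) of byte (i / 8) is 1 when slot i is valid, 0 when it is null.
template <typename T, typename SumT>
SumT SumWithValidity(const T* values, const std::uint8_t* validity,
                     std::size_t length) {
  SumT total = 0;
  for (std::size_t i = 0; i < length; ++i) {
    const bool valid = ((validity[i / 8] >> (i % 8)) & 1) != 0;
    // Select rather than branch: null slots contribute zero.
    total += valid ? static_cast<SumT>(values[i]) : SumT(0);
  }
  return total;
}
```

For floating-point inputs the explicit partial sums matter more, since compilers generally will not reassociate a single-accumulator reduction unless flags such as `-ffast-math` allow it. Note that the partial sums change the order of additions, so float results can differ slightly from a strictly sequential sum.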