lidavidm commented on pull request #10813:
URL: https://github.com/apache/arrow/pull/10813#issuecomment-887509020


   This is just to see how the hash aggregate kernels perform compared to the 
dedicated scalar aggregation kernels in the case that there is only one group.
   
   Unfortunately, it's rather terrible. For count:
   
   <details>
   
   ```
   
   ---------------------------------------------------------------------------------------------------
   Benchmark                                         Time             CPU   Iterations UserCounters...
   ---------------------------------------------------------------------------------------------------
   CountKernelBenchInt64/1048576/2                3067 ns         3067 ns       457331 bytes_per_second=318.4G/s null_percent=50 size=1048.58k
   CountKernelBenchInt64Aggregate/1048576/2     412162 ns       412138 ns         3238 bytes_per_second=2.3695G/s null_percent=50 size=1048.58k
   ```
   
   </details>
   
   At roughly two orders of magnitude slower (412162 ns vs. 3067 ns, about 
134x), the hash aggregate kernel isn't anywhere near the dedicated scalar one. 
The scalar kernel essentially just calls CountSetBits, while the hash aggregate 
kernel must use VisitSetBitRuns and index into a length-1 array of counts. 
Also, a good amount of time (~10% of the runtime according to perf) is spent 
just allocating and filling the array of group IDs that it needs before it can 
start.
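
To make the difference concrete, here is a minimal, self-contained sketch of the two counting strategies. It is not Arrow's code: `Bitmap`, `CountScalar`, and `CountGrouped` are hypothetical names, the bitmap is assumed to be least-significant-bit first, and the grouped version uses a plain per-row loop as a simplification of visiting runs of set bits.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a validity bitmap: one bit per row, LSB first.
using Bitmap = std::vector<uint8_t>;

// Scalar-style count: popcount whole bytes, then handle the trailing bits.
// No per-row work and no extra allocation.
int64_t CountScalar(const Bitmap& validity, int64_t length) {
  int64_t count = 0;
  const int64_t full_bytes = length / 8;
  for (int64_t i = 0; i < full_bytes; ++i) {
    count += static_cast<int64_t>(std::bitset<8>(validity[i]).count());
  }
  for (int64_t i = full_bytes * 8; i < length; ++i) {
    count += (validity[i / 8] >> (i % 8)) & 1;
  }
  return count;
}

// Grouped-style count: every valid row looks up its group ID and bumps
// that group's counter, even when there is only one group.
std::vector<int64_t> CountGrouped(const Bitmap& validity, int64_t length,
                                  const std::vector<uint32_t>& group_ids,
                                  int64_t num_groups) {
  assert(static_cast<int64_t>(group_ids.size()) >= length);
  std::vector<int64_t> counts(num_groups, 0);
  for (int64_t i = 0; i < length; ++i) {
    if ((validity[i / 8] >> (i % 8)) & 1) {
      counts[group_ids[i]] += 1;
    }
  }
  return counts;
}
```

With a single group, the caller still has to build `std::vector<uint32_t> group_ids(length, 0)` before calling `CountGrouped`. That allocate-and-fill step is the analogue of the group ID setup cost mentioned above.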
   
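   For min_max the story is not so clear. The hash aggregate kernel actually 
wins for the floating-point types (Float and Double), except when every value 
is null, where the scalar kernel finishes in a few microseconds. For integers 
it loses badly: roughly 4-5x slower when few values are null, though that is 
still not as bad as with count.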
   
   <details>
   
   ```
   
   ----------------------------------------------------------------------------------------------------
   Benchmark                                          Time             CPU   Iterations UserCounters...
   ----------------------------------------------------------------------------------------------------
   MinMaxKernelFloat/1048576/10000                  981 us          981 us          638 bytes_per_second=1018.91M/s null_percent=0.01 size=1048.58k
   MinMaxKernelFloat/1048576/100                   1008 us         1008 us          723 bytes_per_second=992.523M/s null_percent=1 size=1048.58k
   MinMaxKernelFloat/1048576/10                    1062 us         1062 us          561 bytes_per_second=941.703M/s null_percent=10 size=1048.58k
   MinMaxKernelFloat/1048576/2                     1424 us         1424 us          456 bytes_per_second=702.401M/s null_percent=50 size=1048.58k
   MinMaxKernelFloat/1048576/1                     6.92 us         6.92 us       105816 bytes_per_second=141.155G/s null_percent=100 size=1048.58k
   MinMaxKernelFloat/1048576/0                      900 us          900 us          815 bytes_per_second=1111.18M/s null_percent=0 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/10000         667 us          667 us         1103 bytes_per_second=1.46325G/s null_percent=0.01 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/100           654 us          654 us          924 bytes_per_second=1.49389G/s null_percent=1 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/10            765 us          765 us          965 bytes_per_second=1.27599G/s null_percent=10 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/2            1267 us         1267 us          585 bytes_per_second=789.267M/s null_percent=50 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/1             421 us          421 us         1693 bytes_per_second=2.32129G/s null_percent=100 size=1048.58k
   MinMaxKernelFloatAggregate/1048576/0             668 us          668 us         1107 bytes_per_second=1.46147G/s null_percent=0 size=1048.58k
   MinMaxKernelDouble/1048576/10000                 420 us          420 us         1712 bytes_per_second=2.32776G/s null_percent=0.01 size=1048.58k
   MinMaxKernelDouble/1048576/100                   465 us          465 us         1412 bytes_per_second=2.10164G/s null_percent=1 size=1048.58k
   MinMaxKernelDouble/1048576/10                    592 us          592 us         1168 bytes_per_second=1.64947G/s null_percent=10 size=1048.58k
   MinMaxKernelDouble/1048576/2                     730 us          730 us         1008 bytes_per_second=1.33826G/s null_percent=50 size=1048.58k
   MinMaxKernelDouble/1048576/1                    4.10 us         4.10 us       177426 bytes_per_second=238.21G/s null_percent=100 size=1048.58k
   MinMaxKernelDouble/1048576/0                     540 us          540 us         1000 bytes_per_second=1.80829G/s null_percent=0 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/10000        342 us          342 us         2106 bytes_per_second=2.85799G/s null_percent=0.01 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/100          346 us          346 us         2136 bytes_per_second=2.82629G/s null_percent=1 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/10           385 us          385 us         1911 bytes_per_second=2.53959G/s null_percent=10 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/2            631 us          631 us         1163 bytes_per_second=1.54829G/s null_percent=50 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/1            218 us          218 us         3413 bytes_per_second=4.48758G/s null_percent=100 size=1048.58k
   MinMaxKernelDoubleAggregate/1048576/0            334 us          334 us         2193 bytes_per_second=2.92247G/s null_percent=0 size=1048.58k
   MinMaxKernelInt8/1048576/10000                   571 us          571 us         1293 bytes_per_second=1.71088G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8/1048576/100                     986 us          986 us          742 bytes_per_second=1014.35M/s null_percent=1 size=1048.58k
   MinMaxKernelInt8/1048576/10                     1818 us         1818 us          402 bytes_per_second=550.013M/s null_percent=10 size=1048.58k
   MinMaxKernelInt8/1048576/2                      4039 us         4039 us          182 bytes_per_second=247.588M/s null_percent=50 size=1048.58k
   MinMaxKernelInt8/1048576/1                      22.9 us         22.9 us        31922 bytes_per_second=42.701G/s null_percent=100 size=1048.58k
   MinMaxKernelInt8/1048576/0                       546 us          546 us         1368 bytes_per_second=1.78943G/s null_percent=0 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/10000         2241 us         2241 us          325 bytes_per_second=446.241M/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/100           2497 us         2497 us          299 bytes_per_second=400.494M/s null_percent=1 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/10            3144 us         3143 us          237 bytes_per_second=318.129M/s null_percent=10 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/2             5107 us         5107 us          100 bytes_per_second=195.815M/s null_percent=50 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/1             1581 us         1581 us          386 bytes_per_second=632.705M/s null_percent=100 size=1048.58k
   MinMaxKernelInt8Aggregate/1048576/0             2093 us         2093 us          274 bytes_per_second=477.747M/s null_percent=0 size=1048.58k
   MinMaxKernelInt16/1048576/10000                  274 us          274 us         2329 bytes_per_second=3.56594G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16/1048576/100                    459 us          459 us         1429 bytes_per_second=2.12904G/s null_percent=1 size=1048.58k
   MinMaxKernelInt16/1048576/10                     842 us          842 us          698 bytes_per_second=1.1593G/s null_percent=10 size=1048.58k
   MinMaxKernelInt16/1048576/2                     1997 us         1997 us          370 bytes_per_second=500.784M/s null_percent=50 size=1048.58k
   MinMaxKernelInt16/1048576/1                     12.1 us         12.1 us        60436 bytes_per_second=80.6769G/s null_percent=100 size=1048.58k
   MinMaxKernelInt16/1048576/0                      271 us          271 us         2713 bytes_per_second=3.60785G/s null_percent=0 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/10000        1226 us         1226 us          615 bytes_per_second=815.951M/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/100          1326 us         1326 us          564 bytes_per_second=753.999M/s null_percent=1 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/10           1608 us         1608 us          462 bytes_per_second=622M/s null_percent=10 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/2            2700 us         2700 us          275 bytes_per_second=370.316M/s null_percent=50 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/1             867 us          866 us          868 bytes_per_second=1.12704G/s null_percent=100 size=1048.58k
   MinMaxKernelInt16Aggregate/1048576/0            1190 us         1190 us          620 bytes_per_second=840.448M/s null_percent=0 size=1048.58k
   MinMaxKernelInt32/1048576/10000                  139 us          139 us         4389 bytes_per_second=7.03596G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32/1048576/100                    238 us          238 us         2483 bytes_per_second=4.10169G/s null_percent=1 size=1048.58k
   MinMaxKernelInt32/1048576/10                     515 us          515 us         1000 bytes_per_second=1.89702G/s null_percent=10 size=1048.58k
   MinMaxKernelInt32/1048576/2                     1021 us         1021 us          722 bytes_per_second=979.116M/s null_percent=50 size=1048.58k
   MinMaxKernelInt32/1048576/1                     6.55 us         6.55 us       109640 bytes_per_second=149.127G/s null_percent=100 size=1048.58k
   MinMaxKernelInt32/1048576/0                      132 us          132 us         4723 bytes_per_second=7.4224G/s null_percent=0 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/10000         631 us          631 us         1171 bytes_per_second=1.54789G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/100           703 us          703 us         1096 bytes_per_second=1.38954G/s null_percent=1 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/10            808 us          808 us          911 bytes_per_second=1.20892G/s null_percent=10 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/2            1304 us         1304 us          564 bytes_per_second=766.908M/s null_percent=50 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/1             420 us          420 us         1758 bytes_per_second=2.32421G/s null_percent=100 size=1048.58k
   MinMaxKernelInt32Aggregate/1048576/0             624 us          624 us         1183 bytes_per_second=1.56476G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64/1048576/10000                 73.8 us         73.8 us         9720 bytes_per_second=13.2297G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64/1048576/100                    113 us          113 us         5369 bytes_per_second=8.65305G/s null_percent=1 size=1048.58k
   MinMaxKernelInt64/1048576/10                     206 us          206 us         3134 bytes_per_second=4.74316G/s null_percent=10 size=1048.58k
   MinMaxKernelInt64/1048576/2                      516 us          516 us         1000 bytes_per_second=1.89125G/s null_percent=50 size=1048.58k
   MinMaxKernelInt64/1048576/1                     3.89 us         3.89 us       187314 bytes_per_second=250.863G/s null_percent=100 size=1048.58k
   MinMaxKernelInt64/1048576/0                     71.5 us         71.5 us        10264 bytes_per_second=13.6533G/s null_percent=0 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/10000         305 us          305 us         2414 bytes_per_second=3.19811G/s null_percent=0.01 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/100           334 us          334 us         2210 bytes_per_second=2.92146G/s null_percent=1 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/10            407 us          407 us         1832 bytes_per_second=2.40181G/s null_percent=10 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/2             654 us          654 us         1117 bytes_per_second=1.49329G/s null_percent=50 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/1             217 us          217 us         3416 bytes_per_second=4.50848G/s null_percent=100 size=1048.58k
   MinMaxKernelInt64Aggregate/1048576/0             302 us          302 us         2430 bytes_per_second=3.23243G/s null_percent=0 size=1048.58k
   ```
   
   </details>



