lidavidm commented on pull request #10813:
URL: https://github.com/apache/arrow/pull/10813#issuecomment-887509020
This is just to see how the hash aggregate kernels perform compared to the
dedicated scalar aggregation kernels in the case that there is only one group.
Unfortunately, it's rather terrible. For count:
<details>
```
---------------------------------------------------------------------------------------------------
Benchmark Time CPU
Iterations UserCounters...
---------------------------------------------------------------------------------------------------
CountKernelBenchInt64/1048576/2 3067 ns 3067 ns
457331 bytes_per_second=318.4G/s null_percent=50 size=1048.58k
CountKernelBenchInt64Aggregate/1048576/2 412162 ns 412138 ns
3238 bytes_per_second=2.3695G/s null_percent=50 size=1048.58k
```
</details>
At 2 orders of magnitude slower, the hash aggregate kernel isn't anywhere
near the dedicated scalar one. The scalar kernel essentially just calls
CountSetBits, while the hash aggregate kernel must use VisitSetBitRuns and
index into a length-1 array of counts. Also, a good amount of time (~10% of the
runtime according to perf) is spent just allocating and filling an array of
group IDs to use at the start.
For min_max the story is not so clear. The hash aggregate kernel actually
wins for floats, but loses badly (not as badly as with Count) for integers.
<details>
```
----------------------------------------------------------------------------------------------------
Benchmark Time CPU
Iterations UserCounters...
----------------------------------------------------------------------------------------------------
MinMaxKernelFloat/1048576/10000 981 us 981 us
638 bytes_per_second=1018.91M/s null_percent=0.01 size=1048.58k
MinMaxKernelFloat/1048576/100 1008 us 1008 us
723 bytes_per_second=992.523M/s null_percent=1 size=1048.58k
MinMaxKernelFloat/1048576/10 1062 us 1062 us
561 bytes_per_second=941.703M/s null_percent=10 size=1048.58k
MinMaxKernelFloat/1048576/2 1424 us 1424 us
456 bytes_per_second=702.401M/s null_percent=50 size=1048.58k
MinMaxKernelFloat/1048576/1 6.92 us 6.92 us
105816 bytes_per_second=141.155G/s null_percent=100 size=1048.58k
MinMaxKernelFloat/1048576/0 900 us 900 us
815 bytes_per_second=1111.18M/s null_percent=0 size=1048.58k
MinMaxKernelFloatAggregate/1048576/10000 667 us 667 us
1103 bytes_per_second=1.46325G/s null_percent=0.01 size=1048.58k
MinMaxKernelFloatAggregate/1048576/100 654 us 654 us
924 bytes_per_second=1.49389G/s null_percent=1 size=1048.58k
MinMaxKernelFloatAggregate/1048576/10 765 us 765 us
965 bytes_per_second=1.27599G/s null_percent=10 size=1048.58k
MinMaxKernelFloatAggregate/1048576/2 1267 us 1267 us
585 bytes_per_second=789.267M/s null_percent=50 size=1048.58k
MinMaxKernelFloatAggregate/1048576/1 421 us 421 us
1693 bytes_per_second=2.32129G/s null_percent=100 size=1048.58k
MinMaxKernelFloatAggregate/1048576/0 668 us 668 us
1107 bytes_per_second=1.46147G/s null_percent=0 size=1048.58k
MinMaxKernelDouble/1048576/10000 420 us 420 us
1712 bytes_per_second=2.32776G/s null_percent=0.01 size=1048.58k
MinMaxKernelDouble/1048576/100 465 us 465 us
1412 bytes_per_second=2.10164G/s null_percent=1 size=1048.58k
MinMaxKernelDouble/1048576/10 592 us 592 us
1168 bytes_per_second=1.64947G/s null_percent=10 size=1048.58k
MinMaxKernelDouble/1048576/2 730 us 730 us
1008 bytes_per_second=1.33826G/s null_percent=50 size=1048.58k
MinMaxKernelDouble/1048576/1 4.10 us 4.10 us
177426 bytes_per_second=238.21G/s null_percent=100 size=1048.58k
MinMaxKernelDouble/1048576/0 540 us 540 us
1000 bytes_per_second=1.80829G/s null_percent=0 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/10000 342 us 342 us
2106 bytes_per_second=2.85799G/s null_percent=0.01 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/100 346 us 346 us
2136 bytes_per_second=2.82629G/s null_percent=1 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/10 385 us 385 us
1911 bytes_per_second=2.53959G/s null_percent=10 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/2 631 us 631 us
1163 bytes_per_second=1.54829G/s null_percent=50 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/1 218 us 218 us
3413 bytes_per_second=4.48758G/s null_percent=100 size=1048.58k
MinMaxKernelDoubleAggregate/1048576/0 334 us 334 us
2193 bytes_per_second=2.92247G/s null_percent=0 size=1048.58k
MinMaxKernelInt8/1048576/10000 571 us 571 us
1293 bytes_per_second=1.71088G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8/1048576/100 986 us 986 us
742 bytes_per_second=1014.35M/s null_percent=1 size=1048.58k
MinMaxKernelInt8/1048576/10 1818 us 1818 us
402 bytes_per_second=550.013M/s null_percent=10 size=1048.58k
MinMaxKernelInt8/1048576/2 4039 us 4039 us
182 bytes_per_second=247.588M/s null_percent=50 size=1048.58k
MinMaxKernelInt8/1048576/1 22.9 us 22.9 us
31922 bytes_per_second=42.701G/s null_percent=100 size=1048.58k
MinMaxKernelInt8/1048576/0 546 us 546 us
1368 bytes_per_second=1.78943G/s null_percent=0 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/10000 2241 us 2241 us
325 bytes_per_second=446.241M/s null_percent=0.01 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/100 2497 us 2497 us
299 bytes_per_second=400.494M/s null_percent=1 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/10 3144 us 3143 us
237 bytes_per_second=318.129M/s null_percent=10 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/2 5107 us 5107 us
100 bytes_per_second=195.815M/s null_percent=50 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/1 1581 us 1581 us
386 bytes_per_second=632.705M/s null_percent=100 size=1048.58k
MinMaxKernelInt8Aggregate/1048576/0 2093 us 2093 us
274 bytes_per_second=477.747M/s null_percent=0 size=1048.58k
MinMaxKernelInt16/1048576/10000 274 us 274 us
2329 bytes_per_second=3.56594G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16/1048576/100 459 us 459 us
1429 bytes_per_second=2.12904G/s null_percent=1 size=1048.58k
MinMaxKernelInt16/1048576/10 842 us 842 us
698 bytes_per_second=1.1593G/s null_percent=10 size=1048.58k
MinMaxKernelInt16/1048576/2 1997 us 1997 us
370 bytes_per_second=500.784M/s null_percent=50 size=1048.58k
MinMaxKernelInt16/1048576/1 12.1 us 12.1 us
60436 bytes_per_second=80.6769G/s null_percent=100 size=1048.58k
MinMaxKernelInt16/1048576/0 271 us 271 us
2713 bytes_per_second=3.60785G/s null_percent=0 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/10000 1226 us 1226 us
615 bytes_per_second=815.951M/s null_percent=0.01 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/100 1326 us 1326 us
564 bytes_per_second=753.999M/s null_percent=1 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/10 1608 us 1608 us
462 bytes_per_second=622M/s null_percent=10 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/2 2700 us 2700 us
275 bytes_per_second=370.316M/s null_percent=50 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/1 867 us 866 us
868 bytes_per_second=1.12704G/s null_percent=100 size=1048.58k
MinMaxKernelInt16Aggregate/1048576/0 1190 us 1190 us
620 bytes_per_second=840.448M/s null_percent=0 size=1048.58k
MinMaxKernelInt32/1048576/10000 139 us 139 us
4389 bytes_per_second=7.03596G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32/1048576/100 238 us 238 us
2483 bytes_per_second=4.10169G/s null_percent=1 size=1048.58k
MinMaxKernelInt32/1048576/10 515 us 515 us
1000 bytes_per_second=1.89702G/s null_percent=10 size=1048.58k
MinMaxKernelInt32/1048576/2 1021 us 1021 us
722 bytes_per_second=979.116M/s null_percent=50 size=1048.58k
MinMaxKernelInt32/1048576/1 6.55 us 6.55 us
109640 bytes_per_second=149.127G/s null_percent=100 size=1048.58k
MinMaxKernelInt32/1048576/0 132 us 132 us
4723 bytes_per_second=7.4224G/s null_percent=0 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/10000 631 us 631 us
1171 bytes_per_second=1.54789G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/100 703 us 703 us
1096 bytes_per_second=1.38954G/s null_percent=1 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/10 808 us 808 us
911 bytes_per_second=1.20892G/s null_percent=10 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/2 1304 us 1304 us
564 bytes_per_second=766.908M/s null_percent=50 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/1 420 us 420 us
1758 bytes_per_second=2.32421G/s null_percent=100 size=1048.58k
MinMaxKernelInt32Aggregate/1048576/0 624 us 624 us
1183 bytes_per_second=1.56476G/s null_percent=0 size=1048.58k
MinMaxKernelInt64/1048576/10000 73.8 us 73.8 us
9720 bytes_per_second=13.2297G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64/1048576/100 113 us 113 us
5369 bytes_per_second=8.65305G/s null_percent=1 size=1048.58k
MinMaxKernelInt64/1048576/10 206 us 206 us
3134 bytes_per_second=4.74316G/s null_percent=10 size=1048.58k
MinMaxKernelInt64/1048576/2 516 us 516 us
1000 bytes_per_second=1.89125G/s null_percent=50 size=1048.58k
MinMaxKernelInt64/1048576/1 3.89 us 3.89 us
187314 bytes_per_second=250.863G/s null_percent=100 size=1048.58k
MinMaxKernelInt64/1048576/0 71.5 us 71.5 us
10264 bytes_per_second=13.6533G/s null_percent=0 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/10000 305 us 305 us
2414 bytes_per_second=3.19811G/s null_percent=0.01 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/100 334 us 334 us
2210 bytes_per_second=2.92146G/s null_percent=1 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/10 407 us 407 us
1832 bytes_per_second=2.40181G/s null_percent=10 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/2 654 us 654 us
1117 bytes_per_second=1.49329G/s null_percent=50 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/1 217 us 217 us
3416 bytes_per_second=4.50848G/s null_percent=100 size=1048.58k
MinMaxKernelInt64Aggregate/1048576/0 302 us 302 us
2430 bytes_per_second=3.23243G/s null_percent=0 size=1048.58k
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]