zclllyybb opened a new pull request, #64166:
URL: https://github.com/apache/doris/pull/64166
Count aggregation without GROUP BY reaches
AggFnEvaluator::execute_single_add(), which calls add_batch_single_place().
AggregateFunctionCount and AggregateFunctionCountNotNullUnary previously
inherited the row-by-row helper there, so count(*) and count(nullable_expr)
paid per-row add/is_null_at costs even when all rows were aggregated into one
state.
This patch adds batch implementations: count(*) increments the state once by
batch_size, while unary count(nullable_expr) checks the nullable null map once
and fast-paths the no-NULL case to count += batch_size. When NULLs exist it
uses simd::count_zero_num() over the null map to count non-NULL rows. The
AggregateFunctionCountNotNullUnary class name is kept because SQL count(expr)
counts non-NULL values, not NULL values.
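
As a rough illustration of the two batch paths described above, here is a minimal standalone sketch. It is not the Doris code: `CountState`, `count_star_add_batch`, `count_not_null_add_batch`, and `count_zero_bytes` are hypothetical names, the scalar loop only stands in for simd::count_zero_num(), and the null map is assumed to hold one byte per row with 0 meaning non-NULL and 1 meaning NULL.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the single aggregation state (not the Doris type).
struct CountState {
    std::uint64_t count = 0;
};

// count(*): every row counts, so the whole batch is added in one step.
inline void count_star_add_batch(CountState& state, std::size_t batch_size) {
    state.count += batch_size;
}

// Scalar stand-in for a SIMD zero-byte counter; a zero byte marks a non-NULL row.
inline std::size_t count_zero_bytes(const std::uint8_t* data, std::size_t size) {
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < size; ++i) {
        zeros += (data[i] == 0) ? 1 : 0;
    }
    return zeros;
}

// count(nullable_expr): assumes null_map[i] is exactly 0 (non-NULL) or 1 (NULL).
inline void count_not_null_add_batch(CountState& state,
                                     const std::vector<std::uint8_t>& null_map) {
    if (null_map.empty()) {
        return; // nothing to add for an empty batch
    }
    // Check once whether the batch contains any NULL marker.
    const bool has_null =
            std::memchr(null_map.data(), 1, null_map.size()) != nullptr;
    if (!has_null) {
        // Fast path: no NULLs, so every row in the batch counts.
        state.count += null_map.size();
        return;
    }
    // Mixed or all-NULL batch: count only the non-NULL rows.
    state.count += count_zero_bytes(null_map.data(), null_map.size());
}
```

In this sketch the per-row virtual call and is_null_at check are replaced by one pass (or none) over the batch, which is the cost the patch description says it removes.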
Performance:
Tested with the following SQL:
```sql
select count(nullable(number)) from numbers("number"="1000000000");
select count(nullable(if(number >= 0, null, number))) from
numbers("number"="1000000000");
select count(nullable(if(number % 2 = 0, number, null))) from
numbers("number"="1000000000");
```
Results:
```
Scenario    before median / mean (ms)   after median / mean (ms)   median diff
─────────   ─────────────────────────   ────────────────────────   ───────────
non NULL    645 / 648.6                 555 / 556.4                -14.0%
all NULL    1541 / 1539.6               1448 / 1450.6              -6.0%
half NULL   4256 / 4261.2               4192 / 4232.2              -1.5%
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]