xiangfu0 opened a new pull request, #18822:
URL: https://github.com/apache/pinot/pull/18822
## Summary
Adds `AVG` to the set of aggregation functions supported by
`MergeRollupTask` and `RealtimeToOfflineSegmentsTask` (rollup merge type).
Previously, configuring `<column>.aggregationType=avg` for a merge/rollup
task failed config validation with `ValueAggregator not enabled for type: AVG`.
## How it works
`AVG` is rolled up the same way as the other bytes-backed types
(`DISTINCTCOUNTHLL`, `PERCENTILETDIGEST`, …): the metric column stores a
serialized `AvgPair` (sum + count) in a `BYTES` column, produced by an `AVG`
ingestion aggregation or star-tree. The rollup reducer merges two `AvgPair`s by
**adding their sums and counts**, so the average stays correct across multiple
rollup levels.
This is necessary because plain `AVG` is not re-aggregatable from a scalar —
averaging already-averaged values is wrong whenever the merged groups have
unequal counts. Carrying `(sum, count)` and dividing only at the end avoids the
average-of-averages trap.
The query-time `AvgAggregationFunction` already reads this exact serialized
`AvgPair` format from `BYTES` columns, so **no query-path change is required**
— this PR only wires up the merge/rollup side.
### Changes
- New `AvgValueAggregator` in the pinot-core segment-processing aggregator
package (merges serialized `AvgPair`s; empty `byte[]` treated as missing,
mirroring the sibling sketch aggregators).
- Register `AVG` in `pinot-core` `ValueAggregatorFactory`.
- Add `AVG` to
`MinionConstants.MergeRollupTask.AVAILABLE_CORE_VALUE_AGGREGATORS` (shared by
both merge tasks).
### Usage requirement
`AVG` merge/rollup operates on a column that already stores a serialized
`AvgPair`, i.e. a `BYTES` metric column produced by an `AVG` ingestion
aggregation (or star-tree). Pointing `aggregationType=avg` at a raw numeric
column is a misconfiguration (same as for HLL/TDigest). A follow-up will add
config-time validation that bytes-backed aggregation types require a `BYTES`
column, uniformly across all such types, so this fails fast at table-config
time rather than at task runtime.
## Backward compatibility / rolling upgrade
- Purely additive to an allow-list and a factory switch; existing configs
are unaffected.
- No new enum constant, SPI signature change, or wire-format change
(`AvgPair` serialization is unchanged and already used at query time).
- Rolling-upgrade note: if a **new** controller schedules an `AVG` merge
task while a minion is still on an **old** version, the old minion's factory
throws `IllegalStateException("Unsupported aggregation type: AVG")` at task
runtime — it fails loudly with no data corruption. Upgrade minions before
enabling `AVG` rollup.
## Testing
- `AvgValueAggregatorTest` (pinot-core): merge adds sum/count, two-level
rollup preserves totals, empty/malformed bytes handling, factory wiring.
- `MergeRollupAvgTaskExecutorTest` (new): end-to-end rollup of a `BYTES`
`AvgPair` column across segments, including **unequal counts** and a true
**two-level** rollup that would fail under average-of-averages.
- `RealtimeToOfflineSegmentsTaskExecutorTest`: added
`testRollupWithAvgAggregation` covering the second consumer end-to-end.
- `MergeRollupTaskGeneratorTest`: updated the parseable-but-unsupported
negative case (it used `avg`, now uses `distinctCount`).
## Release note
`MergeRollupTask` and `RealtimeToOfflineSegmentsTask` now support `AVG`
aggregation on a `BYTES` column storing a serialized `AvgPair` (sum + count).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]