Mohammed-Karim226 opened a new issue, #14867:
URL: https://github.com/apache/pinot/issues/14867
## Feature Description
This feature adds ingestion-time pre-aggregation support in real-time tables for:
1. *Distinct Count HLL (HyperLogLog)*: Estimates the number of unique values
using the HLL algorithm, which suits approximate distinct counts over large
datasets.
2. *Sum Precision (Big Decimal)*: Enables pre-aggregation of large decimal
values with a fixed precision, so that sums stay accurate for financial or
scientific data.
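
To give a sense of scale, here is a rough sizing sketch for the parameters used in the example configuration under Additional Context, reading the second arguments as an HLL `log2m` of 12 and a decimal precision of 38. The error formula is the textbook HyperLogLog standard error. The byte-width calculation assumes the unscaled value is stored as a two's-complement integer with the scale kept separately, which is an assumption for illustration and not necessarily Pinot's actual layout. All function names are hypothetical.

```python
import math


def hll_registers(log2m: int) -> int:
    """Number of HLL registers for a given log2m (m = 2^log2m)."""
    return 1 << log2m


def hll_relative_error(log2m: int) -> float:
    """Textbook HyperLogLog standard error: 1.04 / sqrt(m)."""
    return 1.04 / math.sqrt(hll_registers(log2m))


def unscaled_bytes_for_precision(precision: int) -> int:
    """Bytes needed for any signed integer with `precision` decimal digits.

    Assumes two's-complement storage of the unscaled value only;
    the scale would need extra bytes on top of this.
    """
    max_unscaled = 10 ** precision - 1
    bits = max_unscaled.bit_length() + 1  # +1 for the sign bit
    return (bits + 7) // 8                # round up to whole bytes


print(hll_registers(12))                 # 4096
print(hll_relative_error(12))            # about 0.01625, i.e. ~1.6%
print(unscaled_bytes_for_precision(38))  # 16
```

Both quantities are bounded once `log2m` and the precision are fixed, which is what makes a fixed-width per-row representation plausible.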
## Motivation
- *Distinct Count HLL*: Provides a scalable way to count unique values in
real-time, which is critical for analytics and monitoring use cases.
- *Sum Precision*: Ensures accurate aggregation of large decimal values,
which is essential for financial calculations and scientific computations.
## Proposed Solution
- Add support for `DISTINCTCOUNTHLL` and `SUMPRECISION` aggregation functions in
real-time tables.
- Use `FixedByteSVMutableForwardIndex` to store pre-aggregated values
efficiently.
- Include unit tests to validate the new functionality.
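
The intended behaviour can be sketched outside Pinot as follows. This is a minimal, self-contained illustration of ingestion-time pre-aggregation, not Pinot code: rows sharing the same dimension key collapse into one entry holding an HLL register array and a fixed-precision decimal sum. The SHA-1 based hash, the register layout, the row shape `(key, customer, amount)`, and every function name are assumptions made for the example; Pinot's own HLL implementation and storage format will differ.

```python
import hashlib
import math
from decimal import Context, Decimal

LOG2M = 12         # mirrors DISTINCTCOUNTHLL(customer, 12)
M = 1 << LOG2M     # number of HLL registers
PRECISION = 38     # mirrors SUMPRECISION(parsed_amount, 38)


def hll_new() -> bytearray:
    """Empty sketch: one byte per register, all zero."""
    return bytearray(M)


def hll_add(registers: bytearray, value: str) -> None:
    """Fold one value into the sketch (illustrative 64-bit hash from SHA-1)."""
    h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
    idx = h >> (64 - LOG2M)                      # top LOG2M bits pick the register
    rest = h & ((1 << (64 - LOG2M)) - 1)         # remaining bits
    rank = (64 - LOG2M) - rest.bit_length() + 1  # leading zeros + 1
    registers[idx] = max(registers[idx], rank)


def hll_merge(a: bytearray, b: bytearray) -> bytearray:
    """Merging two sketches is an element-wise max of their registers."""
    return bytearray(max(x, y) for x, y in zip(a, b))


def hll_estimate(registers: bytearray) -> float:
    """Standard HLL estimate with the small-range (linear counting) correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)
    return raw


def pre_aggregate(rows):
    """Collapse rows by key into (HLL registers, fixed-precision decimal sum)."""
    ctx = Context(prec=PRECISION)
    out = {}
    for key, customer, amount in rows:
        if key not in out:
            out[key] = (hll_new(), Decimal(0))
        hll, total = out[key]
        hll_add(hll, customer)
        out[key] = (hll, ctx.add(total, Decimal(amount)))
    return out


rows = [
    ("US", "alice", "0.10"),
    ("US", "bob", "0.20"),
    ("US", "alice", "1000000000000000000000.05"),
    ("DE", "carol", "7.5"),
]
for key, (hll, total) in pre_aggregate(rows).items():
    print(key, round(hll_estimate(hll)), total)
```

With these rows the US total is exactly 1000000000000000000000.35, a value a 64-bit float cannot represent, which is the case a big-decimal sum is meant to cover. The HLL side stays mergeable: two pre-aggregated entries for the same key can be combined with `hll_merge` without revisiting the raw rows.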
## Alternatives Considered
- Using existing aggregation functions with custom logic, but this would not
be as efficient or scalable.
- Storing raw data and performing aggregation at query time, but this would
increase query latency.
## Additional Context
- Example configurations for real-time tables:
```json
"aggregationConfigs": [
  {
    "columnName": "distinctcounthll_customer",
    "aggregationFunction": "DISTINCTCOUNTHLL(customer, 12)"
  },
  {
    "columnName": "sum_precision_parsed_amount",
    "aggregationFunction": "SUMPRECISION(parsed_amount, 38)"
  }
]
```
- Schema definitions for the new fields:
```json
"metricFieldSpecs": [
  {
    "name": "distinctcounthll_customer",
    "dataType": "BYTES"
  },
  {
    "name": "sum_precision_parsed_amount",
    "dataType": "BIG_DECIMAL"
  }
]
```
## Are you willing to submit a PR?
No
## Code of conduct
Yes