Mohammed-Karim226 opened a new issue, #14867:
URL: https://github.com/apache/pinot/issues/14867

   ## Feature Description
   This feature introduces real-time pre-aggregation support for:
   1. *Distinct Count HLL (HyperLogLog)*: Stores a HyperLogLog sketch per 
pre-aggregated row, so approximate distinct counts over large datasets can be 
computed without keeping every raw value.
   2. *Sum Precision (Big Decimal)*: Pre-aggregates large decimal values as 
sums with a fixed precision, avoiding the rounding error of floating-point 
sums in financial or scientific data.
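
   To illustrate why HLL state suits pre-aggregation, here is a minimal toy 
sketch. It is not Pinot's implementation: `ToyHll`, its hash function, and its 
one-byte-per-register layout are assumptions made for illustration. It shows 
that the state has a fixed size for a given log2m and that merging two 
sketches gives the same registers as sketching the union of their inputs.

```java
/** Toy HyperLogLog for illustration only; hypothetical, not Pinot's code. */
final class ToyHll {
    final int log2m;         // number of index bits, e.g. 12
    final byte[] registers;  // 2^log2m registers: size is independent of input volume

    ToyHll(int log2m) {
        this.log2m = log2m;
        this.registers = new byte[1 << log2m];
    }

    /** splitmix64-style mixer used as a stand-in hash (an assumption). */
    static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix(value);
        int idx = (int) (h >>> (64 - log2m));  // top log2m bits select a register
        long rest = h << log2m;                // remaining bits determine the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - log2m + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Merging is an element-wise max, so it is order-independent and idempotent. */
    void merge(ToyHll other) {
        if (other.log2m != log2m) {
            throw new IllegalArgumentException("log2m mismatch");
        }
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1 + 1.079 / m);  // bias constant for m >= 128
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);  // small-range linear counting
        }
        return raw;
    }
}
```

   With log2m = 12 this toy keeps 4096 registers however many values are 
added, which is the property that lets rows sharing the same dimensions be 
collapsed into one stored sketch at ingestion time.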
   
   ## Motivation
   - *Distinct Count HLL*: Provides a scalable way to count unique values in 
real-time, which is critical for analytics and monitoring use cases.
   - *Sum Precision*: Ensures accurate aggregation of large decimal values, 
which is essential for financial calculations and scientific computations.
   
   ## Proposed Solution
   - Add support for DISTINCTCOUNTHLL and SUMPRECISION aggregation functions in 
real-time tables.
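   - Use FixedByteSVMutableForwardIndex to store pre-aggregated values 
efficiently. This assumes each value has a bounded serialized size, which a 
fixed log2m (for HLL) and a fixed precision (for BIG_DECIMAL) should provide.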
   - Include unit tests to validate the new functionality.
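
   The ingestion-time merge for a SUMPRECISION-style column could look 
roughly like the sketch below. `SumPrecisionPreAggregator`, the string 
dimension key, and the choice to round with `MathContext` on every merge are 
all assumptions made for illustration; this is not Pinot code.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ingestion-time pre-aggregator; illustration only. */
final class SumPrecisionPreAggregator {
    private final MathContext mathContext;
    // One running sum per distinct dimension key (stands in for one stored row).
    private final Map<String, BigDecimal> sumsByKey = new HashMap<>();

    SumPrecisionPreAggregator(int precision) {
        // MathContext(int) rounds HALF_UP to the given number of significant digits.
        this.mathContext = new MathContext(precision);
    }

    /** Folds one incoming row into the running sum for its dimension key. */
    void ingest(String dimensionKey, BigDecimal amount) {
        sumsByKey.merge(
            dimensionKey,
            amount.round(mathContext),
            (current, incoming) -> current.add(incoming, mathContext));
    }

    BigDecimal sumFor(String dimensionKey) {
        return sumsByKey.getOrDefault(dimensionKey, BigDecimal.ZERO);
    }

    /** Number of pre-aggregated rows that would be stored. */
    int rowCount() {
        return sumsByKey.size();
    }
}
```

   Ingesting 0.1 and 0.2 under the same key leaves a single row whose sum 
compares equal to 0.3, whereas the same addition on `double` values does not 
equal 0.3 exactly.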
   
   ## Alternatives Considered
   - Using existing aggregation functions with custom logic, but this would not 
be as efficient or scalable.
   - Storing raw data and performing aggregation at query time, but this would 
increase query latency.
   
   ## Additional Context
   - Example configurations for real-time tables:
     ```json
     "aggregationConfigs": [
       {
         "columnName": "distinctcounthll_customer",
         "aggregationFunction": "DISTINCTCOUNTHLL(customer, 12)"
       },
       {
         "columnName": "sum_precision_parsed_amount",
         "aggregationFunction": "SUMPRECISION(parsed_amount, 38)"
       }
     ]
     ```
   - Schema definitions for the new fields:
     ```json
     "metricFieldSpecs": [
       {
         "name": "distinctcounthll_customer",
         "dataType": "BYTES"
       },
       {
         "name": "sum_precision_parsed_amount",
         "dataType": "BIG_DECIMAL"
       }
     ]
     ```
   
   ## Are you willing to submit a PR?
   No
   
   ## Code of conduct
   Yes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

