Jackie-Jiang opened a new pull request #8074:
URL: https://github.com/apache/pinot/pull/8074


   ## Description
   The `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation functions currently store all distinct values in a `Set`, which can cause memory issues and potentially exhaust the memory of Servers or Brokers. To protect the servers, this PR adds support for automatically converting the `Set` to a `HyperLogLog` when the set grows too large. The conversion applies only to aggregation-only queries, not to group-by queries.
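   The conversion can be pictured roughly as follows. This is a minimal, hypothetical sketch, not the code in this PR: `SimpleHyperLogLog` and `DistinctCountAccumulator` are made-up names, and the hashing (SplitMix64-style mixing of `Object.hashCode()`) and bias correction are simplified assumptions rather than Pinot's `HyperLogLog`. It follows the behavior described in this PR: the values stay in an exact `Set` until its size exceeds `hllConversionThreshold`, and a non-positive threshold means the set is never converted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified HyperLogLog (not Pinot's or any library's class).
final class SimpleHyperLogLog {
  private final int log2m;
  private final byte[] registers;

  SimpleHyperLogLog(int log2m) {
    if (log2m < 4 || log2m > 30) {
      throw new IllegalArgumentException("log2m must be in [4, 30]: " + log2m);
    }
    this.log2m = log2m;
    this.registers = new byte[1 << log2m];
  }

  // Assumption: SplitMix64-style mixing of hashCode() stands in for a real 64-bit hash.
  private static long hash64(Object value) {
    long z = value.hashCode() + 0x9e3779b97f4a7c15L;
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  void offer(Object value) {
    long hash = hash64(value);
    int index = (int) (hash >>> (64 - log2m)); // top log2m bits select a register
    long rest = hash << log2m;                 // remaining bits determine the rank
    int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - log2m + 1);
    if (rank > registers[index]) {
      registers[index] = (byte) rank;
    }
  }

  long cardinality() {
    int m = registers.length;
    double sum = 0;
    int zeroRegisters = 0;
    for (byte r : registers) {
      sum += 1.0 / (1L << r);
      if (r == 0) {
        zeroRegisters++;
      }
    }
    double alpha = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709 : 0.7213 / (1 + 1.079 / m);
    double estimate = alpha * m * m / sum;
    if (estimate <= 2.5 * m && zeroRegisters > 0) {
      estimate = m * Math.log((double) m / zeroRegisters); // small-range (linear counting) correction
    }
    return Math.round(estimate);
  }
}

// Hypothetical accumulator: exact Set first, HyperLogLog once the set size exceeds the threshold.
final class DistinctCountAccumulator {
  private final int hllLog2m;
  private final int hllConversionThreshold;
  private Set<Object> valueSet = new HashSet<>();
  private SimpleHyperLogLog hll; // null while the count is still exact

  DistinctCountAccumulator(int hllLog2m, int hllConversionThreshold) {
    this.hllLog2m = hllLog2m;
    this.hllConversionThreshold = hllConversionThreshold;
  }

  void add(Object value) {
    if (hll != null) {
      hll.offer(value);
      return;
    }
    valueSet.add(value);
    // A non-positive threshold means never convert, so the count stays exact.
    if (hllConversionThreshold > 0 && valueSet.size() > hllConversionThreshold) {
      hll = new SimpleHyperLogLog(hllLog2m);
      for (Object v : valueSet) {
        hll.offer(v);
      }
      valueSet = null; // drop the set so its memory can be reclaimed
    }
  }

  boolean isApproximate() {
    return hll != null;
  }

  long distinctCount() {
    return hll != null ? hll.cardinality() : valueSet.size();
  }
}
```

Replaying the set into the sketch and then dropping it bounds memory at a fixed `2^log2m` registers, at the cost of an approximate count from that point on.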
   
   By default, once the set size exceeds 100K, the set is converted to a `HyperLogLog` with log2m of 12.
   Both the log2m and the threshold can be configured through an optional second (literal) argument of the function:
   - `hllLog2m`: log2m of the converted HyperLogLog (default 12)
   - `hllConversionThreshold`: set size that triggers the conversion; a non-positive value means never convert (default 100K)
   
   Example query:
   `SELECT DISTINCTCOUNT(myCol, 'hllLog2m=8;hllConversionThreshold=10') FROM myTable`
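   The parameter string in the example query is a `;`-separated list of `key=value` pairs. A minimal parsing sketch, assuming that syntax (taken from the example) and the defaults listed above; the class `DistinctCountParams`, its whitespace trimming, and its error handling are hypothetical, not the parser added in this PR:

```java
// Hypothetical parser for the optional second argument. The ';' / '=' syntax follows
// the example query; trimming and error handling are assumptions.
final class DistinctCountParams {
  static final int DEFAULT_HLL_LOG2M = 12;
  static final int DEFAULT_HLL_CONVERSION_THRESHOLD = 100_000;

  final int hllLog2m;
  final int hllConversionThreshold;

  private DistinctCountParams(int hllLog2m, int hllConversionThreshold) {
    this.hllLog2m = hllLog2m;
    this.hllConversionThreshold = hllConversionThreshold;
  }

  static DistinctCountParams parse(String literal) {
    int log2m = DEFAULT_HLL_LOG2M;
    int threshold = DEFAULT_HLL_CONVERSION_THRESHOLD;
    if (literal != null) {
      for (String pair : literal.split(";")) {
        if (pair.trim().isEmpty()) {
          continue; // tolerate empty segments such as a trailing ';'
        }
        String[] keyValue = pair.split("=", 2);
        if (keyValue.length != 2) {
          throw new IllegalArgumentException("Expected key=value, got: " + pair);
        }
        String key = keyValue[0].trim();
        int value = Integer.parseInt(keyValue[1].trim());
        switch (key) {
          case "hllLog2m":
            log2m = value;
            break;
          case "hllConversionThreshold":
            threshold = value;
            break;
          default:
            throw new IllegalArgumentException("Unknown parameter: " + key);
        }
      }
    }
    return new DistinctCountParams(log2m, threshold);
  }

  // A non-positive threshold means never convert, i.e. keep the exact Set.
  boolean conversionEnabled() {
    return hllConversionThreshold > 0;
  }
}
```

Unspecified keys fall back to their defaults, so `'hllLog2m=8'` alone keeps the 100K threshold.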
   
   ## Release Notes
   Add an optional second (literal) argument to the `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation functions for the following parameters:
   - `hllLog2m`: log2m of the converted HyperLogLog (default 12)
   - `hllConversionThreshold`: set size that triggers the conversion; a non-positive value means never convert (default 100K)
   
   By default, for `DISTINCT_COUNT` and `DISTINCT_COUNT_MV` aggregation-only queries where the number of distinct values exceeds 100K, the query switches to `HyperLogLog` and returns an approximate result. To restore the exact (100% accurate) behavior, set `hllConversionThreshold` to a non-positive value.
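   As a rough guide to how approximate the result can be: in the standard HyperLogLog analysis, a sketch with m = 2^log2m registers has a relative standard error of about 1.04/√m. This is the textbook estimate, not a measured figure for Pinot's implementation:

$$
\sigma \approx \frac{1.04}{\sqrt{m}}, \qquad m = 2^{\mathrm{log2m}}
$$

$$
\mathrm{log2m} = 12:\ m = 4096,\ \sigma \approx \frac{1.04}{64} \approx 1.6\% \qquad\qquad \mathrm{log2m} = 8:\ m = 256,\ \sigma \approx \frac{1.04}{16} = 6.5\%
$$

Since the sketch holds m registers, lowering `hllLog2m` saves memory at the cost of accuracy.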

