andimiller commented on PR #10288:
URL: https://github.com/apache/pinot/pull/10288#issuecomment-1433839206

   > Thanks for the contribution!
   > 
   > I do have a high-level question: do you need to use theta sketch to do
   > single column distinct count? The reason why we didn't add theta sketch
   > support for star-tree index is because HLL is the better data structure to
   > use if no set intersect or diff is required.
   
   You're correct: we want to query theta sketches from the star-tree index
   and run set operations on them, so I'm hoping this change will let queries
   like the following hit the star-tree index:
   
   ```sql
   SELECT DistinctCountThetaSketch(
     users,
     '',
     'country = ''UK''',
     'device = ''Browser''',
     'SET_DIFF($1, $2)'
   ) AS british_non_browser_users FROM user_stats
   ```
   
   where `users` is a `BYTES` column containing serialized theta sketches.
   
   If that can't be supported, we're hoping we can fall back to:
   
   ```sql
   SELECT
     DistinctCountRawThetaSketch(users)
   FROM user_stats
   WHERE country = 'UK'
   ```
   ```sql
   SELECT
     DistinctCountRawThetaSketch(users)
   FROM user_stats
   WHERE device = 'Browser'
   ```
   
   and then compute the set difference of the two raw sketches ourselves in a JVM program.
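
To make the client-side set difference concrete, here is a minimal sketch of what "A not B" means for theta sketches. It is an assumption-laden toy: `ToyThetaSketch`, its fields, and the `user-N` identifiers are all hypothetical, it uses only the Java standard library, and it cannot read the bytes Pinot returns. A real fallback would presumably deserialize the two raw sketches with the Apache DataSketches library and use its A-not-B operation instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeSet;

// Toy KMV-style theta sketch, for illustration only (hypothetical class,
// not the Apache DataSketches implementation).
final class ToyThetaSketch {
    final int k;                                  // max number of retained hashes
    long theta = Long.MAX_VALUE;                  // only hashes strictly below theta are kept
    final TreeSet<Long> hashes = new TreeSet<>();

    ToyThetaSketch(int k) { this.k = k; }

    // Map an item to a non-negative 63-bit hash using SHA-256.
    static long hash(String item) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(item.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h & Long.MAX_VALUE;            // clear the sign bit
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void update(String item) {
        long h = hash(item);
        if (h >= theta) return;                   // outside the sampled range
        hashes.add(h);
        if (hashes.size() > k) {
            theta = hashes.pollLast();            // evict the largest hash; it becomes theta
        }
    }

    // Estimated distinct count: retained hashes divided by the sampled fraction.
    double estimate() {
        return hashes.size() / ((double) theta / Long.MAX_VALUE);
    }

    // A-not-B: keep hashes of A below the smaller theta that B did not retain.
    static ToyThetaSketch aNotB(ToyThetaSketch a, ToyThetaSketch b) {
        ToyThetaSketch r = new ToyThetaSketch(a.k);
        r.theta = Math.min(a.theta, b.theta);
        for (long h : a.hashes) {
            if (h < r.theta && !b.hashes.contains(h)) r.hashes.add(h);
        }
        return r;
    }

    public static void main(String[] args) {
        ToyThetaSketch uk = new ToyThetaSketch(4096);       // stands in for country = 'UK'
        ToyThetaSketch browser = new ToyThetaSketch(4096);  // stands in for device = 'Browser'
        for (int i = 0; i < 100; i++) uk.update("user-" + i);
        for (int i = 50; i < 150; i++) browser.update("user-" + i);
        System.out.println(ToyThetaSketch.aNotB(uk, browser).estimate());
    }
}
```

Because both inputs here hold fewer than `k` distinct users, nothing is sampled away and the result is exact: 50 UK users who are not browser users. With larger inputs the same steps would yield an estimate rather than an exact count.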
   