andimiller commented on PR #10288:
URL: https://github.com/apache/pinot/pull/10288#issuecomment-1433839206
> Thanks for the contribution!
>
> I do have a high level question: do you need to use theta sketch to do
> single column distinct count? The reason why we didn't add theta sketch support
> for star-tree index is because HLL is the better data structure to use if no
> set intersect or diff is required
You're correct, we want to query Theta sketches from the StarTree index and
do set operations with them, so I'm hoping this will allow queries like this to
hit the StarTree index:
```sql
SELECT DistinctCountThetaSketch(
  users,
  '',
  'country = ''UK''',
  'device = ''Browser''',
  'SET_DIFF($1, $2)'
) AS british_non_browser_users
FROM user_stats
```
where `users` is a `BYTES` column containing serialized theta sketches.
If it can't handle that, we're hoping we can fall back to:
```sql
SELECT DistinctCountRawThetaSketch(users)
FROM user_stats
WHERE country = 'UK'
```
```sql
SELECT DistinctCountRawThetaSketch(users)
FROM user_stats
WHERE device = 'Browser'
```
and then compute the set difference in a JVM program.
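To illustrate what that client-side diff involves, here is a minimal sketch in Java. It is only an illustration under stated assumptions: `ToyThetaSketch` and `ThetaDiffDemo` are hypothetical, stdlib-only stand-ins, not Pinot or Apache DataSketches code. A real program would presumably deserialize the two query results with the Apache DataSketches library and apply its A-not-B operation instead. The point is that the difference is taken on the retained hashes under a shared threshold, not by subtracting two counts.

```java
import java.nio.charset.StandardCharsets;
import java.util.TreeSet;

// Hypothetical toy KMV-style theta sketch; NOT the Apache DataSketches implementation.
final class ToyThetaSketch {
    private final int k;                          // maximum number of retained hashes
    private long theta = Long.MAX_VALUE;          // sampling threshold over [0, Long.MAX_VALUE]
    private final TreeSet<Long> hashes = new TreeSet<>();

    ToyThetaSketch(int k) { this.k = k; }

    // FNV-1a over the UTF-8 bytes plus a 64-bit mixing step, forced non-negative.
    private static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33; h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33; h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h & Long.MAX_VALUE;
    }

    void update(String item) {
        long h = hash(item);
        if (h >= theta) return;                   // above the threshold: not sampled
        hashes.add(h);
        if (hashes.size() > k) {
            theta = hashes.pollLast();            // the (k+1)-th smallest hash becomes theta
        }
    }

    // Retained count scaled up by the sampled fraction of the hash space.
    double estimate() {
        return hashes.size() / ((double) theta / Long.MAX_VALUE);
    }

    // A-not-B: hashes of A below the shared threshold that B did not retain.
    static ToyThetaSketch aNotB(ToyThetaSketch a, ToyThetaSketch b) {
        ToyThetaSketch r = new ToyThetaSketch(a.k);
        r.theta = Math.min(a.theta, b.theta);
        for (long h : a.hashes) {
            if (h < r.theta && !b.hashes.contains(h)) r.hashes.add(h);
        }
        return r;
    }
}

final class ThetaDiffDemo {
    public static void main(String[] args) {
        // Stand-ins for the sketches returned by the country and device queries.
        ToyThetaSketch uk = new ToyThetaSketch(1024);
        for (String u : new String[] {"alice", "bob", "carol"}) uk.update(u);
        ToyThetaSketch browser = new ToyThetaSketch(1024);
        for (String u : new String[] {"bob", "dave"}) browser.update(u);
        System.out.println(ToyThetaSketch.aNotB(uk, browser).estimate());
    }
}
```

While a sketch holds fewer than `k` distinct items its threshold stays at the maximum and the estimate is exact, so the demo reports 2 UK users who are not browser users. Past that point the result becomes an estimate.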
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]