xiangfu0 commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-4113973789
The new `filterMv` function in #17659 can help with this. The inflated
DISTINCTCOUNTMV happens because MV explosion during GROUP BY creates a separate
group per element, so each group sees the full MV array. By wrapping the column
with `filterMv`, you restrict which elements participate in both grouping and
aggregation:
```sql
-- Instead of:
-- SELECT tags, DISTINCTCOUNTMV(tags) FROM myTable GROUP BY tags
-- Which returns n instead of 1 per group due to MV explosion
-- Use filterMv to control which elements are grouped and counted:
SELECT filterMv(tags, 'v IN (''a'', ''b'')') AS filtered_tags,
DISTINCTCOUNTMV(filterMv(tags, 'v IN (''a'', ''b'')'))
FROM myTable
GROUP BY filterMv(tags, 'v IN (''a'', ''b'')')
```
The predicate uses `v` as a placeholder for each MV element and supports
`=`, `!=`, `>`, `>=`, `<`, `<=`, `IN`, `NOT IN`, `BETWEEN`, `REGEXP_LIKE`, and
compound predicates with `AND`/`OR`/`NOT`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]