Hello everyone, I have recently been working on materialized views in
Calcite, and in our use case many queries involve COUNT DISTINCT.
Rewriting COUNT DISTINCT queries against a materialized view usually relies
on bitmaps. However, I have recently developed a new capability in Calcite
that rewrites COUNT DISTINCT queries to read from the materialized view table
without bitmaps, as long as the COUNT DISTINCT is over the GROUP BY columns
of the materialized view.
For example, let's assume we have the following materialized view:
```sql
CREATE MATERIALIZED VIEW test_mv AS
SELECT
c1, c2, c3, sum(c4)
FROM
t
GROUP BY
c1, c2, c3
```
After the materialized view has been created, the following query arrives:
```sql
select count(distinct c2) from t group by c2, c3
```
With the capability I've developed, the above query can be rewritten as:
```sql
select count(distinct c2) from test_mv group by c2, c3
```
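Compared to calculating COUNT DISTINCT directly on the original table, this
rewrite should reduce the query time, because the materialized view has
already grouped t by (c1, c2, c3) and therefore holds at most as many rows as
t, and usually far fewer. The result is unchanged because every distinct
(c2, c3) combination in t still appears in the materialized view.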
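As a quick sanity check of the equivalence, here is a minimal Python sketch using SQLite from the standard library. It is not Calcite and does not show the rewrite rule itself; it only confirms that the query returns the same result from both sources. Everything beyond the SQL above is an assumption made for illustration: the sample rows, the `sum_c4` alias, the plain table standing in for `test_mv` (SQLite has no materialized views), the added `ORDER BY` for stable output, and the extra variant that groups by `c3` only, which gives less trivial counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data. A plain table stands in for the materialized view.
conn.executescript(
    """
    CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER);
    INSERT INTO t VALUES
        (1, 10, 100, 5),
        (1, 10, 100, 7),
        (1, 20, 100, 1),
        (2, 10, 100, 3),
        (2, 20, 200, 4),
        (2, 20, 200, 6);
    CREATE TABLE test_mv AS
        SELECT c1, c2, c3, sum(c4) AS sum_c4
        FROM t
        GROUP BY c1, c2, c3;
    """
)


def run_on(source, template):
    """Run the same query text against either the base table or the view table."""
    return conn.execute(template.format(source=source)).fetchall()


# The query from the post, with ORDER BY added so row order is deterministic.
ORIGINAL = (
    "SELECT count(DISTINCT c2) FROM {source} "
    "GROUP BY c2, c3 ORDER BY c2, c3"
)
# Illustrative variant (not from the post): group by c3 only.
VARIANT = (
    "SELECT c3, count(DISTINCT c2) FROM {source} "
    "GROUP BY c3 ORDER BY c3"
)

original_base = run_on("t", ORIGINAL)
original_mv = run_on("test_mv", ORIGINAL)
variant_base = run_on("t", VARIANT)
variant_mv = run_on("test_mv", VARIANT)

base_rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
mv_rows = conn.execute("SELECT count(*) FROM test_mv").fetchone()[0]

print("original query matches:", original_base == original_mv)
print("variant query matches:", variant_base == variant_mv)
print("rows scanned: base =", base_rows, "mv =", mv_rows)
```

With this toy data the view table has 4 rows against 6 in `t`; on real data the gap depends on how many rows share the same (c1, c2, c3) combination.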
Is anyone interested in this? If so, I can open a pull request. :)