Re: Support for Rewriting Count Distinct in Group By Columns for Materialized Views in Calcite

Julian Hyde Thu, 08 Aug 2024 17:14:54 -0700

I certainly agree that that query should use that MV. If it does not currently, 
please log a jira case.


Since c2 is one of the GROUP BY columns in the query, I think it would be also 
be valid (and probably more efficient) to rewrite to

  select case when c2 is null then 0 else 1 end from test_mv group by c2, c3

Julian



> On Aug 6, 2024, at 11:13 PM, Da Dash <[email protected]> wrote:
> 
> Hello everyone, recently I have been working on materialized views using 
> Calcite, and in our use case, there are a lot of queries involving 
> CountDistinct.
> And generally, to support rewriting for Count Distinct,  we will always use 
> bitmap. However, recently, I have developed a new capability in Calcite that 
> allows rewriting of Count Distinct queries to read from the materialized view 
> table without the need for bitmap, as long as the Count Distinct is querying 
> the group by columns of the materialized view.
> 
> For example, let's assume we have the following materialized view: 
> 
> ```sql
> CREATE MATERIALIZED VIEW test_mv AS
> SELECT
>    c1, c2, c3, sum(c4)
> FROM
>   t
> GROUP BY
>    c1, c2, c3
> ```
> 
> After the materialized view created, the following query arrives:
> 
> ```sql
> 
> select count(distinct c2) from t group by c2, c3
> 
> ```
> 
> With the capability I've developed, the above query can be rewritten as:
> 
> ```sql
> 
> select count(distinct c2) from test_mv group by c2, c3
> 
> ```
> 
> The rewrite mentioned above, compared to calculating COUNT DISTINCT directly 
> on the original table, will significantly reduce the query time because the 
> materialized view contains a reduced amount of data.
> 
> Is anyone interested in this? I can initiate a Pull Request. :)
> 
> 
> 
> 
> 
> 
>

Re: Support for Rewriting Count Distinct in Group By Columns for Materialized Views in Calcite

Reply via email to