Hi Thomas, I haven't looked into the implementation details of the rule but I assume the rewriting does not kick in because there is no easy way to relate the "ts" column that is used in the query with the "floor(ts to minute)" column that is materialized.
I assume that if your query is/was similar to the one below you would get a rewriting. SELECT floor(ts to minute), count(*) FROM events WHERE floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts to minute) < timestamp'2018-01-01 00:05:00' GROUP BY eventid, floor(ts TO minute) So far I don't see a limitation/problem with the MaterializedViewRule but rather use-case specific rules you imply they always hold. If I understand correctly you say that the following are kind of equivalent: WHERE ts >= timestamp'2018-01-01 00:01:00' AND ts < timestamp'2018-01-01 00:05:00' WHERE floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts to minute) < timestamp'2018-01-01 00:05:00' WHERE floor(ts to minute) >= floor(timestamp'2018-01-01 00:01:00' to minute) AND floor(ts to minute) < floor(timestamp'2018-01-01 00:05:00' to minute) and I think they are based on the given timestamp literals. Maybe instead of changing the rules performing the view rewriting what you should do is introduce other use-case specific rules that can be applied to the input queries and bring them to the desired form. Best, Stamatis On Fri, Jan 28, 2022 at 10:12 PM Thomas D'Silva <[email protected]> wrote: > Xurenhe, > > Thank you for the detailed response and pointers. The use case you > describe for OLAP streaming queries is similar to what our team is trying > to accomplish. > We use calcite to query data stored in kudu [1]. We use kafka streams to > maintain materialized views that contain pre-aggregated data that rolls > data up to the daily or hourly granularity. One of the problems we face is > when a user submits a query where the time range does not align to the day > or hour boundary. In this case we are unable to use the materialized views > and have to query the raw event which is not performant because the table > contains many more rows compared to the view. > The reason I wanted to modify the MaterializedViewRule is to use the "Union > rewriting with aggregate" ability [2] this would allow us to use both the > view and the table to answer such queries in the most efficient manner. I > looked at the code in SubstitutionVisitor but I don't think there are > existing rules there that do union rewriting with predicates. > > The following is an example of the query we are trying to optimize. For the > materialized view > SELECT eventid, floor(ts to minute), count(*) as cnt > FROM events > GROUP BY eventid, floor(ts TO minute) > > and query > SELECT floor(ts to minute), count(*) > FROM events > WHERE ts >= timestamp'2018-01-01 00:01:30' AND ts < timestamp'2018-01-01 > 00:05:00' > GROUP BY eventid, floor(ts TO minute) > > We would like to generate a plan that queries the table for the time range > [2018-01-01 00:01:30, 2018-01-01 00:02:00) and view for the time range > [2018-01-01 00:02:-0, 2018-01-01 00:05:00) similar to > > EnumerableCalc(expr#0..2=[{inputs}], EXPR$0=[$t1], EXPR$1=[$t2]) > EnumerableAggregate(group=[{0, 1}], EXPR$1=[$SUM0($2)]) > EnumerableUnion(all=[true]) > EnumerableAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) > EnumerableCalc(expr#0..1=[{inputs}], expr#2=[FLAG(MINUTE)], > expr#3=[FLOOR($t1, $t2)], expr#4=[Sarg[[2018-01-01 00:01:30..2018-01-01 > 00:02:00)]], expr#5=[SEARCH($t1, $t4)], eventid=[$t0], $f1=[$t3], > $condition=[$t5]) > EnumerableTableScan(table=[[hr, events]]) > EnumerableCalc(expr#0..2=[{inputs}], expr#3=[Sarg[[2018-01-01 > 00:02:00..2018-01-01 00:05:00)]], expr#4=[SEARCH($t1, $t3)], > proj#0..2=[{exprs}], $condition=[$t4]) > EnumerableTableScan(table=[[hr, MV0]]) > > Thank you, > Thomas > > [1] https://github.com/twilio/calcite-kudu > [2] > > https://calcite.apache.org/docs/materialized_views.html#union-rewriting-with-aggregate > > On Fri, Jan 28, 2022 at 12:20 AM 徐仁和 <[email protected]> wrote: > > > It’s useful in OLAP. > > Time ’s prediction bottom `Aggregate` could be pulled-up, because of > Time’s > > granularity in Aggregate’s group. > > By the way, the query can be rewritten by mv, when the query's > granularity > > is coarser than mv’s. > > eg: > > -- mv: > > SELECT eventid, floor(ts TO minute), count(*) as cnt > > FROM events > > GROUP BY eventid, floor(ts TO minute) > > -- query: > > SELECT floor(ts TO hour), count(*) > > FROM events > > GROUP BY eventid, floor(ts TO hour) > > > > In the past, I tried to use stream-sql’s expression to solve it. > > This document is used to describe this problem, Here is a rough document. > > doc: > > > > > https://urldefense.com/v3/__https://docs.google.com/document/d/1LtSgxhwvnpk2uAXFQiMSX-aQjnkJ6KLrXwXJX-VbsbE/edit*heading=h.ww6moavc8lkj__;Iw!!NCc8flgU!Lx31-Z0cEnGqJQwX-7zk1Kk3NvAaCblUwfyTg4jDUg0Y69SULj4jHJKPGC2W58A$ > > My team has defined more `UnifyRule`s to solve it, > > because SubstitutionVisitor has high-scalability. > > > > > > Xurenhe > > > > Thomas D'Silva <[email protected]> 于2022年1月27日周四 02:54写道: > > > > > Hi, > > > > > > The existing MaterializedViewRule is not applied when the materialized > > view > > > does not have a view predicate but the query contains a predicate. For > eg > > > for the following materialized view > > > > > > SELECT eventid, floor(ts to minute), count(*) as cnt > > > FROM events > > > GROUP BY eventid, floor(ts TO minute) > > > > > > If we have the following query the view is not chosen in the final > plan. > > > > > > SELECT floor(ts to minute), count(*) > > > FROM events > > > WHERE ts >= timestamp'2018-01-01 00:01:00' AND ts < > > timestamp'2018-01-01 > > > 00:05:00' > > > GROUP BY eventid, floor(ts TO minute) > > > > > > If I add the predicate "ts > timestamp'2018-01-01 00:01:00' AND ts < > > > timestamp'2018-01-01 00:05:00'" to the materialized view then the > > > materialized view is chosen in the final plan. Would it be possible to > > > modify the MaterializedViewRule to automatically add a view predicate > > based > > > on the query when the materialized view being considerent aggregates > > over a > > > column that contains a floor()? This predicate would also need to be > > > applied to the rewritten view. I would like to file a JIRA and work on > > this > > > improvement if this makes sense. > > > > > > Thank you, > > > Thomas > > > > > >
