These kinds of roll ups are very common. They have simple mathematical properties. There is a simplifying abstraction to be found here. Please find it and use it.
Julian > On Feb 2, 2022, at 7:47 PM, Thomas D'Silva <tdsi...@twilio.com.invalid> wrote: > > Hi Stamatis, > > Thanks for your response. Your suggestion of having use case specific rules > that translate time ranges that align on the rollup of the materialized > view makes sense. > However if the time range doesn't align for eg. if we have a view that > rolls up to the minute and the query predicate is "WHERE ts >= > timestamp'2018-01-01 00:01:30' AND ts < timestamp'2018-01-01 00:05:00'" we > wouldn't be able to rewrite it to apply the FLOOR() on the left side of the > comparators. > In this case I was trying to use the MaterializedViewRule's union rewriting > feature to query the view for the range "ts >= timestamp'2018-01-01 > 00:02:00' AND ts < timestamp'2018-01-01 00:05:00'" and query the table for > the range "ts >= timestamp'2018-01-01 00:01:30' AND ts < > timestamp'2018-01-01 00:02:00'". I would appreciate any suggestions on how > to do this. > > Thank you, > Thomas > > >> On Wed, Feb 2, 2022 at 3:28 AM Stamatis Zampetakis <zabe...@gmail.com> >> wrote: >> >> Hi Thomas, >> >> I haven't looked into the implementation details of the rule but I assume >> the rewriting does not kick in because there is no easy way to relate the >> "ts" column that is used in the query with the "floor(ts to minute)" column >> that is materialized. >> >> I assume that if your query is/was similar to the one below you would get a >> rewriting. >> >> SELECT floor(ts to minute), count(*) >> FROM events >> WHERE floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts >> to minute) < timestamp'2018-01-01 00:05:00' >> GROUP BY eventid, floor(ts TO minute) >> >> So far I don't see a limitation/problem with the MaterializedViewRule but >> rather use-case specific rules you imply they always hold. >> >> If I understand correctly you say that the following are kind of >> equivalent: >> >> WHERE ts >= timestamp'2018-01-01 00:01:00' AND ts < >> timestamp'2018-01-01 00:05:00' >> WHERE floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts to >> minute) < timestamp'2018-01-01 00:05:00' >> WHERE floor(ts to minute) >= floor(timestamp'2018-01-01 00:01:00' to >> minute) AND floor(ts to minute) < floor(timestamp'2018-01-01 00:05:00' to >> minute) >> >> and I think they are based on the given timestamp literals. >> >> Maybe instead of changing the rules performing the view rewriting what you >> should do is introduce other use-case specific rules that can be applied to >> the input queries and bring them to the desired form. >> >> Best, >> Stamatis >> >> On Fri, Jan 28, 2022 at 10:12 PM Thomas D'Silva <tdsi...@twilio.com.invalid >>> >> wrote: >> >>> Xurenhe, >>> >>> Thank you for the detailed response and pointers. The use case you >>> describe for OLAP streaming queries is similar to what our team is trying >>> to accomplish. >>> We use calcite to query data stored in kudu [1]. We use kafka streams to >>> maintain materialized views that contain pre-aggregated data that rolls >>> data up to the daily or hourly granularity. One of the problems we face >> is >>> when a user submits a query where the time range does not align to the >> day >>> or hour boundary. In this case we are unable to use the materialized >> views >>> and have to query the raw event which is not performant because the table >>> contains many more rows compared to the view. >>> The reason I wanted to modify the MaterializedViewRule is to use the >> "Union >>> rewriting with aggregate" ability [2] this would allow us to use both the >>> view and the table to answer such queries in the most efficient manner. I >>> looked at the code in SubstitutionVisitor but I don't think there are >>> existing rules there that do union rewriting with predicates. >>> >>> The following is an example of the query we are trying to optimize. For >> the >>> materialized view >>> SELECT eventid, floor(ts to minute), count(*) as cnt >>> FROM events >>> GROUP BY eventid, floor(ts TO minute) >>> >>> and query >>> SELECT floor(ts to minute), count(*) >>> FROM events >>> WHERE ts >= timestamp'2018-01-01 00:01:30' AND ts < >> timestamp'2018-01-01 >>> 00:05:00' >>> GROUP BY eventid, floor(ts TO minute) >>> >>> We would like to generate a plan that queries the table for the time >> range >>> [2018-01-01 00:01:30, 2018-01-01 00:02:00) and view for the time range >>> [2018-01-01 00:02:-0, 2018-01-01 00:05:00) similar to >>> >>> EnumerableCalc(expr#0..2=[{inputs}], EXPR$0=[$t1], EXPR$1=[$t2]) >>> EnumerableAggregate(group=[{0, 1}], EXPR$1=[$SUM0($2)]) >>> EnumerableUnion(all=[true]) >>> EnumerableAggregate(group=[{0, 1}], EXPR$1=[COUNT()]) >>> EnumerableCalc(expr#0..1=[{inputs}], expr#2=[FLAG(MINUTE)], >>> expr#3=[FLOOR($t1, $t2)], expr#4=[Sarg[[2018-01-01 00:01:30..2018-01-01 >>> 00:02:00)]], expr#5=[SEARCH($t1, $t4)], eventid=[$t0], $f1=[$t3], >>> $condition=[$t5]) >>> EnumerableTableScan(table=[[hr, events]]) >>> EnumerableCalc(expr#0..2=[{inputs}], expr#3=[Sarg[[2018-01-01 >>> 00:02:00..2018-01-01 00:05:00)]], expr#4=[SEARCH($t1, $t3)], >>> proj#0..2=[{exprs}], $condition=[$t4]) >>> EnumerableTableScan(table=[[hr, MV0]]) >>> >>> Thank you, >>> Thomas >>> >>> [1] >> https://urldefense.com/v3/__https://github.com/twilio/calcite-kudu__;!!NCc8flgU!KhDSBwwmEP1lp8FVI18hL250_jEF94FehJiaGVmrZIs3Q79JqY5_37EGNyWrpN8$ >>> [2] >>> >>> >> https://urldefense.com/v3/__https://calcite.apache.org/docs/materialized_views.html*union-rewriting-with-aggregate__;Iw!!NCc8flgU!KhDSBwwmEP1lp8FVI18hL250_jEF94FehJiaGVmrZIs3Q79JqY5_37EGqqSl83w$ >>> >>>> On Fri, Jan 28, 2022 at 12:20 AM 徐仁和 <xurenhe19910...@gmail.com> wrote: >>> >>>> It’s useful in OLAP. >>>> Time ’s prediction bottom `Aggregate` could be pulled-up, because of >>> Time’s >>>> granularity in Aggregate’s group. >>>> By the way, the query can be rewritten by mv, when the query's >>> granularity >>>> is coarser than mv’s. >>>> eg: >>>> -- mv: >>>> SELECT eventid, floor(ts TO minute), count(*) as cnt >>>> FROM events >>>> GROUP BY eventid, floor(ts TO minute) >>>> -- query: >>>> SELECT floor(ts TO hour), count(*) >>>> FROM events >>>> GROUP BY eventid, floor(ts TO hour) >>>> >>>> In the past, I tried to use stream-sql’s expression to solve it. >>>> This document is used to describe this problem, Here is a rough >> document. >>>> doc: >>>> >>>> >>> >> https://urldefense.com/v3/__https://docs.google.com/document/d/1LtSgxhwvnpk2uAXFQiMSX-aQjnkJ6KLrXwXJX-VbsbE/edit*heading=h.ww6moavc8lkj__;Iw!!NCc8flgU!Lx31-Z0cEnGqJQwX-7zk1Kk3NvAaCblUwfyTg4jDUg0Y69SULj4jHJKPGC2W58A$ >>>> My team has defined more `UnifyRule`s to solve it, >>>> because SubstitutionVisitor has high-scalability. >>>> >>>> >>>> Xurenhe >>>> >>>> Thomas D'Silva <tdsi...@twilio.com.invalid> 于2022年1月27日周四 02:54写道: >>>> >>>>> Hi, >>>>> >>>>> The existing MaterializedViewRule is not applied when the >> materialized >>>> view >>>>> does not have a view predicate but the query contains a predicate. >> For >>> eg >>>>> for the following materialized view >>>>> >>>>> SELECT eventid, floor(ts to minute), count(*) as cnt >>>>> FROM events >>>>> GROUP BY eventid, floor(ts TO minute) >>>>> >>>>> If we have the following query the view is not chosen in the final >>> plan. >>>>> >>>>> SELECT floor(ts to minute), count(*) >>>>> FROM events >>>>> WHERE ts >= timestamp'2018-01-01 00:01:00' AND ts < >>>> timestamp'2018-01-01 >>>>> 00:05:00' >>>>> GROUP BY eventid, floor(ts TO minute) >>>>> >>>>> If I add the predicate "ts > timestamp'2018-01-01 00:01:00' AND ts < >>>>> timestamp'2018-01-01 00:05:00'" to the materialized view then the >>>>> materialized view is chosen in the final plan. Would it be possible >> to >>>>> modify the MaterializedViewRule to automatically add a view predicate >>>> based >>>>> on the query when the materialized view being considerent aggregates >>>> over a >>>>> column that contains a floor()? This predicate would also need to be >>>>> applied to the rewritten view. I would like to file a JIRA and work >> on >>>> this >>>>> improvement if this makes sense. >>>>> >>>>> Thank you, >>>>> Thomas >>>>> >>>> >>> >>