Re: MaterializedViewRule is not applied for queries on aggregate views that truncate timestamp

Thomas D'Silva Wed, 02 Feb 2022 19:47:11 -0800

Hi Stamatis,

Thanks for your response. Your suggestion of having use case specific rules
that translate time ranges that align on the rollup of the materialized
view makes sense.
However if the time range doesn't align for eg. if we have a view that
rolls up to the minute and the query predicate is "WHERE  ts >=
timestamp'2018-01-01 00:01:30' AND ts < timestamp'2018-01-01 00:05:00'" we
wouldn't be able to rewrite it to apply the FLOOR() on the left side of the
comparators.
In this case I was trying to use the MaterializedViewRule's union rewriting
feature to query the view for the range "ts >= timestamp'2018-01-01
00:02:00' AND ts < timestamp'2018-01-01 00:05:00'" and query the table for
the range "ts >= timestamp'2018-01-01 00:01:30' AND ts <
timestamp'2018-01-01 00:02:00'". I would appreciate any suggestions on how
to do this.


Thank you,
Thomas


On Wed, Feb 2, 2022 at 3:28 AM Stamatis Zampetakis <[email protected]>
wrote:

> Hi Thomas,
>
> I haven't looked into the implementation details of the rule but I assume
> the rewriting does not kick in because there is no easy way to relate the
> "ts" column that is used in the query with the "floor(ts to minute)" column
> that is materialized.
>
> I assume that if your query is/was similar to the one below you would get a
> rewriting.
>
> SELECT   floor(ts to minute), count(*)
> FROM     events
> WHERE   floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts
> to minute) < timestamp'2018-01-01 00:05:00'
> GROUP BY eventid, floor(ts TO minute)
>
> So far I don't see a limitation/problem with the MaterializedViewRule but
> rather use-case specific rules you imply they always hold.
>
> If I understand correctly you say that the following are kind of
> equivalent:
>
> WHERE ts >= timestamp'2018-01-01 00:01:00' AND ts <
> timestamp'2018-01-01 00:05:00'
> WHERE floor(ts to minute) >= timestamp'2018-01-01 00:01:00' AND floor(ts to
> minute) < timestamp'2018-01-01 00:05:00'
> WHERE floor(ts to minute) >= floor(timestamp'2018-01-01 00:01:00' to
> minute) AND floor(ts to minute) < floor(timestamp'2018-01-01 00:05:00' to
> minute)
>
> and I think they are based on the given timestamp literals.
>
> Maybe instead of changing the rules performing the view rewriting what you
> should do is introduce other use-case specific rules that can be applied to
> the input queries and bring them to the desired form.
>
> Best,
> Stamatis
>
> On Fri, Jan 28, 2022 at 10:12 PM Thomas D'Silva <[email protected]
> >
> wrote:
>
> > Xurenhe,
> >
> > Thank you for the detailed response and pointers. The use case you
> > describe for OLAP streaming queries is similar to what our team is trying
> > to accomplish.
> > We use calcite to query data stored in kudu [1]. We use kafka streams to
> > maintain materialized views that contain pre-aggregated data that rolls
> > data up to the daily or hourly granularity. One of the problems we face
> is
> > when a user submits a query where the time range does not align to the
> day
> > or hour boundary.  In this case we are unable to use the materialized
> views
> > and have to query the raw event which is not performant because the table
> > contains many more rows compared to the view.
> > The reason I wanted to modify the MaterializedViewRule is to use the
> "Union
> > rewriting with aggregate" ability [2] this would allow us to use both the
> > view and the table to answer such queries in the most efficient manner. I
> > looked at the code in SubstitutionVisitor but I don't think there are
> > existing rules there that do union rewriting with predicates.
> >
> > The following is an example of the query we are trying to optimize. For
> the
> > materialized view
> > SELECT   eventid,  floor(ts to minute), count(*) as cnt
> > FROM     events
> > GROUP BY eventid, floor(ts TO minute)
> >
> > and query
> > SELECT floor(ts to minute), count(*)
> > FROM     events
> > WHERE    ts >= timestamp'2018-01-01 00:01:30' AND ts <
> timestamp'2018-01-01
> > 00:05:00'
> > GROUP BY eventid, floor(ts TO minute)
> >
> > We would like to generate a plan that queries the table for the time
> range
> > [2018-01-01 00:01:30, 2018-01-01 00:02:00) and view for the time range
> > [2018-01-01 00:02:-0, 2018-01-01 00:05:00) similar to
> >
> > EnumerableCalc(expr#0..2=[{inputs}], EXPR$0=[$t1], EXPR$1=[$t2])
> >   EnumerableAggregate(group=[{0, 1}], EXPR$1=[$SUM0($2)])
> >     EnumerableUnion(all=[true])
> >       EnumerableAggregate(group=[{0, 1}], EXPR$1=[COUNT()])
> >         EnumerableCalc(expr#0..1=[{inputs}], expr#2=[FLAG(MINUTE)],
> > expr#3=[FLOOR($t1, $t2)], expr#4=[Sarg[[2018-01-01 00:01:30..2018-01-01
> > 00:02:00)]], expr#5=[SEARCH($t1, $t4)], eventid=[$t0], $f1=[$t3],
> > $condition=[$t5])
> >           EnumerableTableScan(table=[[hr, events]])
> >       EnumerableCalc(expr#0..2=[{inputs}], expr#3=[Sarg[[2018-01-01
> > 00:02:00..2018-01-01 00:05:00)]], expr#4=[SEARCH($t1, $t3)],
> > proj#0..2=[{exprs}], $condition=[$t4])
> >         EnumerableTableScan(table=[[hr, MV0]])
> >
> > Thank you,
> > Thomas
> >
> > [1]
> https://urldefense.com/v3/__https://github.com/twilio/calcite-kudu__;!!NCc8flgU!KhDSBwwmEP1lp8FVI18hL250_jEF94FehJiaGVmrZIs3Q79JqY5_37EGNyWrpN8$
> > [2]
> >
> >
> https://urldefense.com/v3/__https://calcite.apache.org/docs/materialized_views.html*union-rewriting-with-aggregate__;Iw!!NCc8flgU!KhDSBwwmEP1lp8FVI18hL250_jEF94FehJiaGVmrZIs3Q79JqY5_37EGqqSl83w$
> >
> > On Fri, Jan 28, 2022 at 12:20 AM 徐仁和 <[email protected]> wrote:
> >
> > > It’s useful in OLAP.
> > > Time ’s prediction bottom `Aggregate` could be pulled-up, because of
> > Time’s
> > > granularity in Aggregate’s group.
> > > By the way, the query can be rewritten by mv, when the query's
> > granularity
> > > is coarser than mv’s.
> > > eg:
> > > -- mv:
> > > SELECT   eventid,  floor(ts TO minute), count(*) as cnt
> > > FROM     events
> > > GROUP BY eventid, floor(ts TO minute)
> > > -- query:
> > > SELECT   floor(ts TO hour), count(*)
> > > FROM     events
> > > GROUP BY eventid, floor(ts TO hour)
> > >
> > > In the past, I tried to use stream-sql’s expression to solve it.
> > > This document is used to describe this problem, Here is a rough
> document.
> > > doc:
> > >
> > >
> >
> https://urldefense.com/v3/__https://docs.google.com/document/d/1LtSgxhwvnpk2uAXFQiMSX-aQjnkJ6KLrXwXJX-VbsbE/edit*heading=h.ww6moavc8lkj__;Iw!!NCc8flgU!Lx31-Z0cEnGqJQwX-7zk1Kk3NvAaCblUwfyTg4jDUg0Y69SULj4jHJKPGC2W58A$
> > > My team has defined more `UnifyRule`s to solve it,
> > > because SubstitutionVisitor has high-scalability.
> > >
> > >
> > > Xurenhe
> > >
> > > Thomas D'Silva <[email protected]> 于2022年1月27日周四 02:54写道：
> > >
> > > > Hi,
> > > >
> > > > The existing MaterializedViewRule is not applied when the
> materialized
> > > view
> > > > does not have a view predicate but the query contains a predicate.
> For
> > eg
> > > > for the following materialized view
> > > >
> > > > SELECT   eventid,  floor(ts to minute), count(*) as cnt
> > > > FROM     events
> > > > GROUP BY eventid, floor(ts TO minute)
> > > >
> > > > If we have the following query the view is not chosen in the final
> > plan.
> > > >
> > > > SELECT   floor(ts to minute), count(*)
> > > > FROM     events
> > > > WHERE    ts >= timestamp'2018-01-01 00:01:00' AND ts <
> > > timestamp'2018-01-01
> > > > 00:05:00'
> > > > GROUP BY eventid, floor(ts TO minute)
> > > >
> > > > If I add the predicate "ts > timestamp'2018-01-01 00:01:00' AND ts <
> > > > timestamp'2018-01-01 00:05:00'" to the materialized view then the
> > > > materialized view is chosen in the final plan. Would it be possible
> to
> > > > modify the MaterializedViewRule to automatically add a view predicate
> > > based
> > > > on the query when the materialized view being considerent aggregates
> > > over a
> > > > column that contains a floor()? This predicate would also need to be
> > > > applied to the rewritten view. I would like to file a JIRA and work
> on
> > > this
> > > > improvement if this makes sense.
> > > >
> > > > Thank you,
> > > > Thomas
> > > >
> > >
> >
>

Re: MaterializedViewRule is not applied for queries on aggregate views that truncate timestamp

Reply via email to