Re: Proposal to extend Calcite into a incremental query optimizer

Botong Huang Wed, 23 Dec 2020 20:56:14 -0800

Thanks Aron for pointing this out. To see the figure, please refer to Fig
3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf


Best,
Botong

On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojia...@gmail.com> wrote:

> Seems interesting, the pic can not be seen in the mail, may you open a JIRA
> for this, people who are interested in this can subscribe to the JIRA?
>
>
> Regards!
>
> Aron Tao
>
>
> Botong Huang <bot...@apache.org> 于2020年12月24日周四 上午3:18写道：
>
> > Hi all,
> >
> > This is a proposal to extend the Calcite optimizer into a general
> > incremental query optimizer, based on our research paper published in
> VLDB
> > 2021:
> > Tempura: a general cost-based optimizer framework for incremental data
> > processing
> >
> > We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> > warehouse is planning to use this incremental query optimizer to
> alleviate
> > cluster-wise resource skewness:
> > Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> Computing
> >
> > To our best knowledge, this is the first general cost-based incremental
> > optimizer that can find the best plan across multiple families of
> > incremental computing methods, including IVM, Streaming, DBToaster, etc.
> > Experiments (in the paper) shows that the generated best plan is
> > consistently much better than the plans from each individual method
> alone.
> >
> > In general, incremental query planning is central to database view
> > maintenance and stream processing systems, and are being adopted in
> active
> > databases, resumable query execution, approximate query processing, etc.
> We
> > are hoping that this feature can help widening the spectrum of Calcite,
> > solicit more use cases and adoption of Calcite.
> >
> > Below is a brief description of the technical details. Please refer to
> the
> > Tempura paper for more details. We are also working on a journal version
> of
> > the paper with more implementation details.
> >
> > Currently the query plan generated by Calcite is meant to be executed
> > altogether at once. In the proposal, Calcite’s memo will be extended with
> > temporal information so that it is capable of generating incremental
> plans
> > that include multiple sub-plans to execute at different time points.
> >
> > The main idea is to view each table as one that changes over time (Time
> > Varying Relations (TVR)). To achieve that we introduced TvrMetaSet into
> > Calcite’s memo besides RelSet and RelSubset to track related RelSets of a
> > changing table (e.g. snapshot of the table at certain time, delta of the
> > table between two time points, etc.).
> >
> > [image: image.png]
> >
> > For example in the above figure, each vertical line is a TvrMetaSet
> > representing a TVR (S, R, S left outer join R, etc.). Horizontal lines
> > represent time. Each black dot in the grid is a RelSet. Users can write
> TVR
> > Rewrite Rules to describe valid transformations between these dots. For
> > example, the blues lines are inter-TVR rules that describe how to compute
> > certain RelSet of a TVR from RelSets of other TVRs. The red lines are
> > intra-TVR rules that describe transformations within a TVR. All TVR
> rewrite
> > rules are logical rules. All existing Calcite rules still work in the new
> > volcano system without modification.
> >
> > All changes in this feature will consist of four parts:
> > 1. Memo extension with TvrMetaSet
> > 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as
> > well as links in between the nodes.
> > 3. A basic set of TvrRules, written using the upgraded rule engine API.
> > 4. Multi-query optimization, used to find the best incremental plan
> > involving multiple time points.
> >
> > Note that this feature is an extension in nature and thus when disabled,
> > does not change any existing Calcite behavior.
> >
> > Other than scenarios in the paper, we also applied this Calcite-extended
> > incremental query optimizer to a type of periodic query called the
> ‘‘range
> > query’’ in Alibaba’s data warehouse. It achieved cost savings of 80% on
> > total CPU and memory consumption, and 60% on end-to-end execution time.
> >
> > All comments and suggestions are welcome. Thanks and happy holidays!
> >
> > Best,
> > Botong
> >
>

Re: Proposal to extend Calcite into a incremental query optimizer

Reply via email to