Thanks Aron for pointing this out. To see the figure, please refer to Fig 3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
Best,
Botong

On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojia...@gmail.com> wrote:

> Seems interesting. The picture cannot be seen in the mail; could you
> open a JIRA for this, so that people who are interested can subscribe
> to it?
>
> Regards!
>
> Aron Tao
>
>
> Botong Huang <bot...@apache.org> wrote on Thu, Dec 24, 2020 at 3:18 AM:
>
> > Hi all,
> >
> > This is a proposal to extend the Calcite optimizer into a general
> > incremental query optimizer, based on our research paper published
> > in VLDB 2021:
> > Tempura: a general cost-based optimizer framework for incremental
> > data processing
> >
> > We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data
> > warehouse is planning to use this incremental query optimizer to
> > alleviate cluster-wise resource skewness:
> > Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental
> > Computing
> >
> > To the best of our knowledge, this is the first general cost-based
> > incremental optimizer that can find the best plan across multiple
> > families of incremental computing methods, including IVM, Streaming,
> > DBToaster, etc. Experiments (in the paper) show that the generated
> > best plan is consistently much better than the plans from each
> > individual method alone.
> >
> > In general, incremental query planning is central to database view
> > maintenance and stream processing systems, and is being adopted in
> > active databases, resumable query execution, approximate query
> > processing, etc. We hope that this feature can help widen Calcite’s
> > scope and attract more use cases and adoption of Calcite.
> >
> > Below is a brief description of the technical details. Please refer
> > to the Tempura paper for more details. We are also working on a
> > journal version of the paper with more implementation details.
> >
> > Currently, the query plan generated by Calcite is meant to be
> > executed all at once.
> > In the proposal, Calcite’s memo will be extended with temporal
> > information so that it is capable of generating incremental plans
> > that include multiple sub-plans to execute at different time points.
> >
> > The main idea is to view each table as one that changes over time,
> > i.e. a Time-Varying Relation (TVR). To achieve that, we introduced
> > TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to
> > track the related RelSets of a changing table (e.g. a snapshot of
> > the table at a certain time, the delta of the table between two time
> > points, etc.).
> >
> > [image: image.png]
> >
> > For example, in the above figure, each vertical line is a TvrMetaSet
> > representing a TVR (S, R, S left outer join R, etc.). Horizontal
> > lines represent time. Each black dot in the grid is a RelSet. Users
> > can write TVR Rewrite Rules to describe valid transformations
> > between these dots. For example, the blue lines are inter-TVR rules
> > that describe how to compute a certain RelSet of a TVR from RelSets
> > of other TVRs. The red lines are intra-TVR rules that describe
> > transformations within a TVR. All TVR rewrite rules are logical
> > rules. All existing Calcite rules still work in the new Volcano
> > system without modification.
> >
> > The changes for this feature consist of four parts:
> > 1. Memo extension with TvrMetaSet
> > 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes,
> >    as well as the links between the nodes.
> > 3. A basic set of TvrRules, written using the upgraded rule engine
> >    API.
> > 4. Multi-query optimization, used to find the best incremental plan
> >    involving multiple time points.
> >
> > Note that this feature is an extension by nature and thus, when
> > disabled, does not change any existing Calcite behavior.
> >
> > Other than the scenarios in the paper, we also applied this
> > Calcite-extended incremental query optimizer to a type of periodic
> > query called the “range query” in Alibaba’s data warehouse.
> > It achieved cost savings of 80% on total CPU and memory consumption,
> > and 60% on end-to-end execution time.
> >
> > All comments and suggestions are welcome. Thanks and happy holidays!
> >
> > Best,
> > Botong
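
For readers who cannot see the figure, the snapshot/delta bookkeeping described in the quoted proposal might be pictured roughly as in the sketch below. This is an illustration only, not the proposed implementation: `ToyTvrMetaSet`, `Snapshot`, `Delta`, `register` and `ways_to_compute` are hypothetical names and are not Calcite or Tempura APIs. The single "intra-TVR rule" shown also assumes an insert-only table, where a later snapshot can be obtained by merging an earlier snapshot with the delta between the two time points.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# All names below are hypothetical and exist only for this illustration;
# none of them are Calcite classes.


@dataclass(frozen=True)
class Snapshot:
    """Full contents of a time-varying relation at time `t`."""
    t: int


@dataclass(frozen=True)
class Delta:
    """Rows added to a time-varying relation between `t_from` and `t_to`."""
    t_from: int
    t_to: int


TvrKey = Union[Snapshot, Delta]


@dataclass
class ToyTvrMetaSet:
    """For one time-varying relation, records which memo set holds each version."""
    name: str
    rel_sets: Dict[TvrKey, str] = field(default_factory=dict)

    def register(self, key: TvrKey, rel_set_id: str) -> None:
        self.rel_sets[key] = rel_set_id

    def ways_to_compute(self, target: Snapshot) -> List[Tuple[Snapshot, Delta]]:
        """Toy intra-TVR rule (insert-only assumption):
        snapshot(t) can be built from snapshot(s) plus delta(s, t) for s < t,
        provided both versions are already tracked by this meta set."""
        plans = []
        for key in self.rel_sets:
            if isinstance(key, Snapshot) and key.t < target.t:
                delta = Delta(key.t, target.t)
                if delta in self.rel_sets:
                    plans.append((key, delta))
        return plans


# Usage: one "vertical line" of the figure, for a table S with two time points.
s = ToyTvrMetaSet("S")
s.register(Snapshot(1), "RelSet#1")
s.register(Delta(1, 2), "RelSet#2")
s.register(Snapshot(2), "RelSet#3")
print(s.ways_to_compute(Snapshot(2)))
```

In the proposal itself such relationships are expressed as TVR rewrite rules matched by the upgraded rule engine, and the optimizer picks among the alternatives by cost; the sketch only shows why tracking snapshots and deltas side by side gives the planner more than one way to reach the same result.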