I have added my time preferences to the doc [1]. I am generally available any evening Mon - Thu. How about we meet Monday 10th May?
Stamatis, Jesus,

Given the complexity of this work, I would very much appreciate your insight, as experts in optimizer theory. Could one of you join the next meeting? Of course, we should choose a time that works for everyone's schedule.

Julian

[1] https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing

On Wed, Apr 28, 2021 at 9:32 AM Botong Huang <pku...@gmail.com> wrote:

> We didn't record it; we will try to record the following meetings. Please add your time preference in the docs, so that we can find a meeting time that works for more people.
>
> Thanks,
> Botong
>
> On Wed, Apr 28, 2021 at 12:23 AM Viliam Durina <vil...@hazelcast.com> wrote:
>
> > Is there a recording available?
> >
> > Viliam
> >
> > On Wed, 28 Apr 2021 at 00:15, Botong Huang <pku...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > The meeting yesterday was fun and productive. As discussed, this is the call to schedule our second meeting.
> > >
> > > We encourage everyone to add their time preferences during 05/01 - 05/15 here:
> > >
> > > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > >
> > > Thanks,
> > > Botong
> > >
> > > On Wed, Apr 21, 2021 at 5:19 PM Botong Huang <pku...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We've created a Zoom meeting below for our meeting next Monday (9pm-10:30pm PST on 04/26).
> > > > Talk to you all soon!
> > > >
> > > > Join Zoom Meeting
> > > > https://uci.zoom.us/j/91279732686
> > > >
> > > > Meeting ID: 912 7973 2686
> > > > One tap mobile
> > > > +16699006833,,91279732686# US (San Jose)
> > > > +12532158782,,91279732686# US (Tacoma)
> > > >
> > > > Dial by your location
> > > > +1 669 900 6833 US (San Jose)
> > > > +1 253 215 8782 US (Tacoma)
> > > > +1 346 248 7799 US (Houston)
> > > > +1 301 715 8592 US (Washington DC)
> > > > +1 312 626 6799 US (Chicago)
> > > > +1 646 558 8656 US (New York)
> > > > Meeting ID: 912 7973 2686
> > > > Find your local number: https://uci.zoom.us/u/aykHTkJBh
> > > >
> > > > Join by Skype for Business
> > > > https://uci.zoom.us/skype/91279732686
> > > >
> > > > Thanks,
> > > > Botong
> > > >
> > > > On Tue, Apr 13, 2021 at 10:16 PM Botong Huang <pku...@gmail.com> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> According to the preferences collected, we are tentatively scheduling our meeting at 9pm-10:30pm PST on Monday, 04/26.
> > > >>
> > > >> We will give a presentation about Tempura, followed by a free discussion.
> > > >>
> > > >> Please let us know if there are any other requests. A few days before the meeting, I will send out a Zoom meeting link.
> > > >>
> > > >> Thanks,
> > > >> Botong
> > > >>
> > > >> On Wed, Apr 7, 2021 at 2:46 PM Botong Huang <pku...@gmail.com> wrote:
> > > >>
> > > >>> Hi Julian and all,
> > > >>>
> > > >>> We've posted the Tempura code base below.
> > > >>> Feel free to take a quick peek at the last five commits.
> > > >>> https://github.com/alibaba/cost-based-incremental-optimizer/commits/main
> > > >>>
> > > >>> I've also opened a Jira (CALCITE-4568 <https://issues.apache.org/jira/browse/CALCITE-4568>), which will serve as the umbrella Jira for the feature.
> > > >>>
> > > >>> In the meantime, we encourage everyone to enter their time preferences for our first meeting here:
> > > >>>
> > > >>> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>
> > > >>> Thanks,
> > > >>> Botong
> > > >>>
> > > >>> On Mon, Apr 5, 2021 at 3:59 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > > >>>
> > > >>>> I have added my time preferences to the doc.
> > > >>>>
> > > >>>> Before we meet, could you publish a PR for us to review?
> > > >>>>
> > > >>>> Initial discussions will need to be about architecture and high-level design. So I would ask Calcite reviewers not to review the PR line-by-line (or to leave comments in GitHub) but try to understand the design holistically, and prepare questions/comments before the meeting.
> > > >>>>
> > > >>>> Botong, can you please create a Calcite JIRA case for this task? JIRA is how we track long-running tasks such as this.
> > > >>>>
> > > >>>> Julian
> > > >>>>
> > > >>>> > On Apr 3, 2021, at 5:15 PM, Botong Huang <pku...@gmail.com> wrote:
> > > >>>> >
> > > >>>> > Hi all,
> > > >>>> >
> > > >>>> > Apologies for the delay. It took us some time to clean up our code base and publicly release it (which will be out soon) for a quick peek.
> > > >>>> >
> > > >>>> > We are ready to present our work. Let's schedule a time for a Zoom meeting and discuss how to integrate Tempura into Calcite.
> > > >>>> >
> > > >>>> > Since some of our team members are in China, we prefer the time slot of 7:00pm-11:30pm PST any day. I've added our time preference in the shared doc below.
> > > >>>> >
> > > >>>> > https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >
> > > >>>> > We encourage everyone to add their time preferences (during 04/15-04/30) in this doc. In a week or so, we will try to settle on a time that works for most.
> > > >>>> >
> > > >>>> > Thanks,
> > > >>>> > Botong
> > > >>>> >
> > > >>>> > On Sat, Jan 30, 2021 at 9:19 PM Botong Huang <pku...@gmail.com> wrote:
> > > >>>> >
> > > >>>> >> Hi Julian and Rui,
> > > >>>> >>
> > > >>>> >> Sounds good to us. Please give us some time to prepare some slides for the meeting.
> > > >>>> >>
> > > >>>> >> I've created a doc below for discussion. Please feel free to add more in here:
> > > >>>> >>
> > > >>>> >> https://docs.google.com/document/d/1wyNjB94uSGwHtVvGYDwaLlCghUJE-7aDLnCdKKXJN1o/edit?usp=sharing
> > > >>>> >>
> > > >>>> >> Thanks,
> > > >>>> >> Botong
> > > >>>> >>
> > > >>>> >> On Thu, Jan 28, 2021 at 11:18 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > > >>>> >>
> > > >>>> >>> PS The “editable doc” that Rui refers to is also a good idea. I think we should create it to continue discussion after the first meeting.
> > > >>>> >>>
> > > >>>> >>> Julian
> > > >>>> >>>
> > > >>>> >>>> On Jan 28, 2021, at 11:16 AM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > > >>>> >>>>
> > > >>>> >>>> I think good next steps would be a PR and a meeting.
> > > >>>> >>>> The PR will allow us to read the code, but I think we should do the first round of questions at the meeting. The meeting could perhaps start with a presentation of the paper (do you have some slides you are planning to present at VLDB, Botong?) and then move on to questions about the concepts, which alternatives were considered, and how the concepts map onto other current and future concepts in Calcite.
> > > >>>> >>>>
> > > >>>> >>>> I don’t think we should start “reviewing” the PR line-by-line at this point. We need to understand the high-level concepts and design choices. If we start reviewing the PR we will get lost in the details.
> > > >>>> >>>>
> > > >>>> >>>> I know that integrating a major change is hard; I doubt that we will be able to integrate everything, but we can build understanding about where Calcite needs to go, and I hope integrate a good amount of code to help us get there.
> > > >>>> >>>>
> > > >>>> >>>> As I said before, after the integration I would like people to be able to experiment with it and use it in their production systems. That way, it will not be an experiment that withers, but a feature set that integrates with other Calcite features and gets stronger over time.
> > > >>>> >>>>
> > > >>>> >>>> Julian
> > > >>>> >>>>
> > > >>>> >>>>> On Jan 28, 2021, at 10:54 AM, Rui Wang <amaliu...@apache.org> wrote:
> > > >>>> >>>>>
> > > >>>> >>>>> For me to participate in the discussion of the above questions, I will need to read a lot more to know the relevant context, and will likely ask lots of questions :-). An editable doc is probably good for questions and back-and-forth discussion.
> > > >>>> >>>>>
> > > >>>> >>>>> -Rui
> > > >>>> >>>>>
> > > >>>> >>>>>>> On Thu, Jan 28, 2021 at 10:50 AM Rui Wang <amaliu...@apache.org> wrote:
> > > >>>> >>>>>>
> > > >>>> >>>>>> I am also happy to help push this work into Calcite (review code and doc, etc.).
> > > >>>> >>>>>>
> > > >>>> >>>>>> While you can share your code so people can get a better idea of how it is implemented, I think it would also be nice to have a doc to discuss the open questions above. Some points that I have copied here:
> > > >>>> >>>>>>
> > > >>>> >>>>>> 1. Can this solution be compatible with existing solutions in Calcite for streaming, materialized view maintenance, and multi-query optimization (Sigma and Delta relational operators, lattice, and Spool operator)?
> > > >>>> >>>>>> 2. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?
> > > >>>> >>>>>> 3. Whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.
> > > >>>> >>>>>> 4. Probably SQL shell support.
> > > >>>> >>>>>>
> > > >>>> >>>>>> [1]: https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>
> > > >>>> >>>>>> -Rui
> > > >>>> >>>>>>
> > > >>>> >>>>>>> On Wed, Jan 27, 2021 at 6:52 PM Albert <zinki...@gmail.com> wrote:
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> It would be very nice to see a POC of your work.
> > > >>>> >>>>>>>
> > > >>>> >>>>>>>> On Thu, Jan 28, 2021 at 10:21 AM Botong Huang <pku...@gmail.com> wrote:
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Just wondering if there are any updates? We are wondering if it would help to post our code for a quick preview.
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> Thanks,
> > > >>>> >>>>>>>> Botong
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>> On Fri, Jan 1, 2021 at 11:04 AM Botong Huang <pku...@gmail.com> wrote:
> > > >>>> >>>>>>>>
> > > >>>> >>>>>>>>> Hi Julian,
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks for your interest! Sure, let's figure out a plan that best benefits the community. Here are some clarifications that hopefully answer your questions.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In our work (Tempura), users specify the set of time points at which to consider running, and a cost function that expresses their preference over time; Tempura then generates the best incremental plan, the one that minimizes the overall cost function.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> In this incremental plan, the sub-plans at different time points can be different from each other, as opposed to identical plans in all delta runs as in streaming or IVM. As mentioned in Section 2.1 of the Tempura paper, we can mimic the current streaming implementation by specifying two (logical) time points in Tempura, representing the initial run and later delta runs respectively. In general, note that Tempura supports various forms of incremental computing, not only the small-delta append-only data model in streaming systems. That's why we believe Tempura subsumes the current streaming support, as well as any IVM implementations.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> About the cost model, we did not come up with a separate cost model, but rather extended the existing one. Similar to multi-objective optimization, costs incurred at different time points are considered different dimensions. Tempura lets users supply a function that converts this cost vector into a final cost. So under this function, any two incremental plans are still comparable and there is an overall optimum.
> > > >>>> >>>>>>>>> I guess we can go down the route of multi-objective parametric query optimization instead if there is a need.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Next, on materialized views and multi-query optimization: since our multi-time-point plan naturally involves materializing intermediate results for later time points, we need to solve the problem of choosing materializations and include the cost of saving and reusing the materializations when costing and comparing plans. We borrowed the multi-query optimization techniques to solve this problem even though we are looking at a single query. As a result, we think our work is orthogonal to Calcite's facilities around utilizing existing views, lattice, etc. We do feel that the multi-query optimization component can be adapted to wider use, but we probably need more suggestions from the community.
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Lastly, our current implementation is set up in Java code; it should be straightforward to hook it up with a SQL shell.
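A minimal sketch of the cost-vector idea described in the message above: each candidate plan carries one cost per time point, and a user-supplied function collapses that vector into a single comparable cost. All names here (CostVectorSketch, PlanCost, weightedSum, cheapest) and the choice of a weighted sum are assumptions made for illustration; they are not Tempura or Calcite API.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration only: none of these names are Tempura or Calcite API.
final class CostVectorSketch {

    /** A candidate incremental plan with one cost entry per time point. */
    static final class PlanCost {
        final String name;
        final double[] costPerTimePoint;

        PlanCost(String name, double... costPerTimePoint) {
            this.name = name;
            this.costPerTimePoint = costPerTimePoint.clone();
        }
    }

    /** Assumed form of the user-supplied function: a weighted sum over time points. */
    static ToDoubleFunction<double[]> weightedSum(double... weights) {
        final double[] w = weights.clone();
        return costs -> {
            if (costs.length != w.length) {
                throw new IllegalArgumentException("expected one weight per time point");
            }
            double total = 0.0;
            for (int i = 0; i < w.length; i++) {
                total += w[i] * costs[i];
            }
            return total;
        };
    }

    /** Collapses each cost vector to a scalar and returns the cheapest plan. */
    static PlanCost cheapest(List<PlanCost> plans, ToDoubleFunction<double[]> costFunction) {
        return plans.stream()
            .min(Comparator.comparingDouble(
                (PlanCost p) -> costFunction.applyAsDouble(p.costPerTimePoint)))
            .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }

    public static void main(String[] args) {
        List<PlanCost> plans = Arrays.asList(
            new PlanCost("frontLoaded", 10.0, 1.0),   // most work at the first time point
            new PlanCost("evenlySpread", 4.0, 4.0));  // same work at both time points
        // Equal weights: 11.0 vs 8.0, so evenlySpread wins.
        System.out.println(cheapest(plans, weightedSum(1.0, 1.0)).name);
        // Cheap first time point: about 2.0 vs 4.4, so frontLoaded wins.
        System.out.println(cheapest(plans, weightedSum(0.1, 1.0)).name);
    }
}
```

Changing only the weights changes which plan is optimal, which is the sense in which the function expresses a preference over time while keeping any two plans comparable.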
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> Thanks,
> > > >>>> >>>>>>>>> Botong
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>> On Mon, Dec 28, 2020 at 6:44 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> > > >>>> >>>>>>>>>
> > > >>>> >>>>>>>>>> Botong,
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> This is very exciting; congratulations on this research, and thank you for contributing it back to Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> The research touches several areas in Calcite: streaming, materialized view maintenance, and multi-query optimization. As we already have some solutions in those areas (Sigma and Delta relational operators, lattice, and Spool operator), it will be interesting to see whether we can make them compatible, or whether one concept can subsume others.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Your work differs from streaming queries in that your relations are used by “external” user queries, whereas in pure streaming queries, the only activity is the change propagation. Did you find that you needed two separate cost models - one for “view maintenance” and another for “user queries” - since the objectives of each activity are so different?
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I wonder whether this work will hasten the arrival of multi-objective parametric query optimization [1] in Calcite.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> I will make time over the next few days to read and digest your paper. Then I expect that we will have a back-and-forth process to create something that will be useful for the broader community.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> One thing will be particularly useful: making this functionality available from a SQL shell, so that people can experiment with this functionality without writing Java code or setting up complex databases and metadata. I have in mind something like the simple DDL operations that are available in Calcite’s ’server’ module. I wonder whether we could devise some kind of SQL syntax for a “multi-query”.
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> Julian
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>> [1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/fulltext
> > > >>>> >>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Dec 23, 2020, at 8:55 PM, Botong Huang <pku...@gmail.com> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Thanks Aron for pointing this out.
> > > >>>> >>>>>>>>>>> To see the figure, please refer to Fig 3(a) in our paper: https://kai-zeng.github.io/papers/tempura-vldb2021.pdf
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>> Botong
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>> On Wed, Dec 23, 2020 at 7:20 PM JiaTao Tao <taojia...@gmail.com> wrote:
> > > >>>> >>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Seems interesting. The picture cannot be seen in the mail. Could you open a JIRA for this, so that people who are interested can subscribe to it?
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Regards!
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> Aron Tao
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>> On Thu, Dec 24, 2020 at 3:18 AM Botong Huang <bot...@apache.org> wrote:
> > > >>>> >>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Hi all,
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> This is a proposal to extend the Calcite optimizer into a general incremental query optimizer, based on our research paper published in VLDB 2021:
> > > >>>> >>>>>>>>>>>>> Tempura: a general cost-based optimizer framework for incremental data processing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> We also have a demo in SIGMOD 2020 illustrating how Alibaba’s data warehouse is planning to use this incremental query optimizer to alleviate cluster-wise resource skewness:
> > > >>>> >>>>>>>>>>>>> Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> To the best of our knowledge, this is the first general cost-based incremental optimizer that can find the best plan across multiple families of incremental computing methods, including IVM, Streaming, DBToaster, etc. Experiments (in the paper) show that the generated best plan is consistently much better than the plans from each individual method alone.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> In general, incremental query planning is central to database view maintenance and stream processing systems, and is being adopted in active databases, resumable query execution, approximate query processing, etc. We are hoping that this feature can help widen the spectrum of Calcite and solicit more use cases and adoption of Calcite.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Below is a brief description of the technical details. Please refer to the Tempura paper for more details. We are also working on a journal version of the paper with more implementation details.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Currently the query plan generated by Calcite is meant to be executed altogether at once. In the proposal, Calcite’s memo will be extended with temporal information so that it is capable of generating incremental plans that include multiple sub-plans to execute at different time points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> The main idea is to view each table as one that changes over time (a Time-Varying Relation, or TVR). To achieve that, we introduced TvrMetaSet into Calcite’s memo, alongside RelSet and RelSubset, to track the related RelSets of a changing table (e.g. a snapshot of the table at a certain time, the delta of the table between two time points, etc.).
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> [image: image.png]
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> For example, in the above figure, each vertical line is a TvrMetaSet representing a TVR (S, R, S left outer join R, etc.). Horizontal lines represent time. Each black dot in the grid is a RelSet. Users can write TVR Rewrite Rules to describe valid transformations between these dots.
> > > >>>> >>>>>>>>>>>>> For example, the blue lines are inter-TVR rules that describe how to compute a certain RelSet of a TVR from RelSets of other TVRs. The red lines are intra-TVR rules that describe transformations within a TVR. All TVR rewrite rules are logical rules. All existing Calcite rules still work in the new volcano system without modification.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> The changes in this feature consist of four parts:
> > > >>>> >>>>>>>>>>>>> 1. Memo extension with TvrMetaSet
> > > >>>> >>>>>>>>>>>>> 2. Rule engine upgrade, capable of matching TvrMetaSet and RelNodes, as well as links in between the nodes.
> > > >>>> >>>>>>>>>>>>> 3. A basic set of TvrRules, written using the upgraded rule engine API.
> > > >>>> >>>>>>>>>>>>> 4. Multi-query optimization, used to find the best incremental plan involving multiple time points.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Note that this feature is an extension in nature and thus, when disabled, does not change any existing Calcite behavior.
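A minimal sketch of the memo extension described in the proposal above, assuming a TvrMetaSet can be modeled as an index from "views" of one time-varying relation (a snapshot at a time point, or a delta between two time points) to memo groups. The class and method names below (TvrMemoSketch, TvrKey, link, relSetFor) and the use of integer ids for RelSets are hypothetical; the real Tempura and Calcite classes are not shown in this thread and may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only: NOT the Tempura or Calcite implementation.
final class TvrMemoSketch {

    /** One view of a time-varying relation: a snapshot at `to`, or a delta between `from` and `to`. */
    static final class TvrKey {
        final boolean isDelta;
        final int from;
        final int to;

        private TvrKey(boolean isDelta, int from, int to) {
            this.isDelta = isDelta;
            this.from = from;
            this.to = to;
        }

        static TvrKey snapshot(int time) {
            return new TvrKey(false, time, time);
        }

        static TvrKey delta(int from, int to) {
            if (from >= to) {
                throw new IllegalArgumentException("delta requires from < to");
            }
            return new TvrKey(true, from, to);
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof TvrKey)) {
                return false;
            }
            TvrKey other = (TvrKey) o;
            return isDelta == other.isDelta && from == other.from && to == other.to;
        }

        @Override public int hashCode() {
            return Objects.hash(isDelta, from, to);
        }
    }

    /** Stand-in for a TvrMetaSet: links the views of one TVR to memo groups (RelSet ids). */
    static final class TvrMetaSet {
        final String name;
        private final Map<TvrKey, Integer> relSetIds = new LinkedHashMap<>();

        TvrMetaSet(String name) {
            this.name = name;
        }

        void link(TvrKey key, int relSetId) {
            relSetIds.put(key, relSetId);
        }

        /** Returns the linked RelSet id, or null if this view has not been registered. */
        Integer relSetFor(TvrKey key) {
            return relSetIds.get(key);
        }
    }

    public static void main(String[] args) {
        TvrMetaSet s = new TvrMetaSet("S");
        s.link(TvrKey.snapshot(1), 10);   // S as of time point 1
        s.link(TvrKey.delta(1, 2), 11);   // changes to S between time points 1 and 2
        System.out.println(s.relSetFor(TvrKey.delta(1, 2)));
        System.out.println(s.relSetFor(TvrKey.snapshot(2)));
    }
}
```

In this reading, a rewrite rule would look up some views through the index and register a new one, for example deriving the snapshot at time 2 from the snapshot at time 1 and the delta between 1 and 2; that rule machinery is omitted here.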
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Other than scenarios in the paper, we also applied this Calcite-extended incremental query optimizer to a type of periodic query called the “range query” in Alibaba’s data warehouse. It achieved cost savings of 80% on total CPU and memory consumption, and 60% on end-to-end execution time.
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> All comments and suggestions are welcome. Thanks and happy holidays!
> > > >>>> >>>>>>>>>>>>>
> > > >>>> >>>>>>>>>>>>> Best,
> > > >>>> >>>>>>>>>>>>> Botong
> > > >>>> >>>>>>>
> > > >>>> >>>>>>> --
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~
> > > >>>> >>>>>>> no mistakes
> > > >>>> >>>>>>> ~~~~~~~~~~~~~~~~~~
> >
> > --
> > Viliam Durina
> > Jet Developer
> > hazelcast®
> >
> > <https://www.hazelcast.com> 2 W 5th Ave, Ste 300 | San Mateo, CA 94402 | USA
> > +1 (650) 521-5453 | hazelcast.com <https://www.hazelcast.com>