Re: [DISCUSS] Towards Cascades Optimizer

2020-05-14 Thread Julian Hyde
> I even added a VolcanoPlannerFactory in my code to allow user to specify their own planner. But I also see that Julian is more of keeping one planner.
I am fine with multiple planners. (We already have multiple planners - Hep and Volcano.) Another one or two more is fine. In the short term

Re: [DISCUSS] Towards Cascades Optimizer

2020-05-12 Thread Jinpeng Wu
Hi, Roman. Your suggestion is good and that's what I thought before. I even added a VolcanoPlannerFactory in my code to allow users to specify their own planner. But I also see that Julian is more in favor of keeping one planner. This has been a long discussion. I think it can hardly reach a consensus

Re: [DISCUSS] Towards Cascades Optimizer

2020-05-11 Thread Julian Hyde
If there’s a way for all of these efforts to be committed to master, disabled by default, but such that they can be enabled by setting a flag, then we can at least avoid merge conflicts. I hope that this “trunk-based development” approach is technically feasible. If the rule and metadata APIs

Re: [DISCUSS] Towards Cascades Optimizer

2020-05-11 Thread Roman Kondakov
Hi Jinpeng, The Apache Calcite community has entered an interesting situation: we have several concurrent efforts to build new Cascades-style optimizers. I can see at least three activities (correct me if I'm wrong): 1. Haisheng's gradual rebuilding of VolcanoPlanner [1] 2. Jinpeng's

Re: [DISCUSS] Towards Cascades Optimizer

2020-05-11 Thread Jinpeng Wu
Hi, Roman. Thanks to Julian's tips, I added a CoreCascadeQuidemTest to the code. It runs the iq files that CoreQuidemTest would run and uses CascadePlanner instead of VolcanoPlanner to generate physical plans. Currently all tests have passed. There are some plan changes but they are actually

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-30 Thread Haisheng Yuan
Hi all, As planned in my proposal, I opened the pull request [1] for CALCITE-3896 to achieve: 1. Top-down trait request 2. Bottom-up trait derivation 3. Trait enforcement without AbstractConverter The feature can be turned on or off by a flag, either through property config file or

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-30 Thread Julian Hyde
If your test cases are SQL scripts, it might be fairly straightforward to port them to Quidem (.iq) files. Plenty of examples in https://github.com/apache/calcite/tree/master/core/src/test/resources/sql . Quidem files

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-30 Thread Jinpeng Wu
Sure. I will add more cases to my PR. I did not design more cases because our own product has a test framework, which contains thousands of actual user queries. Calcite's code base is quite different. I cannot just migrate the cases to Calcite. So it may take some time. On Wed, Apr 29, 2020 at

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-28 Thread Roman Kondakov
Hi Xiening, thank you for your feedback.
> 1. Do we really need RelGroup and RelSubGroup? I believe the memo structure would be largely the same even if we move towards a Cascade planner. I think we can reuse most of the RelSet/RelSubset today. RelSubset is a tricky one. There’s no

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-28 Thread Roman Kondakov
Hi Jinpeng, I went through your PR and it seemed very impressive to me. It is very similar to what I did, but you've reused much of the existing logic from the Volcano planner. We should definitely stay in sync in our experiments. I believe the future Cascades planner will be a kind of combination of our

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-28 Thread Jinpeng Wu
Hi, Roman. It's great to see your proposal. Actually, my team has also been working on a cascade planner based on Calcite, and we already have some results as well. Maybe we can combine our work. I've pushed my code as https://github.com/apache/calcite/pull/1950 . Our work has many places in

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
For #1, aside from that, we need to be able to build physical nodes based on a convention. For example, if we merge two EnumerableProject nodes, we would want to create an EnumerableProject as the result, instead of a LogicalProject. The RelBuilder change I am working on would help in this case. For #2, I don’t

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
Hi Roman, First, thank you for sharing your design and prototype code. I took a quick look at your design and have some high-level feedback: 1. Do we really need RelGroup and RelSubGroup? I believe the memo structure would be largely the same even if we move towards a Cascade planner. I think

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
PS Someone mentioned that logical properties do not propagate across RelSubsets in the same RelSet. That is a bug, and we should fix it. For example, if subset#1 has determined that MinRowCount=1 then subset#2 in the same set should also inherit that MinRowCount. The same goes for other logical

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
Re 1. By all means have multiple instances of a rule (e.g. one instance that matches LogicalFilter and another that matches FooFilter) and enable different instances during different phases. (We have been slow to create all of these variant instances, in part because of the difficulty of

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Xiening Dai
Hi Julian, In my view, separating logical and physical rules has a number of benefits: 1. With the current design, a rule can match both physical and logical nodes. This behavior could cause duplication of rule firings and explosion of the memo and search space. There was a long discussion regarding

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Julian Hyde
This thread has almost gotten too long to respond to. I confess I’ve not read much of it. I’m going to reply anyway. Sorry. I support making Calcite’s optimizer support “Cascades”. We should keep the existing VolcanoPlanner working during the transition, and perhaps longer. (I acknowledge that

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Roman Kondakov
Hi all, Stamatis, Haisheng thank you very much for your feedback! I really appreciate it.
> If in the new planner we end up copy-pasting code then I guess it will be a bad idea.
Yes, there is some code duplication between the Volcano planner and the Cascades planner. I think I'll move it to some

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Stamatis Zampetakis
Hi all, I am very excited about the ideas discussed so far, and especially by the enthusiasm of the many people who are ready to help pull this off. I wouldn't have expected that we could have a prototype so quickly. Thanks a lot everyone! In the debate between creating a new planner or patching the

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Haisheng Yuan
Hi Roman, Excellent! This is definitely a helpful contribution to the Calcite community. Thank you for your endeavors. Haisheng On 2020/04/26 19:25:00, Roman Kondakov wrote: > Hi everyone! > > Haisheng, thank you for bringing this subject up. A new Cascades-style > optimizer should be

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Roman Kondakov
Hi everyone! Haisheng, thank you for bringing this subject up. A new Cascades-style optimizer should definitely be the next step for Apache Calcite. Many projects suffer from the lack of this kind of optimizer. That was the reason why several weeks ago I started working on the prototype of

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-22 Thread Danny Chan
> Is there any recommended approach to make that happen smoothly besides coding and testing work?
We need to be aware that the new planner might co-exist with VolcanoPlanner for 5 or more years, or even never replace VolcanoPlanner. If that is true, I might say the new planner is probably with

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Haisheng Yuan
Hi Andrii,
> Obviously, from what is written here, I could guess that this would require me to change my physical planning rules, even if only by implementing a marker interface.
You don't need to change your physical rules; they will be treated the same as logical rules and be applied

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Андрей Цвелодуб
Hello Haisheng,
> To keep backward compatibility, all the un-marked rules will be treated as logical rules, except rules that use AbstractConverter as a rule operand; these rules still need to be applied top-down, or in random order.
Obviously, from what is written here, I could guess that this would

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Haisheng Yuan
Hi Andrii,
> I guess changing the planner would lead to changes in tons of rules and even more tests.
Obviously you didn't read through my email. You are not required to make any changes to your rules if you don't want to, but if you do, you just need to mark the rule to tell the planner whether it is

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Chunwei Lei
Haisheng and Xiening, thanks for sharing these wonderful ideas. I believe this will be a huge improvement and will definitely benefit all users. From my experience of upgrading Calcite versions, there are always some changes in the new version which may lead to unexpected behavior due to a lack of

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Андрей Цвелодуб
Hello everyone, First of all, thanks for this great effort of improving the core parts of the framework we all are using, I believe this is long overdue and hope this will have benefits both for the maintainers and users of the library. I don't have anything to say about the general idea at the

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Jinpeng Wu
Hi, Xiening. Regarding calculating the logical cost, here are some ways I thought of: 1. Logical rels may implement their own computeSelfCost method. Some rels can provide such information; for example, LogicalProject/LogicalFilter contain nearly the same information as their physical

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Haisheng Yuan
Hi Hanumath, The trait in the example is for distribution only, for brevity, not including collation. Whether it is a hash join, merge join, or nested-loop join, the same distribution applies.
> Are you planning to use the same interface as that of VolcanoPlanner?
Yes, not only for

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread hanu mapr
Hello Haisheng, Thanks for the detailed analysis of the support for the Cascades framework. I am quite interested in being part of the new optimization framework. I believe this is very important infrastructural work to make Calcite a robust query optimizer. I like your approach on the trait

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Xiening Dai
Hi Jinpeng, Regarding this comment - "I believe there are ways to calculate the logical cost, but I think it’s not as simple as 'cardinality * unit_copy_cost'." Would you provide more details of the other ways? Just an algorithm description or pseudocode would help us understand.

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Haisheng Yuan
Igor, That's great. On 2020/04/20 11:17:49, Seliverstov Igor wrote: > Haisheng, Xiening, > > Ok, Now I see how it should work. > > Thanks for your replies. > > Regards, > Igor > > > On 20 Apr 2020, at 09:56, Seliverstov Igor wrote: > > > > Haisheng, Xiening, > > > > Thanks for

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Seliverstov Igor
Haisheng, Xiening, Ok, Now I see how it should work. Thanks for your replies. Regards, Igor > On 20 Apr 2020, at 09:56, Seliverstov Igor wrote: > > Haisheng, Xiening, > > Thanks for clarifying. > > In this proposal, we are not trying to split logical and physical planning >

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread Seliverstov Igor
Haisheng, Xiening, Thanks for clarifying. "In this proposal, we are not trying to split logical and physical planning entirely." Actually, I had doubts about the idea of entirely splitting the logical and physical phases; if you aren't going to, I have no objections. But it brings me back to my first

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-20 Thread 吴金朋
Hi, Haisheng and Igor. I think we do need the ability for logical space pruning. But we can achieve it step by step. In the first trial, we implement and optimize a rel only after it is fully explored. And then, after solving problems like group sharing stats and logical cost accuracy, we move it

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Xiening Dai
Hi Igor, Your comment - "because actual cost may be calculated correctly using physical operators only. So won't be able to implement Branch and Bound Space Pruning." is actually not true. In Cascades’ lower bound / upper bound pruning algorithm, you can get the cost lower bound of the input RelNode

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Igor, a) Given Calcite's current stats derivation strategy, mixing logical and physical planning won't make it better. I hope you went through my email to the end: currently operators inside a memo group don't share stats info, each operator's stats may differ from the others', and the

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Haisheng, From my point of view, splitting the logical and physical planning parts isn’t a good idea. I think so because the actual cost may be calculated correctly using physical operators only. As a result, we: a) won't be able to implement Branch and Bound Space Pruning (as far as I understand, at

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor, You can't have your cake and eat it. But one feasible work item we can definitely do is this: once a timeout is reached, stop exploring, use the first available physical operator in each group, and optimize it. Because most, if not all, of the long / infinite running optimizations are caused by

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Haisheng, Ok, then such a notification isn't needed. But in this case we don't have any control over how long planning takes. For some systems it's necessary to get a good enough plan right now instead of the best one after a while. For example, we've been considering a case when a query is optimised

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Haisheng Yuan
Hi Igor, There will be no rule queue anymore. Y will be fully explored (logical rules are matched and applied) before it can be implemented and optimized. Thanks, Haisheng On 2020/04/19 10:11:45, Seliverstov Igor wrote: > Hi Haisheng, > > Great explanation, thanks. > > One thing I'd like

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-19 Thread Seliverstov Igor
Hi Haisheng, Great explanation, thanks. One thing I'd like to cover in advance is the trait propagation process (I need it for the Apache Ignite SQL engine implementation). For example: There are two nodes: Rel X and its child node Rel Y. Both nodes are in the Optimized state, and there is a Logical rule

[DISCUSS] Towards Cascades Optimizer

2020-04-18 Thread Haisheng Yuan
Hi, In the past few months, we have had many discussions about Cascades-style top-down optimization, including on-demand trait derivation/request, rule application, and branch-and-bound space pruning. Now we think it is time to move towards these targets. We will separate it into several small issues, and