Re: [ANNOUNCE] New committer: Vineet Garg
Congrats, Vineet! Best, Danny Chan On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote: > > Congrats, Vineet!
Re: [ANNOUNCE] New committer: Vineet Garg
Congrats, Vineet! On Sunday, April 26, 2020 at 4:52 PM, Danny Chan wrote: > Congrats, Vineet! > > Best, > Danny Chan > On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote: > > > > Congrats, Vineet! > -- Best regards, Xu
Re: [ANNOUNCE] New committer: Vineet Garg
Congratulations, Vineet! Best, Leonard Xu > On April 26, 2020, at 18:07, xu wrote: > > Congrats, Vineet! > > On Sunday, April 26, 2020 at 4:52 PM, Danny Chan wrote: > >> Congrats, Vineet! >> >> Best, >> Danny Chan >> On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote: >>> >>> Congrats, Vineet! >> > > > -- > > Best regards, > > Xu
Re: [DISCUSS] Towards Cascades Optimizer
Hi everyone!

Haisheng, thank you for bringing this subject up. A new Cascades-style optimizer should definitely be the next step for Apache Calcite. Many projects suffer from the lack of this kind of optimizer.

That was the reason why, several weeks ago, I started working on a prototype of a Cascades optimizer for Apache Calcite. I was not sure that I would build something useful without much experience in this area. But now I'd like to share my work with the community. You can find the Cascades prototype in PR [1]. This prototype is based on the well-known paper [2].

What the prototype can do:
- Top-down trait request
- Convert traits without abstract converters
- Top-down rule application
- Bottom-up trait derivation

What is not supported yet:
- Search space pruning (but I'm going to fix it in the next commits)
- Materialized views
- Planner hooks
- Hints
- Backward compatibility with the Volcano planner (some research needed)

I prepared a design doc for this planner [3]; you can find many details there. I also opened it for comments.

I've written several basic test cases in org.apache.calcite.plan.cascades.CascadesPlannerTest, including the one that was discussed previously in this thread in the context of trait requests:

> MergeJoin hash[a]
> | TableScan R hash[a] (RelSubset)
> + TableScan S hash[a] (RelSubset)

Haisheng, this is very similar to what you propose. Answering your question:

> There are 2 ways:
> a) Modify on current VolcanoPlanner.
> Pros: code reuse, existing numerous test cases and infrastructure, fast integration
> Cons: changing code always brings risk
>
> b) Add a new Planner
> Pros: no risk, no tech debt, no need to worry about backward compatibility
> Cons: separate test cases for new planner, one more planner to maintain

I've chosen the second approach, because I don't have a clear understanding of how to fix the Volcano planner gradually.

This new planner is very far from perfect, but I think it can be a good starting point for the community.

Please share your thoughts about this planner.

[1] PR: https://github.com/apache/calcite/pull/1948
[2] Paper: https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf
[3] Design doc: https://docs.google.com/document/d/1qaV3eSKTw4gfLuBR3XB_LVIc51hxIng9k1J4VD6qnRg/edit?usp=sharing

--
Kind Regards
Roman Kondakov

On 22.04.2020 09:52, Danny Chan wrote:
>> Is there any recommended approach to make that happen smoothly besides coding and testing work? We need to be aware that the new planner might co-exist with VolcanoPlanner for 5 or more years, or even never replace VolcanoPlanner.
>
> If that is true, I might say the new planner probably does not have a very good design. We expect to see in advance for what cases/reasons users would have reason to keep the old VolcanoPlanner, and we *must* give a solution for those problems in the new design.
>
> I was expecting that migrating to a new planner would take at least 1 year of development. If that is true, modifying the current planner directly means that the next 3~4 Calcite versions would bring in huge plan changes/bugs with each release, which I believe no user of Calcite wants to see. And no one can guarantee that modifying directly can keep good stability and compatibility; only the test set can.
>
> From the experience of the Alibaba Blink planner, which was contributed to Apache Flink: yes, the old and new planners would co-exist for at least 2 years, because the new and old planners have different abilities in some corner cases.
>
> From my point of view, we should at least:
> - Give a convincing test set for the new planner that makes us believe the new planner is stable and powerful enough. I mean, obviously the current rule tests are far from enough to support the new planner.
> - Give a more detailed design doc about the new planner, especially about the interface changes and any change that would bring in compatibility problems. Then we can make a more accurate decision about how much work the new planner would bring in; only then can we decide whether switching to pure new-planner development is a good idea or whether to modify the existing one.
>
> On Wednesday, April 22, 2020 at 9:45 AM, Haisheng Yuan wrote:
>
>> Hi Andrii,
>>
>>> Obviously, from what is written here, I could guess that this would require me to change my physical planning rules, even if only by implementing a marker interface.
>> You don't need to change your physical rules; they will be treated the same as logical rules and be applied together with the real logical rules, with no more logical/physical rule difference. This is also how the current VolcanoPlanner works.
>>
>>> I don't want you to think that I somehow resent the changes you are pushing.
>> Don't get me wrong. I am seriously thinking of reverting these changes, since most people like the idea of adding a new planner, why don't we make all the
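The trait-request test case in Roman's message (a MergeJoin asking both inputs for hash[a]) can be pictured with a small toy model. This is only a sketch under loud assumptions: `TraitRequestSketch`, `Scan`, `satisfy`, `planMergeJoin` and the `Exchange` enforcer are hypothetical names invented for illustration, and none of this is code from the PR or from Calcite. It shows the parent passing a required distribution down (top-down request) and each input reporting what it already delivers (bottom-up derivation), with an enforcer added only on a mismatch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these names are Calcite APIs. */
final class TraitRequestSketch {

  /** A hypothetical input: a table scan and the distribution it already delivers. */
  static final class Scan {
    final String table;
    final String delivered;

    Scan(String table, String delivered) {
      this.table = table;
      this.delivered = delivered;
    }
  }

  /**
   * Top-down step: the parent passes down the distribution it requires.
   * Bottom-up step: the scan reports what it delivers; an enforcer
   * (here called Exchange) is added only when the two differ.
   */
  static String satisfy(Scan input, String required) {
    String scan = "TableScan " + input.table + " " + input.delivered;
    return input.delivered.equals(required)
        ? scan
        : "Exchange " + required + " <- " + scan;
  }

  /** Plans a merge join on the given key, requesting hash[key] from both inputs. */
  static List<String> planMergeJoin(String key, Scan left, Scan right) {
    String required = "hash[" + key + "]";
    List<String> plan = new ArrayList<>();
    plan.add("MergeJoin " + required);
    plan.add(satisfy(left, required));
    plan.add(satisfy(right, required));
    return plan;
  }

  public static void main(String[] args) {
    Scan r = new Scan("R", "hash[a]");
    Scan s = new Scan("S", "hash[a]");
    planMergeJoin("a", r, s).forEach(System.out::println);
  }
}
```

With both scans already distributed by hash[a], the request is satisfied without any enforcer, which matches the shape of the plan quoted in the message. Changing one scan to hash[b] makes the sketch wrap that input in the hypothetical Exchange.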
Re: [DISCUSS] Towards Cascades Optimizer
Hi Roman, Excellent! This is definitely a helpful contribution to the Calcite community. Thank you for your endeavors. Haisheng
Re: [ANNOUNCE] New committer: Vineet Garg
Congrats Vineet, well deserved! -Jesús On Sun, Apr 26, 2020 at 3:09 AM Leonard Xu wrote: > Congratulations, Vineet! > > Best, > Leonard Xu > > On April 26, 2020, at 18:07, xu wrote: > > > > Congrats, Vineet! > > > > On Sunday, April 26, 2020 at 4:52 PM, Danny Chan wrote: > > > >> Congrats, Vineet! > >> > >> Best, > >> Danny Chan > >> On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote: > >>> > >>> Congrats, Vineet! > >> > > > > > > -- > > > > Best regards, > > > > Xu > >
Re: [ANNOUNCE] New committer: Vineet Garg
Thanks a lot, guys! Just to briefly introduce myself: I work with Cloudera (Hortonworks before) on Hive and I am a Hive PMC member. As Stamatis noted, I have been involved in Calcite since 2017. It is a great honor to be part of this community. I am very excited to become a committer and I look forward to contributing more. Regards, Vineet Garg > On Apr 26, 2020, at 2:26 PM, Jesus Camacho Rodriguez > wrote: > > Congrats Vineet, well deserved! > > -Jesús > > On Sun, Apr 26, 2020 at 3:09 AM Leonard Xu wrote: > >> Congratulations, Vineet! >> >> Best, >> Leonard Xu >>> On April 26, 2020, at 18:07, xu wrote: >>> >>> Congrats, Vineet! >>> >>> On Sunday, April 26, 2020 at 4:52 PM, Danny Chan wrote: >>> Congrats, Vineet! Best, Danny Chan On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote: > > Congrats, Vineet! >>> >>> >>> -- >>> >>> Best regards, >>> >>> Xu >> >>
Re: [DISCUSS] Towards Cascades Optimizer
Hi all,

I am very excited about the ideas discussed so far, and especially by the enthusiasm of the many people who are ready to help pull this off. I wouldn't have expected that we could have a prototype so quickly. Thanks a lot, everyone!

In the debate between creating a new planner and patching the existing one, I don't have a clear preference. I think the answer depends on how many things we can reuse. If in the new planner we end up copy-pasting code, then I guess it will be a bad idea. On the other hand, if the new and old planners do not have many things in common, then I guess the answer is obvious.

From Haisheng's description, I was thinking that many of the proposed changes could go into the existing planner. That way it will be easier for everybody to test and see whether the changes make things better or worse. From a backward compatibility perspective, it seems feasible to keep the new features configurable for a certain amount of time.

From the wish-list, I think we should focus initially on points:
1. Top-down trait request
2. Convert traits without Abstract converters
4. Bottom-up trait derivation

I know that 3 and 5 are also important, but I have the feeling they can wait a bit longer.

A design doc would definitely help, especially if it has a few end-to-end (from logical to physical plan) examples showing how the optimizer works at each step before/after the changes. This is actually what is usually missing in research papers and what makes them hard to understand. I am thinking of something similar to the examples that Haisheng sent in the first email, but possibly a bit more detailed.

I looked very briefly at the PR by Roman, but I think I didn't see tests where the final plan contains operators from multiple conventions. Multiple conventions are among the choices that complicate certain parts of the existing planner, so we should make sure that we take this into account.

Hoping to find some time to think over all this more quietly. Very interesting stuff :)

Best,
Stamatis
[jira] [Created] (CALCITE-3961) VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded
Botong Huang created CALCITE-3961:

Summary: VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded
Key: CALCITE-3961
URL: https://issues.apache.org/jira/browse/CALCITE-3961
Project: Calcite
Issue Type: Bug
Reporter: Botong Huang

VolcanoPlanner.prunedNodes stores the list of relNodes that are marked useless. Whenever the planner sees two identical relNodes (e.g. when RelSets are merged), one of them is discarded. However, when the preserved node is not in the pruned list while the discarded one is, this pruning information is lost. In general, we should preserve this information whenever duplicate relNodes are discarded.
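The behaviour the issue asks for can be illustrated with a minimal sketch. It assumes a simplified stand-in for the planner: `PrunedNodesSketch`, `Node`, `prune`, `isPruned` and `discardDuplicate` are hypothetical names, and the code does not reproduce VolcanoPlanner's actual implementation or any eventual fix. The only idea shown is that when a pruned node is discarded as a duplicate, its pruned mark moves to the node that is kept.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only: it mirrors the idea in the issue, not VolcanoPlanner's code. */
final class PrunedNodesSketch {

  /** Hypothetical stand-in for a relNode; equality is object identity. */
  static final class Node {
    final String digest;

    Node(String digest) {
      this.digest = digest;
    }
  }

  private final Set<Node> prunedNodes = new HashSet<>();

  void prune(Node node) {
    prunedNodes.add(node);
  }

  boolean isPruned(Node node) {
    return prunedNodes.contains(node);
  }

  /**
   * Called when the discarded node turns out to duplicate the preserved one.
   * If the discarded node had been pruned, the mark moves to the survivor
   * instead of disappearing with the discarded node.
   */
  void discardDuplicate(Node preserved, Node discarded) {
    if (prunedNodes.remove(discarded)) {
      prunedNodes.add(preserved);
    }
  }

  public static void main(String[] args) {
    PrunedNodesSketch planner = new PrunedNodesSketch();
    Node kept = new Node("Scan(R)");
    Node duplicate = new Node("Scan(R)");
    planner.prune(duplicate);
    planner.discardDuplicate(kept, duplicate);
    System.out.println(planner.isPruned(kept)); // true
  }
}
```

Without the transfer in `discardDuplicate`, the kept node would look unpruned after the merge, which is the information loss the report describes.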