Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-26 Thread Danny Chan
Congrats, Vineet!

Best,
Danny Chan
On April 26, 2020 at 1:55 PM +0800, dev@calcite.apache.org wrote:
>
> Congrats, Vineet!


Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-26 Thread xu
Congrats, Vineet!


-- 

Best regards,

Xu


Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-26 Thread Leonard Xu
Congratulations, Vineet!

Best,
Leonard Xu



Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Roman Kondakov
Hi everyone!

Haisheng, thank you for bringing this subject up. A new Cascades-style
optimizer should definitely be the next step for Apache Calcite. Many
projects suffer from the lack of this kind of optimizer.

That was the reason why, several weeks ago, I started working on a
prototype of a Cascades optimizer for Apache Calcite. I was not sure that
I could build something useful without much experience in this area, but
now I'd like to share my work with the community. You can find the
Cascades prototype in PR [1]. This prototype is based on the well-known
paper [2].

What the prototype can do:
- Top-down trait request
- Convert traits without Abstract converters
- Top-down rule application
- Bottom-up trait derivation

What is not supported yet:
- Search space pruning (but I'm going to fix it in the next commits)
- Materialized views
- Planner hooks
- Hints
- Backward compatibility with Volcano planner (some research needed)

I prepared a design doc for this planner [3]; you can find many details
there. I have also opened it for comments.

I've written several basic test cases in
org.apache.calcite.plan.cascades.CascadesPlannerTest, including the one that
was discussed previously in this thread in the context of trait requests:

>  MergeJoin hash[a] 
>   | TableScan R hash[a] (RelSubset)
>   + TableScan S hash[a] (RelSubset)
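
To make the trait request in this plan concrete, here is a toy model in
Python. It is only a sketch and not the code from the PR: `Node`, `request`,
`explain`, and the `Exchange` enforcer are hypothetical names, and the
derivation rule (a parent keeps a trait that all of its inputs deliver) is an
assumption that holds for this example only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node; `delivered` is the trait it provides, e.g. "hash[a]"."""
    name: str
    delivered: Optional[str] = None
    inputs: List["Node"] = field(default_factory=list)


def request(node: Node, required: str) -> Node:
    """Top-down pass: push `required` to the inputs, then enforce if needed."""
    node.inputs = [request(child, required) for child in node.inputs]
    if node.inputs:
        # Bottom-up derivation (assumption): if every input delivers the
        # same trait, the parent is taken to deliver it as well.
        traits = {child.delivered for child in node.inputs}
        if len(traits) == 1:
            node.delivered = traits.pop()
    if node.delivered == required:
        return node
    # Hypothetical enforcer standing in for an exchange operator.
    return Node("Exchange", delivered=required, inputs=[node])


def explain(node: Node, depth: int = 0) -> List[str]:
    """Render the plan as indented lines, one per node."""
    lines = [f"{'  ' * depth}{node.name} {node.delivered}"]
    for child in node.inputs:
        lines.extend(explain(child, depth + 1))
    return lines
```

If R is already hashed on a and S is not, calling
`request(Node("MergeJoin", inputs=[r, s]), "hash[a]")` leaves R untouched and
wraps only S in an `Exchange`; the join then derives hash[a] from its inputs.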


Haisheng, this is very similar to what you propose. Answering your question:

> There are 2 ways:
> a) Modify on current VolcanoPlanner.
>   Pros: code reuse, existing numerous test cases and infrastructure, fast 
> integration
>   Cons: changing code always brings risk
> 
> b) Add a new Planner
>   Pros: no risk, no tech debt, no need to worry about backward compatibility
>   Cons: separate test cases for new planner, one more planner to maintain

I've chosen the second approach because I don't have a clear
understanding of how to fix the Volcano planner gradually.

This new planner is very far from perfect, but I think it can be a good
starting point for the community.

Please share your thoughts about this planner.


[1] PR: https://github.com/apache/calcite/pull/1948
[2] Paper:
https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf
[3] Design doc:
https://docs.google.com/document/d/1qaV3eSKTw4gfLuBR3XB_LVIc51hxIng9k1J4VD6qnRg/edit?usp=sharing



-- 
Kind Regards
Roman Kondakov


On 22.04.2020 09:52, Danny Chan wrote:
>> Is there any recommended approach to make that happen smoothly besides
>> coding and testing work? We need to be aware that the new planner might
>> co-exist with VolcanoPlanner for 5 or more years, or even never replace
>> VolcanoPlanner.
> 
> If that is true, I would say the new planner probably does not have a very
> good design. We should find out in advance in which cases, and for what
> reasons, users would keep the old VolcanoPlanner, and we *must* give a
> solution for those problems in the new design.
> 
> I was expecting that migrating to a new planner would take at least 1 year
> of development. If that is true, modifying the current planner directly
> means that the next 3~4 Calcite versions would bring in huge plan
> changes/bugs with each release, which I believe no Calcite user wants to
> see. And no one can guarantee that modifying it directly can keep good
> stability and compatibility; only the test set can.
> 
> From the experience of the Alibaba Blink planner, which has been contributed
> to Apache Flink: yes, the old and new planners would co-exist for at least
> 2 years, because the new and old planners have different abilities in some
> corner cases.
> 
> From my point of view, we should at least:
> - Give a convincing test set for the new planner that makes us believe the
> new planner is stable and powerful enough. I mean, obviously the current
> rule tests are far from enough to support the new planner.
> - Give a more detailed design doc about the new planner, especially about
> the interface changes and any change that would bring in compatibility
> problems. Then we can decide more accurately how much work the new planner
> would bring in; only then can we decide whether switching to pure
> new-planner development is a good idea, or whether to modify the existing one.
> 
> 
> Haisheng Yuan wrote on Wednesday, April 22, 2020 at 9:45 AM:
> 
>> Hi Andrii,
>>
>>> Obviously, from what is written here, I could guess that this would
>>> require me to change my physical planning rules, even if only by
>>> implementing a marker interface.
>> You don't need to change your physical rules; they will be treated the
>> same as logical rules and applied together with the real logical rules,
>> with no more logical/physical rule difference. This is also how the
>> current VolcanoPlanner works.
>>
>>> I don't want you to think that I somehow resent the changes you are
>>> pushing.
>> Don't get me wrong. I am seriously thinking of reverting these changes, since
>> most people like the idea of adding a new planner, why don't we make all the
>

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Haisheng Yuan
Hi Roman,

Excellent! This is definitely a helpful contribution to the Calcite community.
Thank you for your endeavors.

Haisheng


Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-26 Thread Jesus Camacho Rodriguez
Congrats Vineet, well deserved!

-Jesús



Re: [ANNOUNCE] New committer: Vineet Garg

2020-04-26 Thread Vineet G
Thanks a lot guys!

Just to briefly introduce myself - I work with Cloudera (Hortonworks before) on
Hive and I am a Hive PMC member. As Stamatis noted, I have been involved in
Calcite since 2017. It is a great honor to be part of this community. I am very
excited to become a committer and I look forward to contributing more.

Regards,
Vineet Garg




Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Stamatis Zampetakis
Hi all,

I am very excited about the ideas discussed so far and especially by the
enthusiasm of the many people who are ready to help pull this off.
I wouldn't have expected that we could have a prototype so quickly.
Thanks a lot everyone!

In the debate between creating a new planner or patching the existing one, I
don't have a clear preference.
I think the answer depends on how many things we can reuse.
If in the new planner we end up copy-pasting code then I guess it will be a
bad idea.
On the other hand, if the new and old planners do not have many things in
common then I guess the answer is obvious.

From Haisheng's description, I was thinking that many of the proposed
changes could go in the existing planner.
That way it will be easier for everybody to test and see if the changes
make things better or worse.
From a backward compatibility perspective it seems feasible to keep the new
features configurable for a certain amount of time.

From the wish-list, I think we should focus initially on points:
1. Top-down trait request
2. Convert traits without Abstract converters
4. Bottom-up trait derivation

I know that 3 and 5 are also important but I have the feeling they can
wait a bit longer.

A design doc would definitely help, especially if it has a few end-to-end
(from logical to physical plan) examples showing how the optimizer works at
each step before/after the changes.
This is actually what is usually missing in research papers that makes them
hard to understand.
I am thinking of something similar to the examples that Haisheng sent in the
first email but possibly a bit more detailed.

I looked very briefly at the PR by Roman but I think I didn't see tests
where the final plan contains operators from multiple conventions.
Support for multiple conventions is among the choices that complicate certain
parts of the existing planner so we should make sure that we take this into
account.

Hoping to find some time to think over all this more quietly. Very
interesting stuff :)

Best,
Stamatis


[jira] [Created] (CALCITE-3961) VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded

2020-04-26 Thread Botong Huang (Jira)
Botong Huang created CALCITE-3961:
-------------------------------------

             Summary: VolcanoPlanner.prunedNodes information is lost when duplicate relNode is discarded
                 Key: CALCITE-3961
                 URL: https://issues.apache.org/jira/browse/CALCITE-3961
             Project: Calcite
          Issue Type: Bug
            Reporter: Botong Huang


VolcanoPlanner.prunedNodes stores the list of relNodes that are marked useless.
Whenever the planner sees two identical relNodes (e.g. when RelSets are merged),
one of them is discarded. However, when the preserved node is not in the
pruned list while the discarded one is, this pruned information is lost. In
general, we should preserve this info whenever duplicate relNodes are
discarded.
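
The problem can be shown with a small toy model in Python. This is only an
illustration, not Calcite code: `ToyPlanner`, `prune`, and `discard_duplicate`
are hypothetical names, and carrying the pruned mark over to the kept node is
one assumed reading of "preserve this info".

```python
class ToyPlanner:
    """Toy stand-in for a planner that tracks pruned nodes by id."""

    def __init__(self):
        # Ids of nodes that have been marked useless.
        self.pruned = set()

    def prune(self, node_id):
        """Mark a node as useless."""
        self.pruned.add(node_id)

    def discard_duplicate(self, kept, discarded):
        """Drop `discarded` in favour of the identical node `kept`.

        If the discarded node was pruned, the mark is moved to the kept
        node. Without the `add` call the pruned information is lost.
        """
        if discarded in self.pruned:
            self.pruned.discard(discarded)
            self.pruned.add(kept)
```

For example, after `prune("rel#2")` and
`discard_duplicate(kept="rel#1", discarded="rel#2")`, the set `pruned`
contains "rel#1" instead of becoming empty.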



--
This message was sent by Atlassian Jira
(v8.3.4#803005)