Re: Re: Re: Problem with converters and possibly rule matching

2019-11-19 Thread Vladimir Ozerov

Re: Re: Re: Problem with converters and possibly rule matching

2019-11-15 Thread Stamatis Zampetakis
Hi again Vladimir,

In my previous email, I mentioned three rules for performing logical-to-physical
conversions. Naturally, if you add more operators, we end up with more rules.
Now that the example has a filter, we have the following rule list:

Rule 1: RootLogicalRel -> RootPhysicalRel
Rule 2: ProjectLogicalRel -> ProjectPhysicalRel
Rule 3: FilterLogicalRel -> FilterPhysicalRel
Rule 4: MapScanLogicalRel -> MapScanPhysicalRel

If I understand correctly, your concern is that in order to get the optimal plan
you would have to introduce many additional rules for the Exchange operator
(e.g., ExchangeProjectTransposeRule, ExchangeFilterTransposeRule, etc.).
I was thinking that it is not necessary to introduce this kind of rule.
You could have only one rule, i.e., a PhysicalExchangeRule, for enforcing a
distribution, applied whenever there is a requirement for a particular
distribution.

So for the initial plan
LogicalRoot [distribution=SINGLETON]
-> LogicalProject
 -> LogicalFilter
  -> LogicalScan

The ruleset above should generate the following alternatives, where the
PhysicalExchangeRule is applied three times.

Alt1:
PhysicalRoot
-> SingletonExchange
 -> PhysicalProject
  -> PhysicalFilter
   -> PhysicalScan

Alt2:
PhysicalRoot
-> PhysicalProject
 -> SingletonExchange
  -> PhysicalFilter
   -> PhysicalScan

Alt3:
PhysicalRoot
-> PhysicalProject
 -> PhysicalFilter
  -> SingletonExchange
   -> PhysicalScan

This is still in the same spirit as the previous example, where the physical
property (distribution) is either satisfied immediately or passed down to
the next level.
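The idea above (a single exchange rule applied wherever the distribution requirement lands) can be sketched outside Calcite as a toy enumerator. All names in this sketch are illustrative, not Calcite API:

```java
import java.util.ArrayList;
import java.util.List;

public class ExchangePlacement {
    // Given a physical pipeline from root to scan, produce one alternative
    // per level at which a SingletonExchange could be inserted, mirroring
    // Alt1..Alt3 above.
    static List<List<String>> alternatives(List<String> physicalOps) {
        List<List<String>> plans = new ArrayList<>();
        for (int pos = 1; pos < physicalOps.size(); pos++) {
            List<String> plan = new ArrayList<>(physicalOps.subList(0, pos));
            plan.add("SingletonExchange");
            plan.addAll(physicalOps.subList(pos, physicalOps.size()));
            plans.add(plan);
        }
        return plans;
    }

    public static void main(String[] args) {
        // Prints the three alternatives from the email, top operator first.
        alternatives(List.of("PhysicalRoot", "PhysicalProject",
                "PhysicalFilter", "PhysicalScan")).forEach(System.out::println);
    }
}
```

The optimizer would then cost each alternative and keep the cheapest.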

Best,
Stamatis


Re: Re: Re: Problem with converters and possibly rule matching

2019-11-14 Thread Vladimir Ozerov
search place* = search space


Re: Re: Re: Problem with converters and possibly rule matching

2019-11-14 Thread Vladimir Ozerov
Hi Haisheng,

I double-checked the code. My original version returned false for some
cases, but it didn't affect the number of rule calls anyway, so I changed it
to always return true. Please note that if I change the code as you
suggested, the test starts failing, because bottom-up propagation of rule
calls no longer works: when the child is converted to physical form, the
parent logical node is not notified. This is the very problem I address
with those weird physical-to-logical conversions: they do not make sense,
and converter expansion does not produce any new rels, but their existence
allows for logical rule re-triggering, which ultimately allows the plan to
compile.

Regarding two conventions - I agree that it may look strange, but I do not
see any problems from the correctness perspective. Separation of logical
and physical planning helps me avoid excessive expansion of the search
space: I do not want my physical rules to produce new rels from
non-optimized logical rels. Do you see any problems with that approach?

Regards,
Vladimir


Re: Re: Re: Problem with converters and possibly rule matching

2019-11-05 Thread Haisheng Yuan
Hi Vladimir,

The code in the PHYSICAL convention (line 44, linked below) looks weird; I think it always returns true.
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/HazelcastConventions.java#L44

Try this:
fromTraits.containsIfApplicable(Convention.PHYSICAL)
    && toTraits.containsIfApplicable(Convention.PHYSICAL);
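For intuition, here is a self-contained toy model of what the suggested predicate checks. It is only a sketch: Calcite's real RelTraitSet.containsIfApplicable has richer semantics; this version simply treats "no convention present" as vacuously true.

```java
import java.util.Optional;

public class TraitCheck {
    enum Conv { LOGICAL, PHYSICAL }

    // Toy trait set holding at most one convention trait.
    record TraitSet(Optional<Conv> convention) {
        // True if the set's convention equals the given one, or if the set
        // carries no convention at all (the "if applicable" part).
        boolean containsIfApplicable(Conv c) {
            return convention.map(c::equals).orElse(true);
        }
    }

    // The suggested guard: only allow conversion when both sides are
    // physical (or carry no convention).
    static boolean canConvert(TraitSet from, TraitSet to) {
        return from.containsIfApplicable(Conv.PHYSICAL)
                && to.containsIfApplicable(Conv.PHYSICAL);
    }

    public static void main(String[] args) {
        TraitSet logical = new TraitSet(Optional.of(Conv.LOGICAL));
        TraitSet physical = new TraitSet(Optional.of(Conv.PHYSICAL));
        System.out.println(canConvert(physical, physical)); // true
        System.out.println(canConvert(logical, physical));  // false
    }
}
```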

Adding an AbstractConverter on logical operators is meaningless. Calcite is
mixing the concepts of logical and physical together, which is sad.

BTW, using two conventions is inappropriate and wrong.

- Haisheng


Re: Re: Problem with converters and possibly rule matching

2019-11-05 Thread Vladimir Ozerov
Hi Haisheng,

I think I already tried something very similar to what you explained, but it
did not give an optimal plan. Please let me describe what I did. I would
appreciate your feedback.

1) We start with a simple operator tree Root <- Project <- Scan, where the
root is a final aggregator in the distributed query engine:
-> LogicalRoot
 -> LogicalProject
  -> LogicalScan

2) First, we convert the Root and enforce SINGLETON distribution on a child:
*-> PhysicalRoot[SINGLETON]*
* -> Enforcer#1[SINGLETON]*
  -> LogicalProject
   -> LogicalScan

3) Then the project's rule is invoked. It doesn't know the distribution of
the input, so it requests ANY distribution. Note that we have to set ANY to
the project as well since we do not know the distribution of the input:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
*  -> PhysicalProject[ANY]*
*   -> Enforcer#2[ANY]*
-> LogicalScan

4) Finally, the physical scan is created and its distribution is resolved.
Suppose that it is REPLICATED, i.e. the whole result set is located on all
nodes.
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> Enforcer#2[ANY]
*-> PhysicalScan[REPLICATED]*

5) Now that all logical nodes are converted, we start resolving the enforcers.
The second one is a no-op, since REPLICATED satisfies ANY:
-> PhysicalRoot[SINGLETON]
 -> Enforcer#1[SINGLETON]
  -> PhysicalProject[ANY]
   -> PhysicalScan[REPLICATED]

6) But the first enforcer now requires an Exchange, since ANY doesn't
satisfy SINGLETON!
-> PhysicalRoot[SINGLETON]
* -> SingletonExchange[SINGLETON]*
  -> PhysicalProject[ANY] // <= unresolved!
   -> PhysicalScan[REPLICATED]

The resulting plan requires data movement only because we didn't know the
precise distribution of the PhysicalProject when it was created. But if I
enable Convention.Impl.canConvertConvention, bottom-up propagation kicks
in, and the correct plan is produced, because now LogicalProject has a
chance to be converted to PhysicalProject with the concrete distribution.
The optimized plan looks like this (since REPLICATED satisfies SINGLETON):
-> PhysicalRoot[SINGLETON]
 -> PhysicalProject[REPLICATED]
  -> PhysicalScan[REPLICATED]
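The satisfaction checks behind steps 5 and 6 can be modeled as a tiny partial order over distributions. This is a hedged sketch, not Calcite's RelDistribution implementation:

```java
public class DistributionSatisfies {
    enum Dist { ANY, REPLICATED, PARTITIONED, SINGLETON }

    // Toy "satisfies" relation: everything satisfies ANY; a distribution
    // satisfies itself; REPLICATED (a full copy on every node) also
    // satisfies SINGLETON, since any one node already holds the whole result.
    static boolean satisfies(Dist actual, Dist required) {
        if (required == Dist.ANY || actual == required) return true;
        return actual == Dist.REPLICATED && required == Dist.SINGLETON;
    }

    public static void main(String[] args) {
        System.out.println(satisfies(Dist.REPLICATED, Dist.ANY));       // true: Enforcer#2 is a no-op
        System.out.println(satisfies(Dist.ANY, Dist.SINGLETON));        // false: Enforcer#1 must add an Exchange
        System.out.println(satisfies(Dist.REPLICATED, Dist.SINGLETON)); // true: the optimal plan needs no Exchange
    }
}
```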

You may see this in action in my reproducer:
1) Test producing "bad" plan:
https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45
2) Root enforces SINGLETON on Project:
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45
3) Project enforces default (ANY) distribution on Scan:
https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49

Please let me know if this flow is similar to what you meant.

Regards,
Vladimir.


Re: Problem with converters and possibly rule matching

2019-11-04 Thread Julian Hyde
My definition of a physical property is one that can be changed by a converter. 
For example, the Ordering physical property can be changed by the Sort 
operator. And the Distribution physical property can be changed by the Exchange 
operator. 

Particular physical properties might be baked into the schema — e.g. the fact 
that table T is partitioned on X and sorted by Y — but that doesn't stop them 
from being physical properties; it just saves the effort of sorting/partitioning 
when you execute the query.

Suppose that we have a date column. It could be represented as a string, date, 
or integer. There is the same information content in each, and hence there are 
lossless conversions among those representations. So, I’d say that the 
representation of that column, even though baked into the schema, is a physical 
property too.
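The date-column example can be made concrete: the representations carry the same information, so conversion among them is lossless and round-trips exactly. This is a plain-Java illustration, unrelated to Calcite's type system:

```java
import java.time.LocalDate;

public class DateRepresentations {
    // Same information content in three physical representations:
    // ISO-8601 string, LocalDate, and epoch-day integer.
    static long stringToEpochDay(String iso) {
        return LocalDate.parse(iso).toEpochDay();
    }

    static String epochDayToString(long epochDay) {
        return LocalDate.ofEpochDay(epochDay).toString();
    }

    public static void main(String[] args) {
        String d = "2019-11-05";
        long epoch = stringToEpochDay(d);
        // The round trip proves the conversion loses nothing.
        System.out.println(epochDayToString(epoch).equals(d)); // true
    }
}
```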

Julian


> On Nov 3, 2019, at 11:33 PM, Haisheng Yuan  wrote:
> 
> Hi Vladimir,
> 
> This can still be done through the top-down request approach.
> 
> PhysicalFilter operator should request ANY distribution from the child operator, 
> unless there is an outer reference in the filter condition, in which case 
> PhysicalFilter should request SINGLETON or BROADCAST distribution. So in your 
> case, PhysicalFilter requests ANY, and its required distribution will be 
> enforced on the filter's output.
> 
> Regarding index usage, you should have a FilterTableScan2IndexGet logical 
> transformation rule, and an IndexGet2IndexScan physical implementation rule. 
> Note that IndexGet is a logical operator and IndexScan is a physical 
> operator; both are also used by SQL Server.
> 
> - Haisheng
> 
> --
> From: Vladimir Ozerov
> Date: 2019-11-01 17:30:26
> To: 
> Subject: Re: Problem with converters and possibly rule matching
> 
> Hi Stamatis,
> 
> Thank you for your reply. I also thought that we may set the distribution
> trait during logical planning because it is known in advance. And the
> example I gave will work! :-) But unfortunately, it will work only because
> the tree is very simple, and the Project is adjacent to the Scan. This is
> how my reproducer will work in that case:
> 1) Root: enforce "SINGLETON" on Project
> 2) Project: check the logical Scan, infer the already resolved distribution,
> then convert to [PhysicalProject <- PhysicalScan]
> 3) Resolve the Root enforcer, adding an Exchange if needed.
> 
> But this stops working as soon as a plan becomes more complex so that it is
> impossible to infer the distribution from the child immediately. E.g.:
> LogicalRoot [distribution=SINGLETON]
> -> LogicalProject // We are here now and cannot yet produce the physical project
> -> LogicalFilter[distribution=?]
> -> LogicalScan[distribution=REPLICATED]
> 
> This is where your suggestion with cascading enforcement may kick in. But
> now consider that instead of a REPLICATED distribution, which satisfies
> SINGLETON, we have a PARTITIONED distribution. It doesn't satisfy
> SINGLETON, so we have to insert the exchange.
> 
> Before:
> PhysicalRoot [SINGLETON]
> -> PhysicalProject [SINGLETON]
> -> PhysicalFilter [SINGLETON]
> -> LogicalScan [PARTITIONED] // SINGLETON is enforced here
> 
> After:
> PhysicalRoot [SINGLETON]
> -> PhysicalProject [SINGLETON]
> -> PhysicalFilter [SINGLETON]
> -> SingletonExchange [SINGLETON]
> -> PhysicalScan [PARTITIONED]
> 
> But unfortunately, this plan is not optimal. Since we perform Project and
> Filter after the exchange, we now have to reshuffle the whole table. The
> optimal plan would be to push Project and Filter down past the exchange:
> PhysicalRoot [SINGLETON]
> -> SingletonExchange [SINGLETON]
> -> PhysicalProject [PARTITIONED]
> -> PhysicalFilter [PARTITIONED]
> -> PhysicalScan [PARTITIONED]
> 
> But in order to achieve that, I would need to introduce new rules that
> attempt to transpose dozens of different operators past the exchange, so
> most likely the planning would never finish :-)
> 
> This is why the bottom-up approach seems more natural for such cases.
> Another example is index support. A table may have a number of access
> methods in addition to a plain scan, and every such method may produce a
> physical scan with a different collation. It is not practical to try to
> enforce something from the top. Instead, we'd better produce alternatives
> from the bottom.
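The bottom-up index idea amounts to enumerating one physical scan alternative per access method, each with its own collation, and letting cost comparison pick the winner. The sketch below uses hypothetical names, not any real optimizer's API:

```java
import java.util.List;
import java.util.Map;

public class ScanAlternatives {
    // One physical scan alternative per access method, carrying the
    // collation that the access method provides for free.
    record PhysicalScan(String accessMethod, String collation) {}

    static List<PhysicalScan> alternatives(Map<String, String> accessMethods) {
        return accessMethods.entrySet().stream()
                .map(e -> new PhysicalScan(e.getKey(), e.getValue()))
                .toList();
    }

    public static void main(String[] args) {
        // A table with a plain scan and one index on a date column.
        var methods = Map.of("fullScan", "NONE", "idx_date", "date ASC");
        System.out.println(alternatives(methods).size()); // 2
    }
}
```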
> 
> Basically, Drill tries to mix the two approaches. First, it tries to derive
> traits from the child. If that fails and no transformations were created, it
> just stops trait propagation. As a result, an exchange is created on top of
> the operator where we stopped, which again leads to suboptimal plans.

Re: Re: Problem with converters and possibly rule matching

2019-11-03 Thread Haisheng Yuan
Hi Vladimir,

This can still be done through the top-down request approach.

PhysicalFilter operator should request ANY distribution from the child operator, 
unless there is an outer reference in the filter condition, in which case 
PhysicalFilter should request SINGLETON or BROADCAST distribution. So in your 
case, PhysicalFilter requests ANY, and its required distribution will be 
enforced on the filter's output.
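This request logic can be sketched as a one-line decision. The names are illustrative; in a real optimizer this would live in the physical filter's implementation rule:

```java
public class FilterDistributionRequest {
    enum Dist { ANY, SINGLETON, BROADCAST }

    // Top-down request: a filter with no outer references is
    // distribution-agnostic and asks its child for ANY; with an outer
    // reference it needs the data co-located, so it asks for SINGLETON
    // (BROADCAST would also work).
    static Dist requiredChildDistribution(boolean hasOuterReference) {
        return hasOuterReference ? Dist.SINGLETON : Dist.ANY;
    }

    public static void main(String[] args) {
        System.out.println(requiredChildDistribution(false)); // ANY
        System.out.println(requiredChildDistribution(true));  // SINGLETON
    }
}
```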

Regarding index usage, you should have a FilterTableScan2IndexGet logical 
transformation rule, and an IndexGet2IndexScan physical implementation rule. 
Note that IndexGet is a logical operator and IndexScan is a physical operator; 
both are also used by SQL Server.

- Haisheng

--
From: Vladimir Ozerov
Date: 2019-11-01 17:30:26
To:
Subject: Re: Problem with converters and possibly rule matching

Hi Stamatis,

Thank you for your reply. I also thought that we may set the distribution
trait during logical planning because it is known in advance. And the
example I gave will work! :-) But unfortunately, it will work only because
the tree is very simple, and the Project is adjacent to the Scan. This is
how my reproducer will work in that case:
1) Root: enforce "SINGLETON" on Project
2) Project: check the logical Scan, infer already resolved distribution,
then convert to [PhysicalProject <- PhysicalScan]
3) Resolve the Root enforcer, adding an Exchange if needed.

But this stops working as soon as a plan becomes more complex so that it is
impossible to infer the distribution from the child immediately. E.g.:
LogicalRoot [distribution=SINGLETON]
-> LogicalProject // We are here now and cannot produce the physical project
 -> LogicalFilter[distribution=?]
 -> LogicalScan[distribution=REPLICATED]

This is where your suggestion with cascading enforcement may kick in. But
now consider that instead of having REPLICATED distribution, which
satisfies SINGLETON, we have a PARTITIONED distribution. It doesn't satisfy
SINGLETON, so we have to insert the exchange.

Before:
PhysicalRoot [SINGLETON]
-> PhysicalProject [SINGLETON]
 -> PhysicalFilter [SINGLETON]
 -> LogicalScan [PARTITIONED] // SINGLETON is enforced here

After:
PhysicalRoot [SINGLETON]
-> PhysicalProject [SINGLETON]
 -> PhysicalFilter [SINGLETON]
 -> SingletonExchange [SINGLETON]
 -> PhysicalScan [PARTITIONED]

But unfortunately, this plan is not optimal. Since we perform Project and
Filter after the exchange, we now have to reshuffle the whole table. The
optimal plan would be to push down Project and Filter past the exchange:
PhysicalRoot [SINGLETON]
-> SingletonExchange [SINGLETON]
 -> PhysicalProject [PARTITIONED]
 -> PhysicalFilter [PARTITIONED]
 -> PhysicalScan [PARTITIONED]

But in order to achieve that, now I need to introduce new rules which will
attempt to transpose dozens of different operators past exchange, so most
likely the planning will never finish :-)

This is why the bottom-up approach seems to be more natural
for such cases. Another example is index support. A table may have a number
of access methods in addition to plain scan, and every such method may
produce a physical scan with different collations. It is not practical to
try to enforce something from the top. Instead, we'd better produce
alternatives from the bottom.

Basically, Drill tries to mix two approaches. First, it tries to derive
traits from the child. If it fails and no transformations were created, it
just stops trait propagation. As a result, an exchange is created on top of
the operator where we stopped, which again leads to non-optimal plans. You
may observe this in action if you run my reproducer. "testPass" produces an
optimal plan with the help of abstract converters. "testDrill" produces a
non-optimal plan with the unnecessary exchange.

All in all, I cannot see how we can employ distribution metadata and index
metadata for optimal planning without using a pure bottom-up approach.

Regards,
Vladimir.

Fri, Nov 1, 2019 at 11:42, Stamatis Zampetakis :

> The discussion is interesting thanks for the nice examples.
> I am replying in this thread since it has all the examples and the history
> of the conversation.
>
> *Part I: Distribution physical or logical property*
>
> The Volcano paper writes the following regarding properties:
>
> "Logical properties can be derived from the logical algebra expression and
> include schema, expected size, etc., while physical properties depend on
> algorithms, e.g., sort order, partitioning, etc."
>
> We could say that partitioning is not only a property which depends on an
> algorithm but a property which depends on the schema. When you have a table
> you know in advance that the particular table is replicated.
>
> Basically, this means that the starting "logical" plan in your case could
> be the following:
> RootLogicalRel[co

Re: Problem with converters and possibly rule matching

2019-11-01 Thread Vladimir Ozerov
Hi Stamatis,

Thank you for your reply. I also thought that we may set the distribution
trait during logical planning because it is known in advance. And the
example I gave will work! :-) But unfortunately, it will work only because
the tree is very simple, and the Project is adjacent to the Scan. This is
how my reproducer will work in that case:
1) Root: enforce "SINGLETON" on Project
2) Project: check the logical Scan, infer already resolved distribution,
then convert to [PhysicalProject <- PhysicalScan]
3) Resolve the Root enforcer, adding an Exchange if needed.

But this stops working as soon as a plan becomes more complex so that it is
impossible to infer the distribution from the child immediately. E.g.:
LogicalRoot [distribution=SINGLETON]
-> LogicalProject // We are here now and cannot produce the physical project
 -> LogicalFilter[distribution=?]
  -> LogicalScan[distribution=REPLICATED]

This is where your suggestion with cascading enforcement may kick in. But
now consider that instead of having REPLICATED distribution, which
satisfies SINGLETON, we have a PARTITIONED distribution. It doesn't satisfy
SINGLETON, so we have to insert the exchange.

Before:
PhysicalRoot   [SINGLETON]
-> PhysicalProject [SINGLETON]
 -> PhysicalFilter [SINGLETON]
  -> LogicalScan   [PARTITIONED] // SINGLETON is enforced here

After:
PhysicalRoot   [SINGLETON]
-> PhysicalProject [SINGLETON]
 -> PhysicalFilter [SINGLETON]
  -> SingletonExchange [SINGLETON]
   -> PhysicalScan [PARTITIONED]
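The enforcement step shown in the Before/After plans can be modeled with a minimal, self-contained sketch: the exchange is inserted only when the child's distribution does not already satisfy the requirement. Class and method names here are illustrative, not Calcite's:

```java
// Hypothetical model of distribution enforcement; not Calcite API.
enum Distribution { PARTITIONED, REPLICATED, SINGLETON }

class Enforcer {
    static boolean satisfies(Distribution actual, Distribution required) {
        // A replicated input has a full copy on every node, so any single
        // node can act as the SINGLETON source without reshuffling.
        return actual == required
            || (actual == Distribution.REPLICATED
                && required == Distribution.SINGLETON);
    }

    // Returns the child unchanged, or wraps it in a SingletonExchange
    // when the SINGLETON requirement must be enforced.
    static String enforceSingleton(String child, Distribution childDist) {
        return satisfies(childDist, Distribution.SINGLETON)
            ? child
            : "SingletonExchange(" + child + ")";
    }
}
```

With a PARTITIONED scan the exchange appears exactly as in the "After" plan above; with a REPLICATED one it would not.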

But unfortunately, this plan is not optimal. Since we perform Project and
Filter after the exchange, we now have to reshuffle the whole table. The
optimal plan would be to push down Project and Filter past the exchange:
PhysicalRoot [SINGLETON]
-> SingletonExchange [SINGLETON]
 -> PhysicalProject  [PARTITIONED]
  -> PhysicalFilter  [PARTITIONED]
   -> PhysicalScan   [PARTITIONED]

But in order to achieve that, now I need to introduce new rules which will
attempt to transpose dozens of different operators past exchange, so most
likely the planning will never finish :-)

This is why the bottom-up approach seems to be more natural
for such cases. Another example is index support. A table may have a number
of access methods in addition to plain scan, and every such method may
produce a physical scan with different collations. It is not practical to
try to enforce something from the top. Instead, we'd better produce
alternatives from the bottom.

Basically, Drill tries to mix two approaches. First, it tries to derive
traits from the child. If it fails and no transformations were created, it
just stops trait propagation. As a result, an exchange is created on top of
the operator where we stopped, which again leads to non-optimal plans. You
may observe this in action if you run my reproducer. "testPass" produces an
optimal plan with the help of abstract converters. "testDrill" produces a
non-optimal plan with the unnecessary exchange.

All in all, I cannot see how we can employ distribution metadata and index
metadata for optimal planning without using a pure bottom-up approach.

Regards,
Vladimir.

Fri, Nov 1, 2019 at 11:42, Stamatis Zampetakis :

> The discussion is interesting thanks for the nice examples.
> I am replying in this thread since it has all the examples and the history
> of the conversation.
>
> *Part I: Distribution physical or logical property*
>
> The Volcano paper writes the following regarding properties:
>
> "Logical properties can be derived from the logical algebra expression and
> include schema, expected size, etc., while physical properties depend on
> algorithms, e.g., sort order, partitioning, etc."
>
> We could say that partitioning is not only a property which depends on an
> algorithm but a property which depends on the schema. When you have a table
> you know in advance that the particular table is replicated.
>
> Basically, this means that the starting "logical" plan in your case could
> be the following:
> RootLogicalRel[convention=LOGICAL, distribution=ANY]
> -> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
>   -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED]
>
> When you are about to create your physical projection it is not necessary
> to create three equivalent paths but the most promising by asking the input
> what can it offer as traits. Again this partially overlaps with the
> discussion about "On demand traitset" [1] where as I wrote there is already
> a mechanism to derive traits from children (collation, distribution, etc.).
>
> Note that I am not saying that there is nothing more to be done to improve
> the Calcite optimizer; I am just trying to provide a viable alternative
> given the current situation of the project.
>
> *Part II: Example with the original Volcano optimizer*
>
> Now let me try to show how the original (from the paper) Volcano optimizer
> could handle your query.
>
> The optimization starts and we request two 

Re: Problem with converters and possibly rule matching

2019-11-01 Thread Seliverstov Igor
Stamatis,

Small comment from me: for distributed systems it is more efficient if the
SINGLETON distribution appears only at the root node, in other words as late
as possible; this allows rows to be filtered or cut (projected) on the source
nodes, which significantly reduces network costs.

To do that (in case we use the original Volcano approach) we need to request
all possible distribution traits on the project node (some of them we cannot
request, because we don't know whether repartitioning and rehashing happen
in the middle) to provide a place for possible converted inputs (possibly
leaving empty subsets, which causes exceptions in the current
implementation). Such an approach significantly slows down the planning phase.

Regards,
Igor

Fri, Nov 1, 2019, 11:42 Stamatis Zampetakis :

> The discussion is interesting thanks for the nice examples.
> I am replying in this thread since it has all the examples and the history
> of the conversation.
>
> *Part I: Distribution physical or logical property*
>
> The Volcano paper writes the following regarding properties:
>
> "Logical properties can be derived from the logical algebra expression and
> include schema, expected size, etc., while physical properties depend on
> algorithms, e.g., sort order, partitioning, etc."
>
> We could say that partitioning is not only a property which depends on an
> algorithm but a property which depends on the schema. When you have a table
> you know in advance that the particular table is replicated.
>
> Basically, this means that the starting "logical" plan in your case could
> be the following:
> RootLogicalRel[convention=LOGICAL, distribution=ANY]
> -> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
>   -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED]
>
> When you are about to create your physical projection it is not necessary
> to create three equivalent paths but the most promising by asking the input
> what can it offer as traits. Again this partially overlaps with the
> discussion about "On demand traitset" [1] where as I wrote there is already
> a mechanism to derive traits from children (collation, distribution, etc.).
>
> Note that I am not saying that there is nothing more to be done to improve
> the Calcite optimizer; I am just trying to provide a viable alternative
> given the current situation of the project.
>
> *Part II: Example with the original Volcano optimizer*
>
> Now let me try to show how the original (from the paper) Volcano optimizer
> could handle your query.
>
> The optimization starts and we request two properties convention and
> distribution.
>
> RootLogicalRel[convention=PHYSICAL, distribution=SINGLETON]
> -> ProjectLogicalRel
>   -> MapScanLogicalRel
>
> Rule 1: RootLogicalRel -> RootPhysicalRel applies the algorithm but cannot
> satisfy the SINGLETON property so it propagates the demand
>
> RootPhysicalRel
> -> ProjectLogicalRel [convention=PHYSICAL, distribution=SINGLETON]
>   -> MapScanLogicalRel
>
> Rule 2: ProjectLogicalRel -> ProjectPhysicalRel applies the algorithm but
> cannot satisfy the SINGLETON property so it propagates the demand
>
> RootPhysicalRel
> -> ProjectPhysicalRel
>   -> MapScanLogicalRel [convention=PHYSICAL, distribution=SINGLETON]
>
> Rule 3: MapScanLogicalRel -> MapScanPhysicalRel applies the algorithm and
> given that the table is replicated it satisfies the distribution property.
>
> RootPhysicalRel
> -> ProjectPhysicalRel
>   -> MapScanPhysicalRel
>
> So we end up with a complete plan that satisfies all properties and does
> not have redundant exchange operators.
> Moreover we didn't require all possible distributions when we applied Rule
> 2 since we just want to satisfy the SINGLETON property requested by the
> parent.
> The example above does not demonstrate the additional optimization paths
> which would apply an enforcer (an Exchange operator to satisfy the
> SINGLETON property) at a higher level than the scan.
>
> Would this be an acceptable approach? Does it still suffer from a big
> search space?
>
> To make the analogy with the actual Volcano optimizer in Calcite, the thing
> that may be missing is to be able to know what trait was requested by the
> parent during the application of Rule 2.
>
> Best,
> Stamatis
>
> [1]
>
> https://lists.apache.org/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
>
> On Wed, Oct 30, 2019 at 7:24 PM Vladimir Ozerov 
> wrote:
>
> > Hi Stamatis,
> >
> > The problem is that the presented reproducer is a very simplified version of
> > what is actually needed. In reality, there are several distribution
> types -
> > PARTITIONED, REPLICATED, SINGLETON. To make things worse, PARTITIONED may
> > or may not have concrete distribution fields. In theory, I can create one
> > transformation per distribution type, but that would increase plan space
> > significantly. In my sample "Root <- Project <- Scan" plan, there is no
> > room for optimization at all, we only need to convert the nodes and
> > 

Re: Problem with converters and possibly rule matching

2019-11-01 Thread Stamatis Zampetakis
The discussion is interesting thanks for the nice examples.
I am replying in this thread since it has all the examples and the history
of the conversation.

*Part I: Distribution physical or logical property*

The Volcano paper writes the following regarding properties:

"Logical properties can be derived from the logical algebra expression and
include schema, expected size, etc., while physical properties depend on
algorithms, e.g., sort order, partitioning, etc."

We could say that partitioning is not only a property which depends on an
algorithm but a property which depends on the schema. When you have a table
you know in advance that the particular table is replicated.

Basically, this means that the starting "logical" plan in your case could
be the following:
RootLogicalRel[convention=LOGICAL, distribution=ANY]
-> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
  -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED]

When you are about to create your physical projection it is not necessary
to create three equivalent paths but the most promising by asking the input
what can it offer as traits. Again this partially overlaps with the
discussion about "On demand traitset" [1] where as I wrote there is already
a mechanism to derive traits from children (collation, distribution, etc.).
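The derive-from-children mechanism can be illustrated with a minimal sketch in the spirit of Calcite's RelMdDistribution metadata: a scan's distribution comes from the schema, and distribution-preserving operators simply pass their input's value up. The classes below are simplified stand-ins, not the real metadata API:

```java
// Hypothetical bottom-up derivation of the distribution trait.
class DistributionDerivation {
    enum Dist { PARTITIONED, REPLICATED, SINGLETON }

    interface Rel {}
    record Scan(Dist tableDistribution) implements Rel {}   // known from the schema
    record Project(Rel input) implements Rel {}             // distribution-preserving
    record Filter(Rel input) implements Rel {}              // distribution-preserving

    static Dist distribution(Rel rel) {
        if (rel instanceof Scan s) return s.tableDistribution();
        if (rel instanceof Project p) return distribution(p.input());
        return distribution(((Filter) rel).input());
    }
}
```

For the REPLICATED MapScan above, the whole Project subtree derives REPLICATED without enumerating alternatives.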

Note that I am not saying that there is nothing more to be done to improve
the Calcite optimizer; I am just trying to provide a viable alternative
given the current situation of the project.

*Part II: Example with the original Volcano optimizer*

Now let me try to show how the original (from the paper) Volcano optimizer
could handle your query.

The optimization starts and we request two properties convention and
distribution.

RootLogicalRel[convention=PHYSICAL, distribution=SINGLETON]
-> ProjectLogicalRel
  -> MapScanLogicalRel

Rule 1: RootLogicalRel -> RootPhysicalRel applies the algorithm but cannot
satisfy the SINGLETON property so it propagates the demand

RootPhysicalRel
-> ProjectLogicalRel [convention=PHYSICAL, distribution=SINGLETON]
  -> MapScanLogicalRel

Rule 2: ProjectLogicalRel -> ProjectPhysicalRel applies the algorithm but
cannot satisfy the SINGLETON property so it propagates the demand

RootPhysicalRel
-> ProjectPhysicalRel
  -> MapScanLogicalRel [convention=PHYSICAL, distribution=SINGLETON]

Rule 3: MapScanLogicalRel -> MapScanPhysicalRel applies the algorithm and
given that the table is replicated it satisfies the distribution property.

RootPhysicalRel
-> ProjectPhysicalRel
  -> MapScanPhysicalRel

So we end up with a complete plan that satisfies all properties and does
not have redundant exchange operators.
Moreover we didn't require all possible distributions when we applied Rule
2 since we just want to satisfy the SINGLETON property requested by the
parent.
The example above does not demonstrate the additional optimization paths
which would apply an enforcer (an Exchange operator to satisfy the
SINGLETON property) at a higher level than the scan.
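The three rule applications above can be condensed into a small executable sketch of the top-down walk: the SINGLETON demand is propagated through the pass-through Project, and an Exchange would only appear if the scan could not satisfy it. All names are illustrative, not Calcite's:

```java
// Hypothetical condensed model of the top-down property demand.
class TopDownConvert {
    enum Dist { PARTITIONED, REPLICATED, SINGLETON }

    static boolean satisfies(Dist actual, Dist required) {
        // A replicated table satisfies SINGLETON: any node has the full copy.
        return actual == required
            || (actual == Dist.REPLICATED && required == Dist.SINGLETON);
    }

    // Convert "Root <- Project <- Scan(tableDist)" with SINGLETON demanded
    // at the root; the demand passes through the project down to the scan.
    static String convert(Dist tableDist) {
        String scan = "MapScanPhysicalRel";
        String input = satisfies(tableDist, Dist.SINGLETON)
            ? scan
            : "SingletonExchange(" + scan + ")";
        return "RootPhysicalRel(ProjectPhysicalRel(" + input + "))";
    }
}
```

For a REPLICATED table this yields exactly the exchange-free plan of Rule 3; for a PARTITIONED one the enforcer would appear at the scan.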

Would this be an acceptable approach? Does it still suffer from a big
search space?

To make the analogy with the actual Volcano optimizer in Calcite, the thing
that may be missing is to be able to know what trait was requested by the
parent during the application of Rule 2.

Best,
Stamatis

[1]
https://lists.apache.org/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E

On Wed, Oct 30, 2019 at 7:24 PM Vladimir Ozerov  wrote:

> Hi Stamatis,
>
> The problem is that the presented reproducer is a very simplified version of
> what is actually needed. In reality, there are several distribution types -
> PARTITIONED, REPLICATED, SINGLETON. To make things worse, PARTITIONED may
> or may not have concrete distribution fields. In theory, I can create one
> transformation per distribution type, but that would increase plan space
> significantly. In my sample "Root <- Project <- Scan" plan, there is no
> room for optimization at all, we only need to convert the nodes and
> propagate traits. But if I follow the proposed approach, the planner would
> create three equivalent paths. For complex queries, this may increase
> optimization time significantly.
>
> What I need instead is to gradually convert and optimize nodes *bottom-up*,
> instead of top-down. That is, create a physical scan first, then create a
> physical project on top of it, etc. But at the same time, some rules still
> require the top-down approach. So essentially I need the optimizer to do
> both. Abstract converters help me establish bottom-up preparation but do
> this at the cost of considering too many trait pairs, and as a result, also
> pollute the search space.
>
> > By contrast, a precise command "I transformed node A to node B,
> please re-trigger the rules for A's parents" would allow us to re-trigger
> only required rules, without adding more nodes.
>
> Does it make sense?
>
> Regards,
> Vladimir.
>
> Wed, Oct 30, 2019 

Re: Problem with converters and possibly rule matching

2019-10-30 Thread Vladimir Ozerov
Hi Stamatis,

The problem is that the presented reproducer is a very simplified version of
what is actually needed. In reality, there are several distribution types -
PARTITIONED, REPLICATED, SINGLETON. To make things worse, PARTITIONED may
or may not have concrete distribution fields. In theory, I can create one
transformation per distribution type, but that would increase plan space
significantly. In my sample "Root <- Project <- Scan" plan, there is no
room for optimization at all, we only need to convert the nodes and
propagate traits. But if I follow the proposed approach, the planner would
create three equivalent paths. For complex queries, this may increase
optimization time significantly.

What I need instead is to gradually convert and optimize nodes *bottom-up*,
instead of top-down. That is, create a physical scan first, then create a
physical project on top of it, etc. But at the same time, some rules still
require the top-down approach. So essentially I need the optimizer to do
both. Abstract converters help me establish bottom-up preparation but do
this at the cost of considering too many trait pairs, and as a result, also
pollute the search space.

By contrast, a precise command "I transformed node A to node B,
please re-trigger the rules for A's parents" would allow us to re-trigger
only required rules, without adding more nodes.
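Such a command does not exist in VolcanoPlanner's public API today; the toy model below only illustrates the intended effect of the proposed "re-trigger rules for A's parents" call: after a node is transformed, the rules matching its parents go back on the queue, without creating any converter rels. All names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Toy model of the proposed re-trigger command; not Calcite API.
class RuleRetrigger {
    final Queue<String> ruleQueue = new ArrayDeque<>();
    // For each node, the rules that match that node's parents.
    final Map<String, List<String>> parentRules;

    RuleRetrigger(Map<String, List<String>> parentRules) {
        this.parentRules = parentRules;
    }

    // Hypothetical forceRules(node): re-enqueue every rule matching the
    // parents of the just-transformed node, and nothing else.
    void forceRules(String transformedNode) {
        ruleQueue.addAll(parentRules.getOrDefault(transformedNode, List.of()));
    }
}
```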

Does it make sense?

Regards,
Vladimir.

Wed, Oct 30, 2019 at 21:02, Stamatis Zampetakis :

> I admit that I didn't thoroughly read the exchanges so far but forcing the
> rules trigger does not seem the right approach.
>
> Other than that I would like to backtrack a bit to point 4.3 raised by
> Vladimir.
>
> "ProjectPhysicalRule [6] - transforms logical project to physical
> project *ONLY* if there is an underlying physical input with REPLICATED or
> SINGLETON distribution"
>
> The rule could be modified to do the following two transformations:
> 1. Create a physical project and require the input to be REPLICATED.
> 2. Create a physical project and require
> the input to be SINGLETON.
>
> I would assume that afterwards when your scan rule fires it should go to
> the appropriate subset and give you back the desired plan. Is there a
> problem with this approach?
>
> Best,
> Stamatis
>
> On Wed, Oct 30, 2019, 5:52 PM Seliverstov Igor 
> wrote:
>
> > Unfortunately it requires package-private API usage.
> >
> > The best solution would be if Calcite provided it for end users.
> >
> > Wed, Oct 30, 2019, 18:48 Vladimir Ozerov :
> >
> > > “e pension” == “expand conversion” :)
> > >
> > > Wed, Oct 30, 2019 at 18:46, Vladimir Ozerov :
> > >
> > > > Yes, that may work. Even if e pension rule is used, for the most
> cases
> > it
> > > > will not trigger any real conversions, since we are moving from
> > abstract
> > > > convention to physical, and created converters will have the opposite
> > > trait
> > > > direction (from physical to abstract).
> > > >
> > > > But again - ideally we only need to re-trigger the rules for a
> specific
> > > > node, no more than that. So API support like
> > > > “VolcanoPlanner.forceRules(RelNode)” would be very convenient.
> > > >
> > > > What do you think?
> > > >
> > > > Wed, Oct 30, 2019 at 17:56, Seliverstov Igor  >:
> > > >
> > > >> I considered manual rules calling too, for now we use abstract
> > > converters
> > > >> +
> > > >> ExpandConversionRule for exchanges producing.
> > > >>
> > > >> You may create such converters manually (checking the appropriate
> > > >> subset); in this case you may reduce the number of created converters.
> > > >> Also, a converter is quite a special node that does almost nothing
> > > >> (without a corresponding rule), so it may be used just as a rule
> > > >> trigger.
> > > >>
> > > >> Regards,
> > > >> Igor
> > > >>
> > > >> Wed, Oct 30, 2019, 17:31 Vladimir Ozerov :
> > > >>
> > > >> > One funny hack which helped me is manual registration of a fake
> > > RelNode
> > > >> > with desired traits through VolcanoPlanner.register() method. But
> > > again,
> > > >> > this leads to thrashing. What could really help is a call to
> > > >> > VolcanoPlanner.fireRules() with desired rel. But this doesn't work
> > out
> > > >> of
> > > >> > the box since some internals of the rule queue need to be
> adjusted.
> > > >> >
> > > >> > What does the community think about adding a method which will
> > re-add
> > > >> rules
> > > >> > applicable to the specific RelNode to the rule queue?
> > > >> >
> > > >> > Wed, Oct 30, 2019 at 17:00, Vladimir Ozerov  >:
> > > >> >
> > > >> > > Hi Igor,
> > > >> > >
> > > >> > > Yes, I came to the same conclusion, thank you. This is how it
> > > >> basically
> > > >> > > happens when converters are disabled:
> > > >> > > 1) We start with initial tree: [LogicalProject] <- [LogicalScan]
> > > >> > > 2) Then we convert LogicalScan to PhysicalScan, so it is added
> to
> > > the
> > > >> > > set: [LogicalProject] <- [LogicalScan, PhysicalScan]
> > > >> > > 3) Finally, when it is time to fire 

Re: Problem with converters and possibly rule matching

2019-10-30 Thread Stamatis Zampetakis
I admit that I didn't thoroughly read the exchanges so far but forcing the
rules trigger does not seem the right approach.

Other than that I would like to backtrack a bit to point 4.3 raised by
Vladimir.

"ProjectPhysicalRule [6] - transforms logical project to physical
project *ONLY* if there is an underlying physical input with REPLICATED or
SINGLETON distribution"

The rule could be modified to do the following two transformations:
1. Create a physical project and require the input to be REPLICATED.
2. Create a physical project and require
the input to be SINGLETON.
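A minimal sketch of the modified rule emitting both alternatives might look like the following; the names (Candidate, onMatch) are illustrative, not the actual ProjectPhysicalRule:

```java
import java.util.List;

// Hypothetical rule body: emit one physical project per acceptable input
// distribution and let the planner satisfy each requirement.
class ProjectPhysicalRuleSketch {
    record Candidate(String physicalRel, String requiredInputDistribution) {}

    static List<Candidate> onMatch(String logicalProject) {
        String physical = "PhysicalProject(" + logicalProject + ")";
        return List.of(
            new Candidate(physical, "REPLICATED"),
            new Candidate(physical, "SINGLETON"));
    }
}
```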

I would assume that afterwards when your scan rule fires it should go to
the appropriate subset and give you back the desired plan. Is there a
problem with this approach?

Best,
Stamatis

On Wed, Oct 30, 2019, 5:52 PM Seliverstov Igor  wrote:

> Unfortunately it requires package-private API usage.
>
> The best solution would be if Calcite provided it for end users.
>
> Wed, Oct 30, 2019, 18:48 Vladimir Ozerov :
>
> > “e pension” == “expand conversion” :)
> >
> > Wed, Oct 30, 2019 at 18:46, Vladimir Ozerov :
> >
> > > Yes, that may work. Even if e pension rule is used, for the most cases
> it
> > > will not trigger any real conversions, since we are moving from
> abstract
> > > convention to physical, and created converters will have the opposite
> > trait
> > > direction (from physical to abstract).
> > >
> > > But again - ideally we only need to re-trigger the rules for a specific
> > > node, no more than that. So API support like
> > > “VolcanoPlanner.forceRules(RelNode)” would be very convenient.
> > >
> > > What do you think?
> > >
> > Wed, Oct 30, 2019 at 17:56, Seliverstov Igor :
> > >
> > >> I considered manual rules calling too, for now we use abstract
> > converters
> > >> +
> > >> ExpandConversionRule for exchanges producing.
> > >>
> > >> You may create such converters manually (checking the appropriate
> > >> subset); in this case you may reduce the number of created converters.
> > >> Also, a converter is quite a special node that does almost nothing
> > >> (without a corresponding rule), so it may be used just as a rule
> > >> trigger.
> > >>
> > >> Regards,
> > >> Igor
> > >>
> > >> Wed, Oct 30, 2019, 17:31 Vladimir Ozerov :
> > >>
> > >> > One funny hack which helped me is manual registration of a fake
> > RelNode
> > >> > with desired traits through VolcanoPlanner.register() method. But
> > again,
> > >> > this leads to thrashing. What could really help is a call to
> > >> > VolcanoPlanner.fireRules() with desired rel. But this doesn't work
> out
> > >> of
> > >> > the box since some internals of the rule queue need to be adjusted.
> > >> >
> > >> > What does the community think about adding a method which will
> re-add
> > >> rules
> > >> > applicable to the specific RelNode to the rule queue?
> > >> >
> > >> > Wed, Oct 30, 2019 at 17:00, Vladimir Ozerov :
> > >> >
> > >> > > Hi Igor,
> > >> > >
> > >> > > Yes, I came to the same conclusion, thank you. This is how it
> > >> basically
> > >> > > happens when converters are disabled:
> > >> > > 1) We start with initial tree: [LogicalProject] <- [LogicalScan]
> > >> > > 2) Then we convert LogicalScan to PhysicalScan, so it is added to
> > the
> > >> > > set: [LogicalProject] <- [LogicalScan, PhysicalScan]
> > >> > > 3) Finally, when it is time to fire a rule for PhysicalScan, we
> try
> > to
> > >> > get
> > >> > > parents of that scan set with traits of the PhysicalScan. Since
> > there
> > >> are
> > >> > > no such parents (we skipped it intentionally), the rule is not
> > queued.
> > >> > >
> > >> > > But when converters are enabled, a converter rel is created:
> > >> > [LogicalProject]
> > >> > > <- [LogicalScan, PhysicalScan, ConverterFromPhysicalToLogical]. No
> > >> rules
> > >> > > are fired for PhysicalScan again, but they are fired for converter
> > >> since
> > >> > > it has the necessary LOGICAL trait.
> > >> > >
> > >> > > It makes sense that converters essentially allow forcing rule
> > >> invocation
> > >> > > on parents, even if the child was created with different traits.
> But
> > >> it
> > >> > > seems that this mechanism may be too heavy for complex queries
> > >> because it
> > >> > > literally creates hundreds of new converter rels and triggers
> rules
> > >> over
> > >> > > and over again.
> > >> > >
> > >> > > We need some fine-grained alternative. Basically, what would be
> > enough
> > >> > for
> > >> > > me is to let the planner know somehow: "I created that rel, and I
> > want
> > >> > you
> > >> > > to execute parent rules not only using its trait but also on this
> > and
> > >> > those
> > >> > > traits."
> > >> > > Is there any API in Calcite which allows doing this without
> > creating a
> > >> > new
> > >> > > rel node?
> > >> > >
> > >> > > Regards,
> > >> > > Vladimir.
> > >> > >
> > >> > >
> > >> > > Wed, Oct 30, 2019 at 09:25, Seliverstov Igor <
> gvvinbl...@gmail.com
> > >:
> > >> > >
> > >> > >> Vladimir,
> > >> > >>
> > >> > >> Probably it'll help 

Re: Problem with converters and possibly rule matching

2019-10-30 Thread Seliverstov Igor
Unfortunately it requires package-private API usage.

The best solution would be if Calcite provided it for end users.

Wed, Oct 30, 2019, 18:48 Vladimir Ozerov :

> “e pension” == “expand conversion” :)
>
> Wed, Oct 30, 2019 at 18:46, Vladimir Ozerov :
>
> > Yes, that may work. Even if e pension rule is used, for the most cases it
> > will not trigger any real conversions, since we are moving from abstract
> > convention to physical, and created converters will have the opposite
> trait
> > direction (from physical to abstract).
> >
> > But again - ideally we only need to re-trigger the rules for a specific
> > node, no more than that. So API support like
> > “VolcanoPlanner.forceRules(RelNode)” would be very convenient.
> >
> > What do you think?
> >
> > Wed, Oct 30, 2019 at 17:56, Seliverstov Igor :
> >
> >> I considered manual rules calling too, for now we use abstract
> converters
> >> +
> >> ExpandConversionRule for exchanges producing.
> >>
> >> You may create such converters manually (checking the appropriate
> >> subset); in this case you may reduce the number of created converters.
> >> Also, a converter is quite a special node that does almost nothing
> >> (without a corresponding rule), so it may be used just as a rule
> >> trigger.
> >>
> >> Regards,
> >> Igor
> >>
> >> Wed, Oct 30, 2019, 17:31 Vladimir Ozerov :
> >>
> >> > One funny hack which helped me is manual registration of a fake
> RelNode
> >> > with desired traits through VolcanoPlanner.register() method. But
> again,
> >> > this leads to thrashing. What could really help is a call to
> >> > VolcanoPlanner.fireRules() with desired rel. But this doesn't work out
> >> of
> >> > the box since some internals of the rule queue need to be adjusted.
> >> >
> >> > What does the community think about adding a method which will re-add
> >> rules
> >> > applicable to the specific RelNode to the rule queue?
> >> >
> >> > Wed, Oct 30, 2019 at 17:00, Vladimir Ozerov :
> >> >
> >> > > Hi Igor,
> >> > >
> >> > > Yes, I came to the same conclusion, thank you. This is how it
> >> basically
> >> > > happens when converters are disabled:
> >> > > 1) We start with initial tree: [LogicalProject] <- [LogicalScan]
> >> > > 2) Then we convert LogicalScan to PhysicalScan, so it is added to
> the
> >> > > set: [LogicalProject] <- [LogicalScan, PhysicalScan]
> >> > > 3) Finally, when it is time to fire a rule for PhysicalScan, we try
> to
> >> > get
> >> > > parents of that scan set with traits of the PhysicalScan. Since
> there
> >> are
> >> > > no such parents (we skipped it intentionally), the rule is not
> queued.
> >> > >
> >> > > But when converters are enabled, a converter rel is created:
> >> > [LogicalProject]
> >> > > <- [LogicalScan, PhysicalScan, ConverterFromPhysicalToLogical]. No
> >> rules
> >> > > are fired for PhysicalScan again, but they are fired for converter
> >> since
> >> > > it has the necessary LOGICAL trait.
> >> > >
> >> > > It makes sense, that converters essentially allow forcing rule
> >> invocation
> >> > > on parents, even if the child was created with different traits. But
> >> it
> >> > > seems that this mechanism may be too heavy for complex queries
> >> because it
> >> > > literally creates hundreds of new converter rels and triggers rules
> >> over
> >> > > and over again.
> >> > >
> >> > > We need some fine-grained alternative. Basically, what would be
> enough
> >> > for
> >> > > me is to let the planner know somehow: "I created that rel, and I
> want
> >> > you
> >> > > to execute parent rules not only using its trait but also on this
> and
> >> > those
> >> > > traits."
> >> > > Is there any API in Calcite which allows doing this without
> creating a
> >> > new
> >> > > rel node?
> >> > >
> >> > > Regards,
> >> > > Vladimir.
> >> > >
> >> > >
> >> > > ср, 30 окт. 2019 г. в 09:25, Seliverstov Igor  >:
> >> > >
> >> > >> Vladimir,
> >> > >>
> >> > >> Probably it'll help you:
> >> > >>
> >> > >> Seems the cause of issue in RelSubset.getParentRels()  The check
> used
> >> > when
> >> > >> the planner schedules newly matched rules after successful
> >> > transformation
> >> > >> (see VolcanoRuleCall.matchRecurse), it prevents the parent rule be
> >> > applied
> >> > >> once again (here your logical project with an input having ANY
> >> > >> distribution
> >> > >> doesn't satisfy a transformed input traits).
> >> > >>
> >> > >> In our case we use another workaround, so there are also much more
> >> > >> transformations than we wanted, so the desired rule is triggered.
> >> > >>
> >> > >>
> >> > >> вт, 29 окт. 2019 г., 14:46 Vladimir Ozerov :
> >> > >>
> >> > >> > Hi Vladimir,
> >> > >> >
> >> > >> > I am sorry. Pushed, it works now.
> >> > >> >
> >> > >> > вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov <
> >> > >> > sitnikov.vladi...@gmail.com
> >> > >> > >:
> >> > >> >
> >> > >> > > > mvn clean test
> >> > >> > >
> >> > >> > > [ERROR] The goal you specified requires a project to execute
> but
> >> > >> there is
> >> > >> > > no POM in this 

Re: Problem with converters and possibly rule matching

2019-10-30 Thread Vladimir Ozerov
Yes, that may work. Even if the e pension rule is used, in most cases it
will not trigger any real conversions, since we are moving from the abstract
convention to the physical one, and the created converters will have the opposite trait
direction (from physical to abstract).

But again - ideally we only need to re-trigger the rules for a specific
node, no more than that. So API support like
“VolcanoPlanner.forceRules(RelNode)” would be very convenient.

What do you think?



Re: Problem with converters and possibly rule matching

2019-10-30 Thread Vladimir Ozerov
“e pension” == “expand conversion” :)



Re: Problem with converters and possibly rule matching

2019-10-30 Thread Seliverstov Igor
I considered calling rules manually too; for now we use abstract converters +
ExpandConversionRule for producing exchanges.

You may create such converters manually (checking the appropriate subset); this
way you may reduce the number of created converters. Also, a converter is a
quite special node that does almost nothing (without a corresponding rule), so
it may be used just as a rule trigger.

Regards,
Igor

ср, 30 окт. 2019 г., 17:31 Vladimir Ozerov :

> One funny hack which helped me is manual registration of a fake RelNode
> with desired traits through VolcanoPlanner.register() method. But again,
> this leads to trashing. What could really help is a call to
> VolcanoPlanner.fireRules() with desired rel. But this doesn't work out of
> the box since some internals of the rule queue needs to be adjusted.
>
> What does the community think about adding a method which will re-add rules
> applicable to the specific RelNode to the rule queue?
>
> ср, 30 окт. 2019 г. в 17:00, Vladimir Ozerov :
>
> > Hi Igor,
> >
> > Yes, I came to the same conclusion, thank you. This is how it basically
> > happens when converters are disabled:
> > 1) We start with initial tree: [LogicalProject] <- [LogicalScan]
> > 2) Then we convert LogicalScan to PhysicalScan, so it is added to the
> > set: [LogicalProject] <- [LogicalScan, PhysicalScan]
> > 3) Finally, when it is time to fire a rule for PhysicalScan, we try to
> get
> > parents of that scan set with traits of the PhysicalScan. Since there are
> > no such parents (we skipped it intentionally), the rule is not queued.
> >
> > But when converters are enabled, a converter rel is created:
> [LogicalProject]
> > <- [LogicalScan, PhysicalScan, ConverterFromPhysicalToLogical]. No rules
> > are fired for PhysicalScan again, but they are fired for converter since
> > it has the necessary LOGICAL trait.
> >
> > It makes sense, that converters essentially allow forcing rule invocation
> > on parents, even if the child was created with different traits. But it
> > seems that this mechanism may be too heavy for complex queries because it
> > literally creates hundreds of new converter rels and triggers rules over
> > and over again.
> >
> > We need some fine-grained alternative. Basically, what would be enough
> for
> > me is to let the planner know somehow: "I created that rel, and I want
> you
> > to execute parent rules not only using its trait but also on this and
> those
> > traits."
> > Is there any API in Calcite which allows doing this without creating a
> new
> > rel node?
> >
> > Regards,
> > Vladimir.
> >
> >
> > ср, 30 окт. 2019 г. в 09:25, Seliverstov Igor :
> >
> >> Vladimir,
> >>
> >> Probably it'll help you:
> >>
> >> Seems the cause of issue in RelSubset.getParentRels()  The check used
> when
> >> the planner schedules newly matched rules after successful
> transformation
> >> (see VolcanoRuleCall.matchRecurse), it prevents the parent rule be
> applied
> >> once again (here your logical project with an input having ANY
> >> distribution
> >> doesn't satisfy a transformed input traits).
> >>
> >> In our case we use another workaround, so there are also much more
> >> transformations than we wanted, so the desired rule is triggered.
> >>
> >>
> >> вт, 29 окт. 2019 г., 14:46 Vladimir Ozerov :
> >>
> >> > Hi Vladimir,
> >> >
> >> > I am sorry. Pushed, it works now.
> >> >
> >> > вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov <
> >> > sitnikov.vladi...@gmail.com
> >> > >:
> >> >
> >> > > > mvn clean test
> >> > >
> >> > > [ERROR] The goal you specified requires a project to execute but
> >> there is
> >> > > no POM in this directory
> >> > >
> >> > > Vladimir, please push missing files
> >> > >
> >> > > Vladimir
> >> > >
> >> >
> >>
> >
>


Re: Problem with converters and possibly rule matching

2019-10-30 Thread Vladimir Ozerov
One funny hack which helped me is manual registration of a fake RelNode
with desired traits through the VolcanoPlanner.register() method. But again,
this leads to thrashing. What could really help is a call to
VolcanoPlanner.fireRules() with the desired rel. But this doesn't work out of
the box, since some internals of the rule queue need to be adjusted.

What does the community think about adding a method which will re-add rules
applicable to a specific RelNode to the rule queue?
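To make the proposal concrete, here is a toy, self-contained sketch of what such an API could look like. Everything here (forceRules, the Rule interface, the string-based queue) is hypothetical illustration, not an existing Calcite class or method:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a planner that can re-enqueue rule matches for one
// node on demand. None of these types mirror real Calcite classes.
public class ForceRulesSketch {
    interface Rule {
        boolean matches(String relNode);
    }

    final List<Rule> rules;
    // Pending "rule applied to rel" work items, as in a Volcano-style queue.
    final Deque<String> ruleQueue = new ArrayDeque<>();

    ForceRulesSketch(List<Rule> rules) {
        this.rules = rules;
    }

    /** The proposed API: re-add all rule matches for the given rel. */
    void forceRules(String relNode) {
        for (Rule rule : rules) {
            if (rule.matches(relNode)) {
                ruleQueue.add(rule + " on " + relNode);
            }
        }
    }

    public static void main(String[] args) {
        Rule projectRule = rel -> rel.startsWith("Project");
        ForceRulesSketch planner = new ForceRulesSketch(List.of(projectRule));
        planner.forceRules("ProjectLogicalRel"); // match: re-queued once
        planner.forceRules("ScanLogicalRel");    // no match: nothing queued
        System.out.println(planner.ruleQueue.size()); // prints 1
    }
}
```

The point of the sketch is only the shape of the call: the caller names a single rel, and only rules matching that rel are re-queued, instead of converter nodes re-triggering matches set-wide.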

>


Re: Problem with converters and possibly rule matching

2019-10-30 Thread Vladimir Ozerov
Hi Igor,

Yes, I came to the same conclusion, thank you. This is how it basically
happens when converters are disabled:
1) We start with initial tree: [LogicalProject] <- [LogicalScan]
2) Then we convert LogicalScan to PhysicalScan, so it is added to the
set: [LogicalProject] <- [LogicalScan, PhysicalScan]
3) Finally, when it is time to fire a rule for PhysicalScan, we try to get
parents of that scan set with traits of the PhysicalScan. Since there are
no such parents (we skipped it intentionally), the rule is not queued.

But when converters are enabled, a converter rel is created: [LogicalProject]
<- [LogicalScan, PhysicalScan, ConverterFromPhysicalToLogical]. No rules
are fired for PhysicalScan again, but they are fired for converter since it
has the necessary LOGICAL trait.
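The scheduling behavior described above can be sketched with a deliberately simplified, self-contained memo model (these classes are illustrative only, not Calcite's actual RelSet/RelSubset machinery):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified memo: one equivalence set whose parent consumes a subset with a
// fixed trait. Shows why the parent rule fires only when a newly registered
// rel carries the trait the parent's input subset was registered with.
public class MemoSketch {
    static final String LOGICAL = "LOGICAL";
    static final String PHYSICAL = "PHYSICAL";

    /** Traits of rels registered in the child set. */
    final List<String> childTraits = new ArrayList<>();
    /** Trait of the subset the parent (LogicalProject) points to. */
    final String parentInputTrait = LOGICAL;

    final List<String> firedParentRules = new ArrayList<>();

    /** Mimics rule scheduling after a new rel is added to the set: the
     *  parent is re-matched only if the new rel's trait matches the trait
     *  of the subset the parent consumes. */
    void register(String trait) {
        childTraits.add(trait);
        if (trait.equals(parentInputTrait)) {
            firedParentRules.add("ProjectRule on input " + trait);
        }
    }

    public static void main(String[] args) {
        MemoSketch memo = new MemoSketch();
        memo.register(LOGICAL);   // LogicalScan: parent rule fires
        memo.register(PHYSICAL);  // PhysicalScan: parent NOT re-fired
        // A converter back to LOGICAL re-triggers the parent rule:
        memo.register(LOGICAL);   // ConverterFromPhysicalToLogical
        System.out.println(memo.firedParentRules.size()); // prints 2
    }
}
```

In this toy form, the converter's only effect is the third register(LOGICAL) call: it contributes nothing new semantically, but it re-enters the set with the trait the parent is watching.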

It makes sense that converters essentially allow forcing rule invocation
on parents, even if the child was created with different traits. But it
seems that this mechanism may be too heavy for complex queries because it
literally creates hundreds of new converter rels and triggers rules over
and over again.

We need some fine-grained alternative. Basically, what would be enough for
me is to let the planner know somehow: "I created that rel, and I want you
to execute parent rules not only using its trait but also on this and those
traits."
Is there any API in Calcite which allows doing this without creating a new
rel node?

Regards,
Vladimir.


>


Re: Problem with converters and possibly rule matching

2019-10-30 Thread Seliverstov Igor
Vladimir,

Probably it'll help you:

The cause of the issue seems to be in RelSubset.getParentRels(). The check is
used when the planner schedules newly matched rules after a successful
transformation (see VolcanoRuleCall.matchRecurse); it prevents the parent rule
from being applied once again (here your logical project, with an input having
ANY distribution, doesn't satisfy the transformed input traits).

In our case we use another workaround, so there are also many more
transformations than we wanted, but the desired rule is triggered.




Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi Vladimir,

I am sorry. Pushed, it works now.



Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Sitnikov
> mvn clean test

[ERROR] The goal you specified requires a project to execute but there is
no POM in this directory

Vladimir, please push missing files

Vladimir


Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi colleagues,

I prepared a (relatively) simple reproducer for the problem [1]. It can be
executed with three commands:
git clone https://github.com/devozerov/calcite-optimizer.git
cd calcite-optimizer
mvn clean test

Please do not pay attention to stuff located inside "org.apache.calcite"
package, as it is most likely of little interest for the given problem. I
removed a lot of unrelated stuff for the sake of simplicity, so some things
may seem strange or unnecessary.

So what is going on here?
1) We have a distributed engine that should assemble data from remote
nodes. Let's say, we have three distribution types [2]:
ANY - abstract unknown distribution, which is yet to be resolved during
physical planning. This is the default
REPLICATED - this is how the tables are organized - they are copied to all
nodes
SINGLETON - in this example, this is the distribution of a final aggregator
node which is enforced during physical planning

REPLICATED satisfies SINGLETON. SINGLETON doesn't satisfy REPLICATED.
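The satisfies relation above can be sketched as a tiny stand-alone model (the names are illustrative, not the actual DistributionTrait classes from the reproducer):

```java
// Toy model of the three distribution types and their "satisfies" lattice.
// Illustrative only: not the real trait implementation from the reproducer.
public class DistributionModel {
    enum Distribution {
        ANY, REPLICATED, SINGLETON;

        /** Returns true if a rel with this distribution can be used where
         *  {@code required} is demanded, without inserting an exchange. */
        boolean satisfies(Distribution required) {
            if (this == required || required == ANY) {
                return true;
            }
            // REPLICATED data is present on every node, so any single node
            // already holds the full input: SINGLETON is satisfied.
            return this == REPLICATED && required == SINGLETON;
        }
    }

    public static void main(String[] args) {
        assert Distribution.REPLICATED.satisfies(Distribution.SINGLETON);
        assert !Distribution.SINGLETON.satisfies(Distribution.REPLICATED);
        System.out.println("ok");
    }
}
```

Note that in this model an unresolved ANY does not satisfy SINGLETON or REPLICATED; it is exactly the state that physical planning must resolve.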

2) There is a query "SELECT FUNC(f) FROM t", which is converted to the
following form with LOGICAL convention during logical planning:
RootLogicalRel[convention=LOGICAL, distribution=ANY]
-> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
  -> MapScanLogicalRel[convention=LOGICAL, distribution=ANY]

3) Then, during physical planning we need to resolve and propagate
distributions, and enforce [convention=PHYSICAL, distribution=SINGLETON],
possibly injecting exchange operators in between. In this specific case, no
exchanges are needed, because scans are always REPLICATED, the projection
inherits the scan distribution, and for the root node it is OK to use a
REPLICATED input instead of SINGLETON, since the former satisfies the latter. The
expected physical tree is:
RootPhysicalRel[convention=PHYSICAL, distribution=SINGLETON]
-> ProjectPhysicalRel[convention=PHYSICAL, distribution=REPLICATED]
  -> MapScanPhysicalRel[convention=PHYSICAL, distribution=REPLICATED]

4) There are three physical rules for each operator, which are organized as
follows:
4.1) RootPhysicalRule [4] - converts RootLogicalRel node, enforcing
SINGLETON distribution on its input. As a result, an appropriate exchange
is injected later from DistributionTraitDef [3]. As I said, in this
specific query no exchange is needed
4.2) MapScanPhysicalRule [5] - converts MapScanLogicalRel. At this point,
we resolve input distribution.
4.3) ProjectPhysicalRule [6] - transforms logical project to physical
project *ONLY* if there is an underlying physical input with REPLICATED or
SINGLETON distribution.
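The guard in 4.3 amounts to a predicate like the following (a schematic condition, not the actual rule code from the repository):

```java
// Schematic version of the ProjectPhysicalRule guard from the reproducer:
// only convert the logical project once a physical input with a resolved,
// compatible distribution exists in the input set.
public class ProjectGuard {
    enum Distribution { ANY, REPLICATED, SINGLETON }

    static boolean canConvertProject(Distribution inputDistribution) {
        return inputDistribution == Distribution.REPLICATED
            || inputDistribution == Distribution.SINGLETON;
    }

    public static void main(String[] args) {
        // First pass: the scan is still logical with ANY distribution.
        assert !canConvertProject(Distribution.ANY);
        // After MapScanPhysicalRule fired, the distribution is resolved.
        assert canConvertProject(Distribution.REPLICATED);
        System.out.println("ok");
    }
}
```

This guard is what makes the firing order matter: the rule is a no-op until the scan's distribution is resolved, so it must be re-fired after the physical scan appears.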

5) Original case fails because the rules are executed as follows:
5.1) RootPhysicalRule - do transform
5.2) ProjectPhysicalRule - do nothing at this point
5.3) MapScanPhysicalRule - do transform
I would like ProjectPhysicalRule to be re-fired again, but this doesn't
happen.

There are two workarounds for this, and both concern me.

*Workaround 1*: set "canConvertConvention" to "true". In this case, rules
are executed in the way I would expect. But I observe too many rule
invocations and unnecessary trait conversions because enabling this flag
leads to matching every rel in a set with every other. For example, Volcano
tries to convert SINGLETON -> REPLICATED, while it is never actually needed
by my engine. And this is how rules are executed:
1) RootPhysicalRule - expected
2) RootPhysicalRule - fired again, because I created a converted node on
the previous step
3) ProjectPhysicalRule - expected
4) ProjectPhysicalRule - fired again for the same reason as p.2
5) MapScanPhysicalRule - expected
6) ProjectPhysicalRule - parent rule is called as I would expect - this is
where the missing physical project appear
7) RootPhysicalRule - called because its child was transformed
In addition, ExpandConversionRule is called multiple times, and sometimes it
creates nodes which are otherwise completely unnecessary, such as the
aforementioned SINGLETON -> REPLICATED conversion.
So in the end, it works as I would expect even for much more complex plans.
But it produces litter which I cannot control.

Resulting plan:
RootPhysicalRel.PHYSICAL.SINGLETON.[](input=ProjectPhysicalRel#91)
  ProjectPhysicalRel.PHYSICAL.REPLICATED.[](input=MapScanPhysicalRel#81,
...))
MapScanPhysicalRel.PHYSICAL.REPLICATED.[](table=[t],projects=[0])

*Workaround 2*: add exchanges for ANY distribution and produce
transformation with ANY distribution in the project rule. This is what
Drill does. But it also produces a litter. This time these are unnecessary
exchanges; we pessimistically create them when input is ANY, and if it
eventually resolved to any compatible distribution, the exchange stays
there. As far as I understand, Drill's ExcessiveExchangeIdentifier removes
that stuff after planning separately.

Resulting plan:
RootPhysicalRel.PHYSICAL.SINGLETON.[](input=UnicastExchangePhysicalRel#30)

UnicastExchangePhysicalRel.PHYSICAL.SINGLETON.[](input=ProjectPhysicalRel#29,hashFields=[0])
// Not needed!

Re: Problem with converters and possibly rule matching

2019-10-29 Thread Vladimir Ozerov
Hi everybody,

First of all, thank you for answers and suggestions. Let me address them
briefly:
1) I use two conventions at the moment, LOGICAL and PHYSICAL. I agree with
you that this might be overkill, and there is a chance that in the final
solution we may end up with only one. But meanwhile having this separation
seems handy, because in the first stage I force the optimizer to
propagate NONE -> LOGICAL conversions. Then I have a clean logical tree
*before* any physical distribution stuff is involved, which I can use for
internal post-processing before going physical. I would propose to keep it
out of scope at the moment. Let's just consider that the goal is to convert
from one convention to another.
2) Operands with "any" matchers are already used
3) The reason why I would like to avoid the "Project" rule firing on the
first run is that it doesn't enforce any distribution on its child
("Scan"). Instead, it needs to derive the distribution from the scan.

To make the problem more clear, let me prepare a simple reproducer for the
issue.

Regards,
Vladimir.



Re: Problem with converters and possibly rule matching

2019-10-29 Thread Seliverstov Igor
Vladimir,

I guess the Project rule doesn't have a child matcher. Put a child "any"
matcher into it and it will be applied on each child node transformation.

Regards,
Igor




Re: Problem with converters and possibly rule matching

2019-10-28 Thread Danny Chan
Vladimir, all you need to do is to change the convention of the root node;
Volcano would propagate the convention to all its input nodes when registering
them to the planner. You can take this code [1] for reference :)

[1] 
https://github.com/apache/calcite/blob/1ef2821695ca6e10fbad7b8efe7246c4a20143af/core/src/main/java/org/apache/calcite/tools/Programs.java#L324

Best,
Danny Chan
On 2019-10-29 at 5:24 AM +0800, dev@calcite.apache.org wrote:
>
> n of the scan.


Re: Re: Problem with converters and possibly rule matching

2019-10-28 Thread Haisheng Yuan
Why do you need 2 conventions? I think 1 "Hazelcast" is enough. Usually logical
operators are the ones with NONE convention.

Regarding physical->logical, you can modify the rule instance to match
logical operators or the NONE convention only.

- Haisheng

--
From: Stamatis Zampetakis
Date: 2019-10-29 06:10:45
To:
Subject: Re: Problem with converters and possibly rule matching


Re: Problem with converters and possibly rule matching

2019-10-28 Thread Stamatis Zampetakis
Hi Vladimir,

You can get an idea of how the Volcano planner works by reading [1].
The implementation of Volcano in Calcite has many differences, but the
main ideas are there.

Normally you do not need to set canConvertConvention to true; especially in
your case it doesn't seem necessary.

I think the main problem with your approach is that the Project rule does
not produce any transformation when it is first invoked.
The Volcano planner works mainly in a top-down fashion so rules tend to
match from top to bottom.
For example, in [1], if for a particular operator there is no algorithm,
transformation rule, or enforcer to apply at a given step, the planning
process would stop and you wouldn't even see the transformation of the scan.

When the project rule matches you should transform it to the
HZPhysicalProject and then require that the child operator has certain
traits (the distribution that is necessary).
The rule should look like the EnumerableProjectRule [2] but instead of
requiring only the convention you should pass also the requirement for the
distribution of the scan.

Best,
Stamatis

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableProjectRule.java
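
Following [2], the convert method could add the required distribution to the
traits demanded of the child, in addition to the convention; a sketch
(HZPhysicalProject, the PHYSICAL convention, and the singleton distribution
choice are assumptions from this thread, not real Calcite classes):

```java
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelDistributions;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.logical.LogicalProject;

// Inside a ConverterRule whose out-trait is the PHYSICAL convention,
// mirroring EnumerableProjectRule#convert:
public RelNode convert(RelNode rel) {
  LogicalProject project = (LogicalProject) rel;
  RelTraitSet required = project.getInput().getTraitSet()
      .replace(PHYSICAL)                     // target convention
      .replace(RelDistributions.SINGLETON);  // plus the needed distribution
  return new HZPhysicalProject(
      project.getCluster(),
      required,
      // convert() asks the planner to satisfy the traits, inserting an
      // enforcer (e.g. an exchange) if necessary.
      convert(project.getInput(), required),
      project.getProjects(),
      project.getRowType());
}
```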



On Mon, Oct 28, 2019 at 10:24 PM Vladimir Ozerov  wrote:

> Hi colleagues,
>
> We are building a Calcite-based optimizer for Hazelcast, and I have some
> problems understanding Calcite's logic with respect to converters. Let me
> briefly explain the problem.
>
> We have an execution backend, so we do not need Bindable or Enumerable.
> Instead, we would like to use Calcite to convert original SQL to a tree
> with our own convention, then convert it to our internal representation,
> and finally, execute.
>
> We started by looking at other Calcite integrations and eventually came
> to a classical two-phase optimization approach. We have two internal
> conventions - LOGICAL and PHYSICAL. The goal is to optimize the tree as
> follows:
> 1) NONE -> LOGICAL - heuristic optimizations
> 2) LOGICAL -> PHYSICAL - cost-based planning
>
> Suppose that after the first phase I have the following tree of our own
> operators:
> HZLogicalRoot
> -> HZLogicalProject
>   -> HZLogicalScan
>
> For this specific case, there is not much to optimize, so we only need to
> transition to physical nodes and do some boilerplate with traits
> propagation:
> HZPhysicalRoot
> -> HZPhysicalProject
>   -> HZPhysicalScan
>
> In order to achieve this, I define three rules, which just do a conversion
> of the relevant nodes. The Volcano optimizer is used.
>
> Now, the problem - somehow it works only when I override
> Convention.Impl.canConvertConvention to true for our PHYSICAL convention,
> but that blows up the search space and the same rules are called many times.
> A lot of time is spent on endless PHYSICAL -> LOGICAL conversions, which are
> of no use.
>
> If I change canConvertConvention to false, then the rules are called a
> sensible number of times, but they cannot produce a complete PHYSICAL tree.
> Here is how it works:
> 1) "Root" rule is invoked, which converts "HZLogicalRoot" to
> "HZPhysicalRoot"
> 2) "Project" rule is invoked, but do not produce any transformations, since
> it needs Scan distribution, which is not known yet. This desired behavior
> at this point.
> 3) "Scan" rule is invoked, "HZLogicalScan" is converted to
> "HZPhysicalScan". Distribution is resolved
> 4) At this point, we have [LogicalRoot, PhysicalRoot] -> [LogicalProject]
> -> [LogicalScan, PhysicalScan] sets. I expect that since a new scan was
> installed, the "Project" rule will be fired again. This time we know the
> distribution, so the transformation is possible. But the rule is not called
> and we fail with an error.
>
> So my questions are:
> 1) What is the real role of converters in this process? For some reason,
> when the unnecessary (from a logical standpoint) PHYSICAL -> LOGICAL
> conversion is allowed, even complex plans can be built. And Drill does it for
> some reason. But it costs multiple additional invocations of the same rules.
> Are there any docs or presentations explaining the mechanics behind this?
> 2) What are the minimum requirements that will allow a rule on the parent
> to be fired again after its child node has changed?
>
> I can provide any additional information, source code, or even a working
> example of this problem if needed. I don't want to bother you with it at
> the moment, because it feels like I am missing something very simple.
>
> Would appreciate your help.
>
> Regards,
> Vladimir.
>
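
For reference, the two-convention setup described in this thread can be
sketched as follows (canConvertConvention and
useAbstractConvertersForConversion are the real Convention hooks discussed
above; the convention names and the decision to return true are illustrative,
and returning true is exactly the search-space trade-off debated in the
thread):

```java
import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;

final class HazelcastConventions {
  static final Convention LOGICAL =
      new Convention.Impl("LOGICAL", RelNode.class);

  static final Convention PHYSICAL =
      new Convention.Impl("PHYSICAL", RelNode.class) {
        // Allowing conversions out of this convention lets
        // AbstractConverters re-trigger parent rules when a child is
        // converted to physical form, at the cost of a larger search space.
        @Override public boolean canConvertConvention(Convention toConvention) {
          return true;
        }

        @Override public boolean useAbstractConvertersForConversion(
            RelTraitSet fromTraits, RelTraitSet toTraits) {
          return true;
        }
      };
}
```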