Thanks for your detailed explanation, Vladimir.

Indeed, the only way to propagate traits in Calcite currently is through rules, 
which is a big pain, and I can feel yours. I tried to come up with ways to 
implement trait derivation and requirement in the framework without breaking 
current usages, only to find that it is almost impossible. There are too many 
stakeholders, and even a small change may incur opposition.

But until we get a real top-down Cascades framework, there is still a lot you 
can do to improve your planner's performance. 

In Calcite 1.22.0, I committed a change that enables RelSubset to be used in a 
rule operand to trigger the rule, which can greatly reduce the number of rule 
calls for trait propagation. For your example, you need 2 rules (see the sketch 
after this list):
1. Physical implementation rule
  match pattern: operand(LogicalProject.class)
  Produce PhysicalProject without physical traits
2. Project trait propagation rule
  match pattern: operand(PhysicalProject.class, operand(RelSubset.class))
  Produce PhysicalProject with derived traits.
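
To make the pattern concrete, here is a minimal sketch of the two rule skeletons 
in Java. PhysicalProject is a placeholder for your engine's physical operator 
(it is not a Calcite class), the constructor calls are left as comments, and it 
uses the classic RelOptRule operand() builders:

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.volcano.RelSubset;
import org.apache.calcite.rel.logical.LogicalProject;

/** 1. Implementation rule: logical to physical, no trait derivation yet. */
class PhysicalProjectRule extends RelOptRule {
  PhysicalProjectRule() {
    super(operand(LogicalProject.class, any()), "PhysicalProjectRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    LogicalProject project = call.rel(0);
    // Create the engine-specific PhysicalProject with default (empty)
    // physical traits and register it:
    // call.transformTo(new PhysicalProject(...));
  }
}

/** 2. Trait propagation rule: re-fires whenever the input RelSubset changes. */
class PhysicalProjectTraitRule extends RelOptRule {
  PhysicalProjectTraitRule() {
    super(operand(PhysicalProject.class, operand(RelSubset.class, any())),
        "PhysicalProjectTraitRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    // PhysicalProject project = call.rel(0);
    // RelSubset input = call.rel(1);
    // Derive collation/distribution from the traits the input subset delivers
    // and re-create the PhysicalProject with those derived traits:
    // call.transformTo(project.copy(derivedTraits, project.getInputs()));
  }
}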

In 1.23.0, we removed rule match importances and ordering; I estimate that can 
reduce optimization time by around 10~20% for some complex queries with 
many rule calls.

- Haisheng

------------------------------------------------------------------
From: Vladimir Ozerov <ppoze...@gmail.com>
Date: 2020-03-15 04:18:53
To: Haisheng Yuan <h.y...@alibaba-inc.com>
Cc: dev@calcite.apache.org (dev@calcite.apache.org) <dev@calcite.apache.org>
Subject: Re: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching 
specific rels

Hi Haisheng,

You are right, the behavior I showed is not the default one; I should have 
provided more context here. This is how we (Hazelcast) and at least Drill use 
the engine to ensure that the produced plan is optimal; I gave an example in [1]. 
In real distributed engines, we rely heavily on physical properties, most 
notably distribution and collation. And the fundamental problem with the 
VolcanoOptimizer is that it cannot propagate traits in a controlled manner. 
This, in turn, forces us to use AbstractConverters and implement rules in ways 
that appear strange to the Calcite community :-). And this, in turn, leads to 
excessive rule calls and to failures to plan complex queries. 
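
For context, the converter-based setup looks roughly like the following sketch. 
It uses only standard Calcite classes, but the exact wiring in Hazelcast/Drill 
differs, so treat it as an illustration of the approach rather than actual 
product code:

import org.apache.calcite.plan.ConventionTraitDef;
import org.apache.calcite.plan.volcano.AbstractConverter;
import org.apache.calcite.plan.volcano.VolcanoPlanner;
import org.apache.calcite.rel.RelCollationTraitDef;
import org.apache.calcite.rel.RelDistributionTraitDef;

public final class PlannerSetup {
  static VolcanoPlanner createPlanner() {
    VolcanoPlanner planner = new VolcanoPlanner();
    // The physical traits the engine cares about.
    planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
    planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
    planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
    // Abstract converters: a trait mismatch between a parent's requirement
    // and a subset's delivered traits becomes an AbstractConverter node whose
    // expansion re-fires rules on the parent. This is what re-schedules the
    // ProjectRule in step 2.5 below, and also what causes the excessive rule
    // calls mentioned above.
    planner.addRule(AbstractConverter.ExpandConversionRule.INSTANCE);
    return planner;
  }
}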

Let's consider the same tree again, but now assuming that this is not the 
complete tree, but a subtree, and there are some parent operators. Let's also 
assume that the ScanRule may produce two equivalent operators with different 
physical properties: PhysicalScan and PhysicalIndexScan[a ASC]. It is important 
to consider both alternatives in parent operators. Now let's consider two 
different ways to optimize that subtree.

1. Canonical Calcite way (default)
1.1 Perform initial rule ordering, parent rules fire first: [ProjectRule, 
ScanRule]
1.2 Invoke ProjectRule, which produces a physical project without any physical 
traits
1.3 Invoke ScanRule, which produces PhysicalScan and PhysicalIndexScan[a ASC]
Since ProjectRule was not aware of the different scan alternatives, it missed 
the collation, and the resulting hypergraph looks like this:

-- PhysicalProject
---- [PhysicalScan, PhysicalIndexScan[a ASC]]

This plan is suboptimal, because parent operators cannot take advantage of the 
collation.

2. Hazelcast/Drill way:
2.1 Enable abstract converters
2.2 Rules are ordered in the same way as in example 1: [ProjectRule, ScanRule]
2.3 ProjectRule fires and enumerates the physical implementations of its input. 
Since there are no physical inputs yet, it exits without any transformations
2.4 ScanRule fires and produces two physical scans
2.5 Abstract converters ensure that the ProjectRule is scheduled for execution 
again because its input RelSet has new nodes
2.6 ProjectRule fires again, now having physical inputs, and generates one 
implementation per unique combination of physical input traits (see the sketch 
after this list). 
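
A condensed sketch of that rule is below. It is not Hazelcast's actual code: 
PhysicalProject and PHYSICAL are placeholders for the engine's own operator and 
convention, the transformTo call is left as a comment, and the RelSubset 
accessors may differ slightly between Calcite versions:

import java.util.HashSet;
import java.util.Set;

import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.plan.volcano.RelSubset;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.logical.LogicalProject;

/** ProjectRule in the abstract-converter style (steps 2.3-2.6). */
class ProjectPhysicalRule extends RelOptRule {
  ProjectPhysicalRule() {
    super(operand(LogicalProject.class, any()), "ProjectPhysicalRule");
  }

  @Override public void onMatch(RelOptRuleCall call) {
    LogicalProject project = call.rel(0);
    // During Volcano planning the input of a registered rel is a RelSubset.
    RelSubset input = (RelSubset) project.getInput();

    Set<RelTraitSet> seen = new HashSet<>();
    for (RelNode physicalInput : input.getRelList()) {
      // Step 2.3: if nothing physical has been produced yet, the loop body
      // never fires and the rule exits without transformations.
      if (physicalInput.getConvention() != PHYSICAL) {
        continue;
      }
      // Step 2.6: one implementation per unique combination of input traits,
      // e.g. [] and [a ASC], copying the input's collation onto the project.
      if (!seen.add(physicalInput.getTraitSet())) {
        continue;
      }
      // call.transformTo(new PhysicalProject(project.getCluster(),
      //     physicalInput.getTraitSet(), physicalInput,
      //     project.getProjects(), project.getRowType()));
    }
  }
}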

As a result, we have two graphs now:

Graph 1:
-- PhysicalProject
---- PhysicalScan

Graph 2:
-- PhysicalProject[a ASC]
---- PhysicalIndexScan[a ASC]

Notice how we propagated physical collation. Now parent operators may take 
advantage of it, e.g. eliminate sorting, or perform streaming aggregation 
instead of blocking hash aggregation. 

This is the fundamental problem we have in Hazelcast: how to ensure the 
complete exploration of the search space without excessive rule calls. 

Very short summary: 
1. The default behavior of VolcanoOptimizer cannot explore the physical search 
space, so plans are not optimal
2. Abstract converters fix this if you follow a certain pattern in rule 
implementations (see 2.3), but they generate too many rule calls, so join 
planning rules cannot be called together with other rules, which again leads to 
suboptimal plans (though better than with p.1)
3. The "trait pull-up" proposal may fix it, but I have a feeling that pulling up 
the possible trait combinations from a child node is indistinguishable from 
exploring that child node, so it may again not be very efficient
4. A brand-new optimizer implementation with a recursive top-down approach may 
address all the concerns from p.1-p.3, but it appears complex to implement 
and may be incompatible with existing rules

Not an easy choice.

Regards,
Vladimir.

[1] 
https://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3CCAJJmzpU9_75O48WeEKgHKg3fTMhK3eAMmHNOVvczj_uUTBxHkA%40mail.gmail.com%3E
On Sat, Mar 14, 2020 at 21:53, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:

I agree that there should be no global rule queue; we should do it on a 
per-operator basis, which is also how other major Cascades frameworks do it.

However, Calcite's VolcanoPlanner doesn't generate unnecessary rule calls as 
you described. The current process is:
1. Global rule queue: ScanRule, ProjectRule
2. Call ScanRule, produce the physical scan
3. Call ProjectRule, produce the physical project.

Even with a global rule queue in the reversed order [ProjectRule, ScanRule], 
there are still 2 rule calls. In your step 2, ProjectRule doesn't produce a 
physical node, which is incorrect. Every rule is, and should be, independent of 
every other rule. If your rule relies on other operators or rules being 
explored first, then you should think twice about it.

- Haisheng

------------------------------------------------------------------
From: Vladimir Ozerov <ppoze...@gmail.com>
Date: 2020-03-15 01:50:10
To: dev@calcite.apache.org (dev@calcite.apache.org) <dev@calcite.apache.org>
Subject: Re: Re: Re: [DISCUSS] Proposal to add API to force rules matching 
specific rels

Hi Roman,

In my understanding, the proposed minor changes may only decrease the total
number of rule invocations slightly, while all the principal problems remain the
same. In the top-down approach, you do not want to implement the bottom logical
nodes unless that is requested explicitly by a parent operator.

It seems to me that the very first step toward an efficient optimizer could be a
new rule invocation infrastructure. There should be *no global rule queue*
at all. Instead, we may introduce a per-node rule queue. Then the
optimizer performs a recursive top-down optimization dive, invoking the
rules for every operator. Consider the following simple tree:
-- LogicalProject
---- LogicalScan

Assuming that we have two implementation rules, ProjectRule and ScanRule, and
abstract converters enabled, VolcanoOptimizer will proceed as follows,
generating one unnecessary rule call:
1. Define the global rule call queue: ProjectRule, ScanRule
2. Call ProjectRule, no new nodes are produced
3. Call ScanRule, produce physical scans, reschedule ProjectRule
4. Call ProjectRule again, produce the physical project

With the proposed approach, it will work differently (a rough sketch of the
control flow follows the list):
1. Define per-operator queues:
LogicalProject -> ProjectRule
LogicalScan -> ScanRule
2. Call optimize(LogicalProject)
3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input
4. Invoke ScanRule, produce physical scans, return control back to
ProjectRule
5. Produce the physical project, finish optimization
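
To illustrate the per-node rule queues and the recursive dive, here is a very
rough Java sketch. None of these names (TopDownDriver, Rule, optimize) exist in
Calcite; it is only a skeleton of the control flow, with memoization, costs and
traits omitted:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

import org.apache.calcite.rel.RelNode;

/** Skeleton of a recursive top-down rule driver with per-node rule queues. */
class TopDownDriver {
  /** Each registered operator gets its own queue of pending rules. */
  private final Map<RelNode, Deque<Rule>> ruleQueues = new HashMap<>();

  void register(RelNode node, Rule... rules) {
    Deque<Rule> queue = ruleQueues.computeIfAbsent(node, n -> new ArrayDeque<>());
    for (Rule rule : rules) {
      queue.add(rule);
    }
  }

  /** Optimization dive: fire the node's own rules. An implementation rule for
   * LogicalProject would call optimize() on its input (step 3) before
   * producing the physical project (step 5), so the physical input traits are
   * already known when the parent is implemented. */
  void optimize(RelNode node) {
    Deque<Rule> queue = ruleQueues.getOrDefault(node, new ArrayDeque<>());
    while (!queue.isEmpty()) {
      queue.poll().onMatch(node, this);
    }
  }

  /** Minimal rule contract: the rule receives the driver so that it can
   * request optimization of the node's inputs. */
  interface Rule {
    void onMatch(RelNode node, TopDownDriver driver);
  }
}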

Now we have only 2 rule invocations, as expected, and we reach the same
result as with the proposed minor changes. But the crucial difference is that
now we have a well-defined control flow between operators: start at the top,
delegate to children. With this infrastructure in place, we will be able to
introduce more complex features, such as pruning or partial exploration,
later on.

But notice that this change will be incompatible with the current rules,
i.e. they would have to be rewritten for the new optimization algorithm (e.g.
see step 3), which might be a blocker for current Calcite users. So maybe
it is better to think of a new optimizer, rather than fixing the
VolcanoOptimizer.

Regards,
Vladimir.


On Tue, Jan 14, 2020 at 23:52, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:

> On the other hand, if we don't preprocess and normalize the rel expression
> before going to the Volcano planner, but still compute and keep logical/relational
> properties, like cardinality, for each operator, how can we achieve group
> search space pruning? Given a physical group expression, its required
> property and an upper bound cost C_limit, we need to get its child group
> cardinality and compute the local cost C_local, so that we can calculate the
> child group's upper bound cost and pass it down.
>
> I can't figure out how we can do group pruning without shared relational
> properties.
>
> - Haisheng
>
> ------------------------------------------------------------------
> From: Haisheng Yuan <h.y...@alibaba-inc.com>
> Date: 2020-01-14 12:06:17
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
>
> The example is valid if Calcite doesn't do normalization or preprocessing
> before going to the VolcanoPlanner.
> But many databases and big data systems will try to preprocess the
> expression (push down predicates etc.) so that expressions in the same
> group can share the logical properties, in most cases if not all. You may
> argue that it should be cost based, e.g. evaluating a filter early can still
> be bad. That is true, but how accurate will the statistics be, and how
> accurate will the cost model be?
>
> - Haisheng
>
> ------------------------------------------------------------------
> From: Julian Hyde <jh...@apache.org>
> Date: 2020-01-13 08:44:54
> To: dev@calcite.apache.org <dev@calcite.apache.org>
> Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific rels
>
> > MEMO group (RelSet) represents logically equivalent expressions.
> > All the expressions in one group should share the same logical
> > properties, e.g. functional dependency, constraint properties etc.
> > But they are not sharing it. Computation is done for each individual
> operator.
>
> It's good, and correct, that we compute for each individual operator.
>
> Suppose that a RelSubset s contains RelNode r1 and we know that the
> constraint x > 0 holds. Suppose that we also have r2 with constraint y
> < 10, and we discover that r1 and r2 are equivalent and belong
> together in s. Now we can safely say that both constraints (x > 0 and
> y < 10) apply to both r1 and r2.
>
> Deducing additional constraints in this way is a big win. The effort
> to compute constraints for each RelNode is well-spent.
>
> This kind of deduction applies to other logical properties (e.g.
> unique keys) and it applies to RelSet as well as RelSubset.
>
> Julian
>
>
> On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
> <kondako...@mail.ru.invalid> wrote:
> >
> > @Haisheng
> >
> > > Calcite uses the Project operator and all kinds of ProjectXXXTransposeRule
> to prune unused columns.
> >
> > I also noticed that in most cases Project-related rules took a significant
> > part of the planning time. But I haven't explored this problem yet.
> >
> > > MEMO group (RelSet) represents logically equivalent expressions. All
> the expressions in one group should share the same logical properties, e.g.
> functional dependency, constraint properties etc. But they are not sharing
> it. Computation is done for each individual operator.
> >
> > I thought the equivalence of logical properties within groups (RelSets)
> > is implicit. For example, in RelSet#addInternal it is always verified
> > that the newly added node has the same type as the other members of the set.
> >
> > Anyway, I absolutely agree with you that the problems with trait
> > propagation, rule matching, and others that you mentioned in the previous
> > e-mails should be solved in the first place. We need first to make the
> > Volcano optimizer work right and only then make improvements like
> > search space pruning.
> >
> > I would love to join this work to improve the Volcano planner. Looking
> > forward to the design doc.
> >
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
> > On 11.01.2020 11:42, Haisheng Yuan wrote:
> > > Currently, Calcite uses the Project operator and all kinds of
> ProjectXXXTransposeRule to prune unused columns. Every operator's output
> columns use indexes to reference the child operators' columns. If there is a
> Project operator whose child is a Filter, and we push the Project down
> under the Filter, we will have Project-Filter-Project-FilterInput. All the
> newly generated relnodes will trigger rule matches, e.g. if we already applied
> ReduceExpressionsRule on the Filter, but the new Filter RexCall's input
> ref indexes changed, we have to apply ReduceExpressionsRule on the new Filter
> again, even though there is nothing to reduce. Similarly, new operator
> transpose/merge rules will be triggered. This can cause a lot of rule
> matches.
> > >
> > > MEMO group (RelSet) represents logically equivalent expressions. All
> the expressions in one group should share the same logical properties, e.g.
> functional dependency, constraint properties etc. But they are not sharing
> it. Computation is done for each individual operator.
> > >
> > > Without resolving those issues, space pruning won't help much.
> > >
> > > There is a lot of room for improvement. I hope the community can join
> the effort to make Calcite better.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > From: Roman Kondakov <kondako...@mail.ru.INVALID>
> > > Date: 2020-01-10 19:39:51
> > > To: <dev@calcite.apache.org>
> > > Subject: Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
> > >
> > > @Haisheng, could you please clarify what you mean by these points?
> > >
> > >> - the poor design of column pruning,
> > >> - lack of group properties etc.
> > >
> > > I guess I'm not aware of these problems.
> > >
>
>

