Hi Roman,

In my understanding, the proposed minor changes may only decrease the total
number of rule invocations slightly, but all principal problems remain the
same. In the top-down approach, you do not want to implement bottom logical
nodes unless it is requested explicitly by a parent operator.

It seems to me that the very first step to efficient optimizer could be a
new rule invocation infrastructure. There should be *no global rule queue*
at all. Instead, we may introduce the per-node rule queue. Then, the
optimizer performs a recursive top-down optimization dive, invoking the
rules for every operator. Consider the following simple tree:
-- LogicalProject
---- LogicalScan

Assuming that we have two implementation rules ProjectRule, ScanRule, and
abstract converters enabled, VolcanoOptimizer will proceed as follows,
generating one unnecessary rule call:
1. Define global rule call queue: ProjectRule, ScanRule
2. Call ProjectRule, no new nodes are produced
3. Call ScanRule, produce physical scans, reschedule ProjectRule
4. Call ProjectRule again, produce the physical project

With the proposed approach, it will work differently:
1. Define per-operator queues:
LogicalProject -> ProjectRule
LogicalScan -> ScanRule
2. Call optimize(LogicalProject)
3. Invoke ProjectRule, which calls optimize(LogicalScan) on the input
4. Invoke ScanRule, produce physical scans, return control back to
ProjectRule
5. Produce the physical project, finish optimization

Now we have only 2 rule invocations as expected, and we reached the same
result as in the proposed minor changes. But the crucial difference is that
now we have well-defined control flow between operators: start at the top,
delegate to children. With this infrastructure in place, we will be able to
introduce more complex features, such as pruning, or partial exploration
later on.

But notice that this change will be incompatible with the current rules,
i.e. they should be re-written for the new optimization algorithm (e.g. see
step 3), which might be a blocker for the current Calcite users. So maybe
it is better to think of a new optimizer, rather than fixing
VolcanoOptimizer.

Regards,
Vladimir.


вт, 14 янв. 2020 г. в 23:52, Haisheng Yuan <h.y...@alibaba-inc.com>:

> On the other hand, if we don't preprocess and normalize the rel expression
> before going to valcano planner, still compute and keep logical/relational
> properties, like cardinality, for each operator, how can we achieve group
> seach space pruning? Given a physical group expression, its required
> property and upper bound cost C_limit, we need to get its child group
> cardinality, compute local cost C_local,  so that we can calculate the
> child group's upper bound cost and pass it down.
>
> I can't figure out how we can do group pruning without shared relational
> properties.
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人:Haisheng Yuan<h.y...@alibaba-inc.com>
> 日 期:2020年01月14日 12:06:17
> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> 主 题:Re: Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
>
> The example is valid if Calcite doesn't do normalization or preprocessing
> before going to VolcanoPlanner.
> But many databases and big data systems will try to preprocess the
> expression (push down predicates etc.) so that expressions in the same
> group can share the logical properties, for most case if not all. You may
> argue that it should be cost based, e.g. evaluating filter early can still
> be bad. It is true, but how accurate will the statistics be, how accurate
> will the cost model be?
>
> - Haisheng
>
> ------------------------------------------------------------------
> 发件人:Julian Hyde<jh...@apache.org>
> 日 期:2020年01月13日 08:44:54
> 收件人:dev@calcite.apache.org<dev@calcite.apache.org>
> 主 题:Re: [DISCUSS] Proposal to add API to force rules matching specific rels
>
> > MEMO group (RelSet) represents logically equivalent expressions.
> > All the expressions in one group should share the same logical
> > properties, e.g. functional dependency, constraint properties etc.
> > But they are not sharing it. Computation is done for each individual
> operator.
>
> It's good, and correct, that we compute for each individual operator.
>
> Suppose that a RelSubset s contains RelNode r1 and we know that the
> constraint x > 0 holds. Suppose that we also have r2 with constraint y
> < 10, and we discover that r1 and r2 are equivalent and belong
> together in s. Now we can safely say that both constraints (x > 0 and
> y < 10) apply to both r1 and r2.
>
> Deducing additional constraints in this way is a big win. The effort
> to compute constraints for each RelNode is well-spent.
>
> This kind of deduction applies to other logical properties (e.g.
> unique keys) and it applies to RelSet as well as RelSubset.
>
> Julian
>
>
> On Sun, Jan 12, 2020 at 10:10 AM Roman Kondakov
> <kondako...@mail.ru.invalid> wrote:
> >
> > @Haisheng
> >
> > > Calcite uses Project operator and all kinds of ProjectXXXTranposeRule
> to prune unused columns.
> >
> > I also noticed that in most cases Project-related rules took significant
> > part of the planning time. But I didn't explore this problem yet.
> >
> > > MEMO group (RelSet) represents logically equivalent expressions. All
> the expressions in one group should share the same logical properties, e.g.
> functional dependency, constraint properties etc. But they are not sharing
> it. Computation is done for each individual operator.
> >
> > I thought the equivalence of logical properties within groups (RelSets)
> > are implicit. For example, in RelSet#addInternal it is always verified
> > that the new added node has the same type as other members of the set.
> >
> > Anyway I absolutely agree with you that problems with traits
> > propagation, rules matching and other that you mentioned in the previous
> > e-mails should be solved in the first place. We need first to make
> > Volcano optimizer work right and only then make some improvements like
> > search space pruning.
> >
> > I would love to join this work to improve Volcano planner. Looking
> > forward for design doc.
> >
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
> > On 11.01.2020 11:42, Haisheng Yuan wrote:
> > > Currently, Calcite uses Project operator and all kinds of
> ProjectXXXTranposeRule to prune unused columns. Every operator's output
> columns use index to reference child operators' columns. If there is a
> Project operator with child operator of a Filter, if we push project down
> under Filter, we will have Project-Filter-Project-FilterInput. All the
> newly generated relnodes will trigger rule matches. e.g. If we already did
> ReduceExpressionRule on filter, but due to the new filter RexCall's input
> ref index changed, we have to apply ReduceExpressionRule on the new filter
> again, even there is nothing can be reduced. Similarly new operator
> transpose/merge rule will be triggered. This can trigger a lot of rule
> matches.
> > >
> > > MEMO group (RelSet) represents logically equivalent expressions. All
> the expressions in one group should share the same logical properties, e.g.
> functional dependency, constraint properties etc. But they are not sharing
> it. Computation is done for each individual operator.
> > >
> > > Without resolving those issue, space pruning won't help much.
> > >
> > > There are a lot of room for improvement. Hope the community can join
> the effort to make Calcite better.
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Roman Kondakov<kondako...@mail.ru.INVALID>
> > > 日 期:2020年01月10日 19:39:51
> > > 收件人:<dev@calcite.apache.org>
> > > 主 题:Re: [DISCUSS] Proposal to add API to force rules matching specific
> rels
> > >
> > > @Haisheng, could you please clarify what you mean by these points?
> > >
> > >> - the poor-design of column pruning,
> > >> - lack of group properties etc.
> > >
> > > I guess I'm not aware of these problems.
> > >
>
>

Reply via email to