Hi Xiening,

Yes, I think that the manual creation of converters to trigger parent rules
should be enough at the moment. I'll try to explore this possibility. Thank
you.

Regards,
Vladimir

Thu, Oct 31, 2019 at 20:11, Xiening Dai <xndai....@gmail.com>:

> Hi Vladimir,
>
> I think for the short/mid term, approach #2 (using AbstractConverter) should
> work after we fix CALCITE-2970. We already understand the root cause and are
> now looking at the best way to fix it. If you cannot wait, you can also create
> your own converter rule so that it won't generate a logical node, and the
> performance should be much better. As to your concern regarding the overhead
> of creating AbstractConverter objects, I think it is minor compared to the
> rest of the work the framework is doing (rule matching, set merging, etc.).
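>
> A minimal sketch of such a custom converter rule, assuming a hypothetical
> PHYSICAL convention (PhysicalConvention) and a hypothetical PhysicalFilter
> operator; the names are illustrative, not part of Calcite. The rule rewrites
> the logical node directly into a physical one and requests the physical
> convention for its input, so no extra logical node is produced:
>
> import org.apache.calcite.plan.Convention;
> import org.apache.calcite.rel.RelNode;
> import org.apache.calcite.rel.convert.ConverterRule;
> import org.apache.calcite.rel.logical.LogicalFilter;
>
> public class PhysicalFilterRule extends ConverterRule {
>   public static final PhysicalFilterRule INSTANCE = new PhysicalFilterRule();
>
>   private PhysicalFilterRule() {
>     // Match logical filters (Convention.NONE), produce PHYSICAL rels.
>     super(LogicalFilter.class, Convention.NONE,
>         PhysicalConvention.INSTANCE, "PhysicalFilterRule");
>   }
>
>   @Override public RelNode convert(RelNode rel) {
>     LogicalFilter filter = (LogicalFilter) rel;
>     // Request the physical convention for the input; the planner records
>     // the demand in the input's RelSubset instead of creating a logical rel.
>     RelNode input = convert(filter.getInput(),
>         filter.getInput().getTraitSet().replace(PhysicalConvention.INSTANCE));
>     return new PhysicalFilter(filter.getCluster(),
>         filter.getTraitSet().replace(PhysicalConvention.INSTANCE),
>         input, filter.getCondition());
>   }
> }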
>
> Regarding the comment "to explore potential traits of children before
> optimizing parents”, I totally agree and want to point out we should also
> consider parent requirements. Currently such mechanism is missing in
> Calcite, and we are having discussion on how to add it as part of the
> Volcano planner. Danny'd shared the mail archive previously. Welcome to
> join the discussion.
>
>
> > On Oct 31, 2019, at 3:37 AM, Vladimir Ozerov <ppoze...@gmail.com> wrote:
> >
> > Hi Danny,
> >
> > Thank you very much for the links. What is described there is very
> > similar to the problem I describe, especially the discussion about trait
> > propagation, as this is basically what I need: to explore potential traits
> > of children before optimizing parents. And this is basically what Drill
> > already does with its SubsetTransformer:
> > 1) There is a SubsetTransformer interface, which iterates over the
> > physical relations of the given subset [1]
> > 2) If you want to create a physical project, you iterate over the physical
> > relations of the input subset and create the possible physical projects [2]
> > 3) But if you cannot find any physical input, you trigger the creation of
> > a "bad" physical project, which is very likely to have a poor cost because
> > it cannot take advantage of the input's distribution information [3]
> > So, step 2 is trait set propagation, which is needed by many distributed
> > engines (a sketch of the idea follows below). Step 3 is an attempt to work
> > around the current VolcanoPlanner behavior, where a parent rule is fired
> > only if the produced child node has a compatible trait set.
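> >
> > A rough sketch of the step 2 idea; this is illustrative, not Drill's
> > actual code, and PhysicalConvention and createPhysicalParent() are
> > hypothetical here. It scans the input RelSubset for physical rels that
> > already exist and builds one physical parent per concrete physical input,
> > so the parent can copy the child's traits instead of guessing them
> > top-down:
> >
> > import org.apache.calcite.plan.RelOptRuleCall;
> > import org.apache.calcite.plan.volcano.RelSubset;
> > import org.apache.calcite.rel.RelNode;
> >
> > void transformSubset(RelOptRuleCall call, RelNode parent, RelSubset input) {
> >   for (RelNode rel : input.getRelList()) {
> >     if (rel.getConvention() == PhysicalConvention.INSTANCE) {
> >       // A concrete physical input exists: build a physical parent that
> >       // inherits this input's distribution/collation.
> >       call.transformTo(createPhysicalParent(parent, rel));
> >     }
> >   }
> > }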
> >
> > I do not know Calcite's architecture that well, but at first glance the
> > proposed ability to re-fire the rules of a specific rel seems good enough.
> > It doesn't expand the search space, because no new nodes are created, and
> > it seems to be relatively simple to implement.
> >
> > [1]
> > https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> > [2]
> > https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> > [3]
> > https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
> >
> > Thu, Oct 31, 2019 at 12:21, Danny Chan <yuzhao....@gmail.com>:
> >
> >> Thanks Vladimir for bringing up this discussion!
> >>
> >> Did you notice that there is a JIRA issue about this problem? [1] There is
> >> also a discussion about how to propagate the traits [2].
> >>
> >> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >> [2]
> >> https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> >>
> >> Best,
> >> Danny Chan
> >> On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov <ppoze...@gmail.com> wrote:
> >>> Hi colleagues,
> >>>
> >>> I would like to discuss with the community the possibility of adding a
> >>> new public method to VolcanoPlanner which will forcefully re-trigger the
> >>> rules for a specific rel. This is a follow-up to a discussion started in
> >>> the other thread [1].
> >>>
> >>> **Problem statement**
> >>> When converting between conventions during optimization, VolcanoPlanner
> >>> prefers the top-down approach, so that nodes are converted starting from
> >>> the root. But in some cases, an intermediate node must be converted after
> >>> its children. This is especially true for distributed SQL engines, which
> >>> rely on distribution traits during the optimization process: it is not
> >>> possible to efficiently choose a proper physical implementation of a
> >>> parent node unless the physical representation of a child node is known.
> >>> For example, an aggregation may be implemented as a cheap one-phase
> >>> aggregate if its input is already distributed by the grouping keys, but
> >>> otherwise needs a two-phase implementation with an exchange in between.
> >>>
> >>> It seems that presently VolcanoPlanner cannot address such cases by
> >>> default. Consider that we have two nodes and associated rules which
> >>> convert them to their physical counterparts:
> >>> [LogicalParent <- LogicalChild]
> >>> The parent should be converted after the child. When the child is
> >>> converted, the physical node is created:
> >>> [LogicalParent <- {LogicalChild, PhysicalChild}]
> >>> In order to finish the optimization process, we need to convert the
> >>> parent. But the parent rules are not fired, because PhysicalChild has
> >>> traits incompatible with LogicalParent.
> >>>
> >>> Presently the problem can be solved in two ways:
> >>> 1) Always produce conversions when going top-down. In this case, I first
> >>> create a physical parent, then a physical child. The problem is that the
> >>> created parent is not optimal, because it didn't know the child's
> >>> distribution at the time of creation. So the full flow would be: create a
> >>> bad parent, create a good child, create a good parent:
> >>> 1.1) [LogicalParent <- LogicalChild]
> >>> 1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
> >>> 1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild,
> >>> PhysicalChild}]
> >>> 1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
> >>> {LogicalChild, PhysicalChild}]
> >>> What is worse, the creation of a non-optimal parent will trigger the
> >>> rules on its parent, which in turn may create a non-optimal parent-parent
> >>> node, etc. A sketch of this workaround follows below.
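> >>>
> >>> A minimal sketch of workaround 1, reusing the hypothetical
> >>> PhysicalConvention from earlier in this thread plus hypothetical
> >>> LogicalParent/PhysicalParent classes (illustrative, not an actual Calcite
> >>> API). The parent is converted top-down, before the child's distribution
> >>> is known, so it has to assume an arbitrary distribution; this is
> >>> PhysicalParentBad:
> >>>
> >>> import org.apache.calcite.plan.Convention;
> >>> import org.apache.calcite.rel.RelDistributions;
> >>> import org.apache.calcite.rel.RelNode;
> >>> import org.apache.calcite.rel.convert.ConverterRule;
> >>>
> >>> public class PhysicalParentRule extends ConverterRule {
> >>>   public static final PhysicalParentRule INSTANCE = new PhysicalParentRule();
> >>>
> >>>   private PhysicalParentRule() {
> >>>     super(LogicalParent.class, Convention.NONE,
> >>>         PhysicalConvention.INSTANCE, "PhysicalParentRule");
> >>>   }
> >>>
> >>>   @Override public RelNode convert(RelNode rel) {
> >>>     LogicalParent parent = (LogicalParent) rel;
> >>>     // Request the physical convention for the input; its actual
> >>>     // distribution is not known yet at this point.
> >>>     RelNode input = convert(parent.getInput(),
> >>>         parent.getInput().getTraitSet().replace(PhysicalConvention.INSTANCE));
> >>>     // Assume an arbitrary distribution and pay for it in cost.
> >>>     return new PhysicalParent(parent.getCluster(),
> >>>         parent.getTraitSet()
> >>>             .replace(PhysicalConvention.INSTANCE)
> >>>             .replace(RelDistributions.ANY),
> >>>         input);
> >>>   }
> >>> }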
> >>>
> >>> 2) Make sure that your convention returns true for
> >>> Convention.canConvertConvention. In this case, every time a new rel is
> >>> added to a RelSet, its trait set will be compared to the trait sets of
> >>> all the other rels in the set. For every pair of trait sets we may ask
> >>> the engine to create a relevant AbstractConverter. The net effect is
> >>> that a "physical-to-logical" converter is created, which re-triggers the
> >>> rules on the logical parent, since their conventions are compatible:
> >>> 2.1) [LogicalParent <- LogicalChild]
> >>> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> >>> 2.3) [LogicalParent <- {LogicalChild, PhysicalChild,
> >>> PhysicalToLogicalConverter}]
> >>> 2.4) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild,
> >>> PhysicalToLogicalConverter}]
> >>>
> >>> The problem with that approach is that it is too coarse-grained, since
> >>> we operate on trait sets rather than rels. As a result, additional
> >>> memory and CPU pressure is introduced, because usually too many
> >>> converters are created, which triggers the rules over and over again. A
> >>> sketch of how a convention enables this behavior follows below.
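> >>>
> >>> A minimal sketch of the convention side of workaround 2. The convention
> >>> name and the PhysicalRel marker interface are hypothetical; the two
> >>> overridden methods are the Convention hooks consulted by VolcanoPlanner
> >>> when it decides whether to generate AbstractConverters:
> >>>
> >>> import org.apache.calcite.plan.Convention;
> >>> import org.apache.calcite.plan.RelTraitSet;
> >>>
> >>> public class PhysicalConvention extends Convention.Impl {
> >>>   public static final PhysicalConvention INSTANCE = new PhysicalConvention();
> >>>
> >>>   private PhysicalConvention() {
> >>>     // PhysicalRel is a hypothetical marker interface for physical rels.
> >>>     super("PHYSICAL", PhysicalRel.class);
> >>>   }
> >>>
> >>>   @Override public boolean canConvertConvention(Convention toConvention) {
> >>>     // Allow conversion back to the logical convention, so that rules on
> >>>     // logical parents can be re-triggered.
> >>>     return true;
> >>>   }
> >>>
> >>>   @Override public boolean useAbstractConvertersForConversion(
> >>>       RelTraitSet fromTraits, RelTraitSet toTraits) {
> >>>     // Ask the planner to create AbstractConverters between differing
> >>>     // trait sets within a RelSet.
> >>>     return true;
> >>>   }
> >>> }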
> >>>
> >>> **Affected products**
> >>> At the moment two distributed engines are being developed for Hazelcast
> >>> and Apache Ignite. Both require bottom-up optimization and currently
> >>> rely on the second workaround.
> >>> Another example is Apache Drill. I do not know whether its community is
> >>> concerned with the issue, but it also uses bottom-up optimization for
> >>> many rules and employs both of the aforementioned workarounds. As a
> >>> result, I guess that Drill's optimizer also creates too many rels during
> >>> optimization and suffers from a huge search space. Please correct me if
> >>> I am wrong.
> >>>
> >>> **Proposal**
> >>> The key problem is that there is no way to re-fire the rules on a
> >>> specific node. The original problem could be solved if it were possible
> >>> to re-fire the rules on a LogicalParent without creating any additional
> >>> rels. That would lead to a clean optimization flow:
> >>> 2.1) [LogicalParent <- LogicalChild]
> >>> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> >>> 2.3) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild}]
> >>>
> >>> We can add the following method to the VolcanoPlanner (RelOptPlanner?)
> >>> interface:
> >>> void fireRules(RelNode rel)
> >>>
> >>> This method would fire the rules on the passed node in a deferred mode,
> >>> as if it were a new node just added to the optimizer. This would require
> >>> slight changes to the RuleQueue.addMatch method, and possibly some other
> >>> places.
> >>>
> >>> Usage example:
> >>> class PhysicalChildRule extends RelOptRule {
> >>>   public void onMatch(RelOptRuleCall call) {
> >>>     LogicalChild logicalRel = call.rel(0);
> >>>     PhysicalChild physicalRel = ...;
> >>>
> >>>     call.transformTo(physicalRel);
> >>>     // Re-fire the rules on the logical node, so that its parents can
> >>>     // now react to the newly created physical child.
> >>>     optimizer.fireRules(logicalRel);
> >>>   }
> >>> }
> >>>
> >>> What does the community think about such a method? Does it make sense
> >>> to you? If not, are you aware of any other ways to organize bottom-up
> >>> optimization with VolcanoPlanner without the creation of additional
> >>> rels?
> >>>
> >>> If the community is OK with this in general, I can try to create a PR
> >>> with a prototype.
> >>>
> >>> Would appreciate your feedback.
> >>>
> >>> Regards,
> >>> Vladimir.
> >>>
> >>> [1]
> >>> https://ponymail-vm.apache.org/_GUI_/thread.html/13e7ab2040bfa4902db6647992ec4203ceb0262cfcb28d38ef7e9e32@%3Cdev.calcite.apache.org%3E
> >>
>
>
