Re: Re: Re: Problem with converters and possibly rule matching

Vladimir Ozerov Tue, 19 Nov 2019 10:22:47 -0800

Hi Stamatis,

This approach worth trying, but I am afraid that it may produce too many
permutations. For example, this is the logical plan of one of TPC-H
benchmarks, and I am not very keen expanding it much further :-)


901:ProjectLogicalRel(ps_partkey=[$0], val=[$1])
  899:FilterLogicalRel(subset=[rel#900:Subset#66.LOGICAL.ANY.[]],
condition=[>($1, $2)])
    897:JoinLogicalRel(subset=[rel#898:Subset#65.LOGICAL.ANY.[]],
condition=[true], joinType=[left])
      882:AggregateLogicalRel(subset=[rel#883:Subset#57.LOGICAL.ANY.[]],
group=[{0}], val=[SUM($1)])
        880:ProjectLogicalRel(subset=[rel#881:Subset#56.LOGICAL.ANY.[]],
ps_partkey=[$1], $f1=[*($2, $3)])
          878:JoinLogicalRel(subset=[rel#879:Subset#55.LOGICAL.ANY.[]],
condition=[=($5, $6)], joinType=[inner])
            875:JoinLogicalRel(subset=[rel#876:Subset#53.LOGICAL.ANY.[]],
condition=[=($0, $4)], joinType=[inner])

634:MapScanLogicalRel(subset=[rel#873:Subset#51.LOGICAL.ANY.[]],
table=[[partsupp]], projects=[[0, 1, 2, 3]])

632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]],
table=[[supplier]], projects=[[0, 1]])

630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]],
table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')])
      895:AggregateLogicalRel(subset=[rel#896:Subset#64.LOGICAL.ANY.[]],
group=[{}], agg#0=[SINGLE_VALUE($0)])
        893:ProjectLogicalRel(subset=[rel#894:Subset#63.LOGICAL.ANY.[]],
EXPR$0=[*($0, 0.9:DECIMAL(2, 1))])

891:AggregateLogicalRel(subset=[rel#892:Subset#62.LOGICAL.ANY.[]],
group=[{}], agg#0=[SUM($0)])

889:ProjectLogicalRel(subset=[rel#890:Subset#61.LOGICAL.ANY.[]], $f0=[*($1,
$2)])
              887:JoinLogicalRel(subset=[rel#888:Subset#60.LOGICAL.ANY.[]],
condition=[=($4, $5)], joinType=[inner])

885:JoinLogicalRel(subset=[rel#886:Subset#59.LOGICAL.ANY.[]],
condition=[=($0, $3)], joinType=[inner])

680:MapScanLogicalRel(subset=[rel#884:Subset#58.LOGICAL.ANY.[]],
table=[[partsupp]], projects=[[0, 2, 3]])

632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]],
table=[[supplier]], projects=[[0, 1]])

630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]],
table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')])

I have two working solutions at the moment, which do not require
cross-convention converters:
1) Create separate rules for logical-physical node conversion, and for
physical-physical optimization
2) A kind of hack: add fake AbstractConverter with logical traits after the
physical node is created

I hope one of them will work good enough for the production case. Most
likely it would be p.1.

Regards,
Vladimir.


пт, 15 нояб. 2019 г. в 11:04, Stamatis Zampetakis <zabe...@gmail.com>:

> Hi again Vladimir,
>
> In my previous email, I mentioned three rules for performing logical to
> physical conversions.
> It's normal that if you add more operators we will end up with more rules.
> Now that the example has a filter we have the following rule list:
>
> Rule 1: RootLogicalRel -> RootPhysicalRel
> Rule 2: ProjectLogicalRel -> ProjectPhysicalRel
> Rule 3: FilterLogicalRel -> FilterPhysicalRel
> Rule 4: MapScanLogicalRel -> MapScanPhysicalRel
>
> If I understand well, your concern is that in order to get the optimal plan
> you would have to introduce many additional rules for the Exchange operator
> (e.g., ExchangeProjectTransposeRule, ExchangeFilterTransposeRue, etc.).
> I was thinking that it is not necessary to introduce these kind of rules.
> You could have only one rule, i.e., PhysicaExchangeRule, for enforcing a
> distribution that would be applied when there is a requirement for
> particular distribution.
>
> So for the initial plan
> LogicalRoot [distribution=SINGLETON]
> -> LogicalProject
>  -> LogicalFilter
>   -> LogicalScan
>
> The ruleset above should generate the following alternatives where the
> PhysicalExchangeRule is applied 3 times.
>
> Alt1:
> PhysicalRoot
> -> SingletonExchange
>  -> PhysicalProject
>   -> PhysicalFilter
>    -> PhysicalScan
>
> Alt2:
> PhysicalRoot
> -> PhysicalProject
>  -> SingletonExchange
>   -> PhysicalFilter
>    -> PhysicalScan
>
> Alt3:
> PhysicalRoot
> -> PhysicalProject
>  -> PhysicalFilter
>   -> SingletonExchange
>    -> PhysicalScan
>
> This is still in the same spirit of the previous example where the physical
> property (distribution) is either satisfied immediately or passed down to
> the next level.
>
> Best,
> Stamatis
>
> On Thu, Nov 14, 2019 at 11:14 AM Vladimir Ozerov <ppoze...@gmail.com>
> wrote:
>
> > search place* = search space
> >
> > чт, 14 нояб. 2019 г. в 13:10, Vladimir Ozerov <ppoze...@gmail.com>:
> >
> > > Hi Haisheng,
> > >
> > > I double-checked the code. My original version returned false for some
> > > cases, but it didn't affect number of rules calls anyway, so I changed
> it
> > > to always return true. Please note that if I change the code as you
> > > suggested, the test started failing, because bottom-up propagation of
> > rule
> > > calls no longer work: when the child is converted to physical form, the
> > > parent logical node is not notified. This is the very problem I address
> > > with that weird physical-to-logical conversions: they do not make
> sense,
> > > and converter expansion does not produce any new rels, but their
> > existence
> > > allow for logical rule re-trigger which ultimately allow the plan to
> > > compile.
> > >
> > > Regarding two conventions - I agree that it may look strange, but I do
> > not
> > > see any problems from the correctness perspective. Separation of
> logical
> > > and physical planning helps me avoid excessive expansion of the search
> > > place: I do not want my physical rules to produce new rels from not
> > > optimized logical rels. Do you see any problems with that approach?
> > >
> > > Regards,
> > > Vladimir
> > >
> > > ср, 6 нояб. 2019 г. в 03:52, Haisheng Yuan <h.y...@alibaba-inc.com>:
> > >
> > >> Hi Vladimir,
> > >>
> > >> The code in PHYSICAL convention L44 looks weird, I think it always
> > >> returns true.
> > >>
> > >>
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/HazelcastConventions.java#L44
> > >>
> > >> Try this:
> > >>
> > >> fromTraits.containsIfApplicable(Convention.PHYSICAL)
> > >>     && toTraits.containsIfApplicable(Convention.PHYSICAL);
> > >>
> > >>
> > >> Adding a AbstractConverter on logical operators is meaningless.
> Calcite
> > >> is mixing the concept of logical and physical together, which is sad.
> > >>
> > >> BTW, using 2 conventions is not appropriate and wrong.
> > >>
> > >> - Haisheng
> > >>
> > >> ------------------------------------------------------------------
> > >> 发件人：Vladimir Ozerov<ppoze...@gmail.com>
> > >> 日 期：2019年11月05日 18:02:15
> > >> 收件人：Haisheng Yuan<h.y...@alibaba-inc.com>
> > >> 抄 送：dev@calcite.apache.org (dev@calcite.apache.org)<
> > >> dev@calcite.apache.org>
> > >> 主 题：Re: Re: Problem with converters and possibly rule matching
> > >>
> > >> Hi Haisheng,
> > >>
> > >>  think I already tried something very similar to what you explained,
> but
> > >> it gave not an optimal plan. Please let me describe what I did. I
> would
> > >> appreciate your feedback.
> > >>
> > >> 1) We start with a simple operator tree Root <- Project <- Scan, where
> > >> the root is a final aggregator in the distributed query engine:
> > >> -> LogicalRoot
> > >>  -> LogicalProject
> > >>   -> LogicalScan
> > >>
> > >> 2) First, we convert the Root and enforce SINGLETON distribution on a
> > >> child:
> > >> *-> PhysicalRoot[SINGLETON]*
> > >> * -> Enforcer#1[SINGLETON]*
> > >>   -> LogicalProject
> > >>    -> LogicalScan
> > >>
> > >> 3) Then the project's rule is invoked. It doesn't know the
> distribution
> > >> of the input, so it requests ANY distribution. Note that we have to
> set
> > ANY
> > >> to the project as well since we do not know the distribution of the
> > input:
> > >> -> PhysicalRoot[SINGLETON]
> > >>  -> Enforcer#1[SINGLETON]
> > >> *  -> PhysicalProject[ANY]*
> > >> *   -> Enforcer#2[ANY]*
> > >>     -> LogicalScan
> > >>
> > >> 4) Finally, the physical scan is created and its distribution is
> > >> resolved. Suppose that it is REPLICATED, i.e. the whole result set is
> > >> located on all nodes.
> > >> -> PhysicalRoot[SINGLETON]
> > >>  -> Enforcer#1[SINGLETON]
> > >>   -> PhysicalProject[ANY]
> > >>    -> Enforcer#2[ANY]
> > >> *    -> PhysicalScan[REPLICATED]*
> > >>
> > >> 5) Now as all logical nodes are converted, we start resolving
> enforcers.
> > >> The second one is no-op, since REPLICATED satisfies ANY:
> > >> -> PhysicalRoot[SINGLETON]
> > >>  -> Enforcer#1[SINGLETON]
> > >>   -> PhysicalProject[ANY]
> > >>    -> PhysicalScan[REPLICATED]
> > >>
> > >> 6) But the first enforcer now requires an Exchange, since ANY doesn't
> > >> satisfy SINGLETON!
> > >> -> PhysicalRoot[SINGLETON]
> > >> * -> SingletonExchange[SINGLETON]*
> > >>   -> PhysicalProject[ANY] // <= unresolved!
> > >>    -> PhysicalScan[REPLICATED]
> > >>
> > >> The resulting plan requires data movement only because we didn't know
> > >> precise distribution of the PhysicalProject when it was created. But
> > should
> > >> I enable Convention.Impl.canConvertConvention, bottom-up propagation
> > >> kicks in, and the correct plan is produced because now LogicalProject
> > >> has a chance to be converted to PhysicalProject with the concrete
> > >> distribution. The optimized plan looks like this (since REPLICATED
> > >> satisfies SINGLETON):
> > >> -> PhysicalRoot[SINGLETON]
> > >>  -> PhysicalProject[REPLICATED]
> > >>   -> PhysicalScan[REPLICATED]
> > >>
> > >> You may see this in action in my reproducer:
> > >> 1) Test producing "bad" plan:
> > >>
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45
> > >> 2) Root enforces SINGLETON on Project:
> > >>
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45
> > >> 3) Project enforces default (ANY) distribution on Scan:
> > >>
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49
> > >>
> > >> Please let me know if this flow is similar to what you meant.
> > >>
> > >> Regards,
> > >> Vladimir.
> > >>
> > >> пн, 4 нояб. 2019 г. в 10:33, Haisheng Yuan <h.y...@alibaba-inc.com>:
> > >>
> > >>> Hi Vladimir,
> > >>>
> > >>> This is still can be done through top-down request approach.
> > >>>
> > >>> PhysicalFilter operator should request ANY distribution from child
> > >>> operator, unless there is outer reference in the filter condition, in
> > which
> > >>> case, PhysicalFilter should request SINGLETON or BROADCAST
> > distribution. So
> > >>> in your case, PhysicalFilter request ANY, its required distribution
> > will be
> > >>> enforced on filter's output.
> > >>>
> > >>> Regarding index usage, you should have a FIlterTableScan2IndexGet
> > >>> logical transformation rule, and a IndexGet2IndexScan physical
> > >>> implementation rule. Note that IndexGet is a logical operator and
> > IndexScan
> > >>> is a physical operator, which are also used by SQL Server.
> > >>>
> > >>> - Haisheng
> > >>>
> > >>> ------------------------------------------------------------------
> > >>> 发件人：Vladimir Ozerov<ppoze...@gmail.com>
> > >>> 日 期：2019年11月01日 17:30:26
> > >>> 收件人：<dev@calcite.apache.org>
> > >>> 主 题：Re: Problem with converters and possibly rule matching
> > >>>
> > >>> Hi Stamatis,
> > >>>
> > >>> Thank you for your reply. I also thought that we may set the
> > distribution
> > >>> trait during logical planning because it is known in advance. And the
> > >>> example I gave will work! :-) But unfortunately, it will work only
> > >>> because
> > >>> the tree is very simple, and the Project is adjacent to the Scan.
> This
> > is
> > >>> how my reproducer will work in that case:
> > >>> 1) Root: enforce "SINGLETON" on Project
> > >>> 2) Project: check the logical Scan, infer already resolved
> > distribution,
> > >>> then convert to [PhyiscalProject <- PhysicalScan]
> > >>> 3) Resolve Root enforcer, adding and Exchange if needed.
> > >>>
> > >>> But this stops working as soon as a plan becomes more complex so that
> > it
> > >>> is
> > >>> impossible to infer the distribution from the child immediately.
> E.g.:
> > >>> LogicalRoot [distribution=SINGLETON]
> > >>> -> LogicalProject // We are here new and cannot produce the physical
> > >>> project
> > >>> -> LogicalFilter[distribution=?]
> > >>> -> LogicalScan[distribution=REPLICATED]
> > >>>
> > >>> This is where your suggestion with cascading enforcement may kick in.
> > But
> > >>> now consider that instead of having REPLICATED distribution, which
> > >>> satisfies SINGLETON, we have a PARTITIONED distribution, It doesn't
> > >>> satisfy
> > >>> SINGLETON, so we have to insert the exchange.
> > >>>
> > >>> Before:
> > >>> PhysicalRoot [SINGLETON]
> > >>> -> PhysicalProject [SINGLETON]
> > >>> -> PhysicalFilter [SINGLETON]
> > >>> -> LogicalScan [PARTITIONED] // SINGLETON is enforced here
> > >>>
> > >>> After:
> > >>> PhysicalRoot [SINGLETON]
> > >>> -> PhysicalProject [SINGLETON]
> > >>> -> PhysicalFilter [SINGLETON]
> > >>> -> SingletonExchange [SINGLETON]
> > >>> -> PhysicalScan [PARTITIONED]
> > >>>
> > >>> But unfortunately, this plan is not optimal. Since we perform Project
> > and
> > >>> Filter after the exchange, we now have to reshuffle the whole table.
> > The
> > >>> optimal plan would be to pushdown Project and Filter past exchange:
> > >>> PhysicalRoot [SINGLETON]
> > >>> -> SingletonExchange [SINGLETON]
> > >>> -> PhysicalProject [PARTITIONED]
> > >>> -> PhysicalFilter [PARTITIONED]
> > >>> -> PhysicalScan [PARTITIONED]
> > >>>
> > >>> But in order to achieve that, now I need to introduce new rules which
> > >>> will
> > >>> attempt to transpose dozens of different operators past exchange, so
> > most
> > >>> likely the planning will never finish :-)
> > >>>
> > >>> This is why it seems that the bottom-up approach seems to be more
> > natural
> > >>> for such cases. Another example is index support. A table may have a
> > >>> number
> > >>> of access methods in addition to plain scan, and every such method
> may
> > >>> produce a physical scan with different collations. It is not
> practical
> > to
> > >>> try to enforce something from the top. Instead, we'd better produce
> > >>> alternatives from the bottom.
> > >>>
> > >>> Basically, Drill tries to mix two approaches. First, it tries to
> derive
> > >>> traits from the child. If it fails and no transformations were
> created,
> > >>> it
> > >>> just stops trait propagation. As a result, an exchange is created on
> > top
> > >>> of
> > >>> the operator where we stopped, which again leads to not optimal
> plans.
> > >>> You
> > >>> may observe this in action if you run my reproducer. "testPass"
> > produces
> > >>> an
> > >>> optimal plan with help of abstract converters. "testDrill" produces
> not
> > >>> optimal plan with the unnecessary exchange.
> > >>>
> > >>> All in all, I cannot get how we can employ distribution metadata and
> > >>> index
> > >>> metadata for optimal planning without using a pure bottom-up
> approach.
> > >>>
> > >>> Regards,
> > >>> Vladimir.
> > >>>
> > >>> пт, 1 нояб. 2019 г. в 11:42, Stamatis Zampetakis <zabe...@gmail.com
> >:
> > >>>
> > >>> > The discussion is interesting thanks for the nice examples.
> > >>> > I am replying in this thread since it has all the examples and the
> > >>> history
> > >>> > of the conversation.
> > >>> >
> > >>> > *Part I: Distribution physical or logical property*
> > >>> >
> > >>> > The Volcano paper writes the following regarding properties:
> > >>> >
> > >>> > "Logical properties can be derived from the logical algebra
> > expression
> > >>> and
> > >>> > include schema, expected size, etc., while physical properties
> depend
> > >>> on
> > >>> > algorithms, e.g., sort order, partitioning, etc."
> > >>> >
> > >>> > We could say that partitioning is not only a property which depends
> > on
> > >>> an
> > >>> > algorithm but a property which depends on the schema. When you
> have a
> > >>> table
> > >>> > you know in advance that the particular table is replicated.
> > >>> >
> > >>> > Basically, this means that the starting "logical" plan in your case
> > >>> could
> > >>> > be the following:
> > >>> > RootLogicalRel[convention=LOGICAL, distribution=ANY]
> > >>> > -> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
> > >>> > -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED]
> > >>> >
> > >>> > When you are about to create your physical projection it is not
> > >>> necessary
> > >>> > to create three equivalent paths but the most promising by asking
> the
> > >>> input
> > >>> > what can it offer as traits. Again this partially overlaps with the
> > >>> > discussion about "On demand traitset" [1] where as I wrote there is
> > >>> already
> > >>> > a mechanism to derive traits from children (collation,
> distribution,
> > >>> etc.).
> > >>> >
> > >>> > Note that I am not saying that there is nothing more to be done to
> > >>> improve
> > >>> > the Calcite optimizer; I am just trying to provide a viable
> > alternative
> > >>> > given the current situation of the project.
> > >>> >
> > >>> > *Part II: Example with the original Volcano optimizer*
> > >>> >
> > >>> > Now let me try to show how the original (from the paper) Volcano
> > >>> optimizer
> > >>> > could handle your query.
> > >>> >
> > >>> > The optimization starts and we request two properties convention
> and
> > >>> > distribution.
> > >>> >
> > >>> > RootLogicalRel[convention=PHYSICAL, distribution=SINGLETON]
> > >>> > -> ProjectLogicalRel
> > >>> > -> MapScanLogicalRel
> > >>> >
> > >>> > Rule 1: RootLogicalRel -> RootPhysicalRel applies the algorithm but
> > >>> cannot
> > >>> > satisfy the SINGLETON property so it propagates the demand
> > >>> >
> > >>> > RootPhysicalRel
> > >>> > -> ProjectLogicalRel [convention=PHYSICAL, distribution=SINGLETON]
> > >>> > -> MapScanLogicalRel
> > >>> >
> > >>> > Rule 2: ProjectLogicalRel -> ProjectPhysicalRel applies the
> algorithm
> > >>> but
> > >>> > cannot satisfy the SINGLETON property so it propagates the demand
> > >>> >
> > >>> > RootPhysicalRel
> > >>> > -> ProjectPhysicalRel
> > >>> > -> MapScanLogicalRel [convention=PHYSICAL, distribution=SINGLETON]
> > >>> >
> > >>> > Rule 3: MapScanLogicalRel -> MapScanPhysicalRel applies the
> algorithm
> > >>> and
> > >>> > given that the table is replicated it satisfies the distribution
> > >>> property.
> > >>> >
> > >>> > RootPhysicalRel
> > >>> > -> ProjectPhysicalRel
> > >>> > -> MapScanPhysicalRel
> > >>> >
> > >>> > So we end up with a complete plan that satisfies all properties and
> > >>> does
> > >>> > not have redundant exchange operators.
> > >>> > Moreover we didn't require all possible distributions when we
> applied
> > >>> Rule
> > >>> > 2 since we just want to satisfy the SINGLETON property requested by
> > the
> > >>> > parent.
> > >>> > The example above does not demonstrate the additional optimization
> > >>> paths
> > >>> > which would be apply an enforcer (an Exchange operator to satisfy
> the
> > >>> > SINGLETON property) at a higher level than the scan.
> > >>> >
> > >>> > Would this be an acceptable approach? Does it still suffer from a
> big
> > >>> > search space?
> > >>> >
> > >>> > To make the analog with the actual Volcano optimizer in Calcite the
> > >>> thing
> > >>> > that may be missing is to be able to know what trait was requested
> by
> > >>> the
> > >>> > parent during the application of Rule 2.
> > >>> >
> > >>> > Best,
> > >>> > Stamatis
> > >>> >
> > >>> > [1]
> > >>> >
> > >>> >
> > >>>
> >
> https://lists.apache.org/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> > >>> >
> > >>> > On Wed, Oct 30, 2019 at 7:24 PM Vladimir Ozerov <
> ppoze...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Hi Stamatis,
> > >>> > >
> > >>> > > The problem that the presented reproducer is a very simplified
> > >>> version of
> > >>> > > what is actually needed. In reality, there are several
> distribution
> > >>> > types -
> > >>> > > PARTITIONED, REPLICATED, SINGLETON. To make things worse,
> > >>> PARTITIONED may
> > >>> > > or may not have concrete distribution fields. In theory, I can
> > >>> create one
> > >>> > > transformation per distribution type, but that would increase
> plan
> > >>> space
> > >>> > > significantly. In my sample "Root <- Project <- Scan" plan, there
> > is
> > >>> no
> > >>> > > room for optimization at all, we only need to convert the nodes
> and
> > >>> > > propagate traits. But if I follow the proposed approach, the
> > planner
> > >>> > would
> > >>> > > create three equivalent paths. For complex queries, this may
> > increase
> > >>> > > optimization time significantly.
> > >>> > >
> > >>> > > What I need instead is to gradually convert and optimize nodes
> > >>> > *bottom-up*,
> > >>> > > instead of top-bottom. That is, create a physical scan first,
> then
> > >>> > create a
> > >>> > > physical project on top of it, etc. But at the same time, some
> > rules
> > >>> > still
> > >>> > > require the top-bottom approach. So essentially I need the
> > optimizer
> > >>> to
> > >>> > do
> > >>> > > both. Abstract converters help me establish bottom-up preparation
> > >>> but do
> > >>> > > this at the cost of considering too many trait pairs, and as a
> > >>> result,
> > >>> > also
> > >>> > > pollute the search space.
> > >>> > >
> > >>> > > To the contrast, precise command "I transformed the node A to
> node
> > B,
> > >>> > > please re-trigger the rules for A's parents" would allow us to
> > >>> re-trigger
> > >>> > > only required rules, without adding more nodes.
> > >>> > >
> > >>> > > Does it make sense?
> > >>> > >
> > >>> > > Regards,
> > >>> > > Vladimir.
> > >>> > >
> > >>> > > ср, 30 окт. 2019 г. в 21:02, Stamatis Zampetakis <
> > zabe...@gmail.com
> > >>> >:
> > >>> > >
> > >>> > > > I admit that I didn't thoroughly read the exchanges so far but
> > >>> forcing
> > >>> > > the
> > >>> > > > rules trigger does not seem the right approach.
> > >>> > > >
> > >>> > > > Other than that I would like to backtrack a bit to point 4.3
> > >>> raised by
> > >>> > > > Vladimir.
> > >>> > > >
> > >>> > > > "ProjectPhysicalRule [6] - transforms logical project to
> physical
> > >>> > > > project *ONLY* if there is an underlying physical input with
> > >>> REPLICATED
> > >>> > > or
> > >>> > > > SINGLETON distribution"
> > >>> > > >
> > >>> > > > The rule could be modified to do the following two
> > transformations:
> > >>> > > > 1. Create a physical project and require the input to be
> > >>> REPLICATED.
> > >>> > > > 2. Create a physical project and require
> > >>> > > > the input to be SINGLETON.
> > >>> > > >
> > >>> > > > I would assume that afterwards when your scan rule fires it
> > should
> > >>> go
> > >>> > to
> > >>> > > > the appropriate subset and give you back the desired plan. Is
> > >>> there a
> > >>> > > > problem with this approach?
> > >>> > > >
> > >>> > > > Best,
> > >>> > > > Stamatis
> > >>> > > >
> > >>> > > > On Wed, Oct 30, 2019, 5:52 PM Seliverstov Igor <
> > >>> gvvinbl...@gmail.com>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > Unfortunately it requires package-private API usage.
> > >>> > > > >
> > >>> > > > > Best solution there if Calcite provide it for end users.
> > >>> > > > >
> > >>> > > > > ср, 30 окт. 2019 г., 18:48 Vladimir Ozerov <
> ppoze...@gmail.com
> > >:
> > >>> > > > >
> > >>> > > > > > “e pension” == “expand conversion” :)
> > >>> > > > > >
> > >>> > > > > > ср, 30 окт. 2019 г. в 18:46, Vladimir Ozerov <
> > >>> ppoze...@gmail.com>:
> > >>> > > > > >
> > >>> > > > > > > Yes, that may work. Even if e pension rule is used, for
> the
> > >>> most
> > >>> > > > cases
> > >>> > > > > it
> > >>> > > > > > > will not trigger any real conversions, since we are
> moving
> > >>> from
> > >>> > > > > abstract
> > >>> > > > > > > convention to physical, and created converters will have
> > the
> > >>> > > opposite
> > >>> > > > > > trait
> > >>> > > > > > > direction (from physical to abstract).
> > >>> > > > > > >
> > >>> > > > > > > But again - ideally we only need to re-trigger the rules
> > for
> > >>> a
> > >>> > > > specific
> > >>> > > > > > > node, no more than that. So API support like
> > >>> > > > > > > “VolcanoPlanner.forceRules(RelNode)” would be very
> > >>> convenient.
> > >>> > > > > > >
> > >>> > > > > > > What do you think?
> > >>> > > > > > >
> > >>> > > > > > > ср, 30 окт. 2019 г. в 17:56, Seliverstov Igor <
> > >>> > > gvvinbl...@gmail.com
> > >>> > > > >:
> > >>> > > > > > >
> > >>> > > > > > >> I considered manual rules calling too, for now we use
> > >>> abstract
> > >>> > > > > > converters
> > >>> > > > > > >> +
> > >>> > > > > > >> ExpandConversionRule for exchanges producing.
> > >>> > > > > > >>
> > >>> > > > > > >> You may create such converters manually (checking
> > >>> appropriate
> > >>> > > > subset)
> > >>> > > > > > this
> > >>> > > > > > >> case you may reduce created converters count, also, a
> > >>> converter
> > >>> > > is a
> > >>> > > > > > quite
> > >>> > > > > > >> special node, that does almost nothing (without
> > >>> corresponding
> > >>> > > rule)
> > >>> > > > it
> > >>> > > > > > may
> > >>> > > > > > >> be used just as a rule trigger.
> > >>> > > > > > >>
> > >>> > > > > > >> Regards,
> > >>> > > > > > >> Igor
> > >>> > > > > > >>
> > >>> > > > > > >> ср, 30 окт. 2019 г., 17:31 Vladimir Ozerov <
> > >>> ppoze...@gmail.com
> > >>> > >:
> > >>> > > > > > >>
> > >>> > > > > > >> > One funny hack which helped me is manual registration
> > of a
> > >>> > fake
> > >>> > > > > > RelNode
> > >>> > > > > > >> > with desired traits through VolcanoPlanner.register()
> > >>> method.
> > >>> > > But
> > >>> > > > > > again,
> > >>> > > > > > >> > this leads to trashing. What could really help is a
> call
> > >>> to
> > >>> > > > > > >> > VolcanoPlanner.fireRules() with desired rel. But this
> > >>> doesn't
> > >>> > > work
> > >>> > > > > out
> > >>> > > > > > >> of
> > >>> > > > > > >> > the box since some internals of the rule queue needs
> to
> > be
> > >>> > > > adjusted.
> > >>> > > > > > >> >
> > >>> > > > > > >> > What does the community think about adding a method
> > which
> > >>> will
> > >>> > > > > re-add
> > >>> > > > > > >> rules
> > >>> > > > > > >> > applicable to the specific RelNode to the rule queue?
> > >>> > > > > > >> >
> > >>> > > > > > >> > ср, 30 окт. 2019 г. в 17:00, Vladimir Ozerov <
> > >>> > > ppoze...@gmail.com
> > >>> > > > >:
> > >>> > > > > > >> >
> > >>> > > > > > >> > > Hi Igor,
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > Yes, I came to the same conclusion, thank you. This
> is
> > >>> how
> > >>> > it
> > >>> > > > > > >> basically
> > >>> > > > > > >> > > happens when converters are disabled:
> > >>> > > > > > >> > > 1) We start with initial tree: [LogicalProject] <-
> > >>> > > [LogicalScan]
> > >>> > > > > > >> > > 2) Then we convert LogicalScan to PhysicalScan, so
> it
> > is
> > >>> > added
> > >>> > > > to
> > >>> > > > > > the
> > >>> > > > > > >> > > set: [LogicalProject] <- [LogicalScan, PhysicalScan]
> > >>> > > > > > >> > > 3) Finally, when it is time to fire a rule for
> > >>> PhysicalScan,
> > >>> > > we
> > >>> > > > > try
> > >>> > > > > > to
> > >>> > > > > > >> > get
> > >>> > > > > > >> > > parents of that scan set with traits of the
> > >>> PhysicalScan.
> > >>> > > Since
> > >>> > > > > > there
> > >>> > > > > > >> are
> > >>> > > > > > >> > > no such parents (we skipped it intentionally), the
> > rule
> > >>> is
> > >>> > not
> > >>> > > > > > queued.
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > But when converters are enabled, a converter rel is
> > >>> created:
> > >>> > > > > > >> > [LogicalProject]
> > >>> > > > > > >> > > <- [LogicalScan, PhysicalScan,
> > >>> > > ConverterFromPhysicalToLogical].
> > >>> > > > No
> > >>> > > > > > >> rules
> > >>> > > > > > >> > > are fired for PhysicalScan again, but they are fired
> > for
> > >>> > > > converter
> > >>> > > > > > >> since
> > >>> > > > > > >> > > it has the necessary LOGICAL trait.
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > It makes sense, that converters essentially allow
> > >>> forcing
> > >>> > rule
> > >>> > > > > > >> invocation
> > >>> > > > > > >> > > on parents, even if the child was created with
> > different
> > >>> > > traits.
> > >>> > > > > But
> > >>> > > > > > >> it
> > >>> > > > > > >> > > seems that this mechanism may be too heavy for
> complex
> > >>> > queries
> > >>> > > > > > >> because it
> > >>> > > > > > >> > > literally creates hundreds of new converter rels and
> > >>> > triggers
> > >>> > > > > rules
> > >>> > > > > > >> over
> > >>> > > > > > >> > > and over again.
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > We need some fine-grained alternative. Basically,
> what
> > >>> would
> > >>> > > be
> > >>> > > > > > enough
> > >>> > > > > > >> > for
> > >>> > > > > > >> > > me is to let the planner know somehow: "I created
> that
> > >>> rel,
> > >>> > > and
> > >>> > > > I
> > >>> > > > > > want
> > >>> > > > > > >> > you
> > >>> > > > > > >> > > to execute parent rules not only using its trait but
> > >>> also on
> > >>> > > > this
> > >>> > > > > > and
> > >>> > > > > > >> > those
> > >>> > > > > > >> > > traits."
> > >>> > > > > > >> > > Is there any API in Calcite which allows doing this
> > >>> without
> > >>> > > > > > creating a
> > >>> > > > > > >> > new
> > >>> > > > > > >> > > rel node?
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > Regards,
> > >>> > > > > > >> > > Vladimir.
> > >>> > > > > > >> > >
> > >>> > > > > > >> > >
> > >>> > > > > > >> > > ср, 30 окт. 2019 г. в 09:25, Seliverstov Igor <
> > >>> > > > > gvvinbl...@gmail.com
> > >>> > > > > > >:
> > >>> > > > > > >> > >
> > >>> > > > > > >> > >> Vladimir,
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >> Probably it'll help you:
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >> Seems the cause of issue in
> RelSubset.getParentRels()
> > >>> The
> > >>> > > > check
> > >>> > > > > > used
> > >>> > > > > > >> > when
> > >>> > > > > > >> > >> the planner schedules newly matched rules after
> > >>> successful
> > >>> > > > > > >> > transformation
> > >>> > > > > > >> > >> (see VolcanoRuleCall.matchRecurse), it prevents the
> > >>> parent
> > >>> > > rule
> > >>> > > > > be
> > >>> > > > > > >> > applied
> > >>> > > > > > >> > >> once again (here your logical project with an input
> > >>> having
> > >>> > > ANY
> > >>> > > > > > >> > >> distribution
> > >>> > > > > > >> > >> doesn't satisfy a transformed input traits).
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >> In our case we use another workaround, so there are
> > >>> also
> > >>> > much
> > >>> > > > > more
> > >>> > > > > > >> > >> transformations than we wanted, so the desired rule
> > is
> > >>> > > > triggered.
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >> вт, 29 окт. 2019 г., 14:46 Vladimir Ozerov <
> > >>> > > ppoze...@gmail.com
> > >>> > > > >:
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >> > Hi Vladimir,
> > >>> > > > > > >> > >> >
> > >>> > > > > > >> > >> > I am sorry. Pushed, it works now.
> > >>> > > > > > >> > >> >
> > >>> > > > > > >> > >> > вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov <
> > >>> > > > > > >> > >> > sitnikov.vladi...@gmail.com
> > >>> > > > > > >> > >> > >:
> > >>> > > > > > >> > >> >
> > >>> > > > > > >> > >> > > > mvn clean test
> > >>> > > > > > >> > >> > >
> > >>> > > > > > >> > >> > > [ERROR] The goal you specified requires a
> project
> > >>> to
> > >>> > > > execute
> > >>> > > > > > but
> > >>> > > > > > >> > >> there is
> > >>> > > > > > >> > >> > > no POM in this directory
> > >>> > > > > > >> > >> > >
> > >>> > > > > > >> > >> > > Vladimir, please push missing files
> > >>> > > > > > >> > >> > >
> > >>> > > > > > >> > >> > > Vladimir
> > >>> > > > > > >> > >> > >
> > >>> > > > > > >> > >> >
> > >>> > > > > > >> > >>
> > >>> > > > > > >> > >
> > >>> > > > > > >> >
> > >>> > > > > > >>
> > >>> > > > > > >
> > >>> > > > > >
> > >>> > > > >
> > >>> > > >
> > >>> > >
> > >>> >
> > >>>
> > >>>
> > >>
> >
>

Re: Re: Re: Problem with converters and possibly rule matching

Reply via email to