The 1 approach is what actually Apache Flink did and works great.

Vladimir Ozerov <ppoze...@gmail.com>于2019年11月20日 周三上午2:22写道:

> Hi Stamatis,
>
> This approach worth trying, but I am afraid that it may produce too many
> permutations. For example, this is the logical plan of one of TPC-H
> benchmarks, and I am not very keen expanding it much further :-)
>
> 901:ProjectLogicalRel(ps_partkey=[$0], val=[$1])
>   899:FilterLogicalRel(subset=[rel#900:Subset#66.LOGICAL.ANY.[]],
> condition=[>($1, $2)])
>     897:JoinLogicalRel(subset=[rel#898:Subset#65.LOGICAL.ANY.[]],
> condition=[true], joinType=[left])
>       882:AggregateLogicalRel(subset=[rel#883:Subset#57.LOGICAL.ANY.[]],
> group=[{0}], val=[SUM($1)])
>         880:ProjectLogicalRel(subset=[rel#881:Subset#56.LOGICAL.ANY.[]],
> ps_partkey=[$1], $f1=[*($2, $3)])
>           878:JoinLogicalRel(subset=[rel#879:Subset#55.LOGICAL.ANY.[]],
> condition=[=($5, $6)], joinType=[inner])
>             875:JoinLogicalRel(subset=[rel#876:Subset#53.LOGICAL.ANY.[]],
> condition=[=($0, $4)], joinType=[inner])
>
> 634:MapScanLogicalRel(subset=[rel#873:Subset#51.LOGICAL.ANY.[]],
> table=[[partsupp]], projects=[[0, 1, 2, 3]])
>
> 632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]],
> table=[[supplier]], projects=[[0, 1]])
>
> 630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]],
> table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')])
>       895:AggregateLogicalRel(subset=[rel#896:Subset#64.LOGICAL.ANY.[]],
> group=[{}], agg#0=[SINGLE_VALUE($0)])
>         893:ProjectLogicalRel(subset=[rel#894:Subset#63.LOGICAL.ANY.[]],
> EXPR$0=[*($0, 0.9:DECIMAL(2, 1))])
>
> 891:AggregateLogicalRel(subset=[rel#892:Subset#62.LOGICAL.ANY.[]],
> group=[{}], agg#0=[SUM($0)])
>
> 889:ProjectLogicalRel(subset=[rel#890:Subset#61.LOGICAL.ANY.[]], $f0=[*($1,
> $2)])
>               887:JoinLogicalRel(subset=[rel#888:Subset#60.LOGICAL.ANY.[]],
> condition=[=($4, $5)], joinType=[inner])
>
> 885:JoinLogicalRel(subset=[rel#886:Subset#59.LOGICAL.ANY.[]],
> condition=[=($0, $3)], joinType=[inner])
>
> 680:MapScanLogicalRel(subset=[rel#884:Subset#58.LOGICAL.ANY.[]],
> table=[[partsupp]], projects=[[0, 2, 3]])
>
> 632:MapScanLogicalRel(subset=[rel#874:Subset#52.LOGICAL.ANY.[]],
> table=[[supplier]], projects=[[0, 1]])
>
> 630:MapScanLogicalRel(subset=[rel#877:Subset#54.LOGICAL.ANY.[]],
> table=[[nation]], projects=[[0, 1]], filter=[=($1, '[NATION]')])
>
> I have two working solutions at the moment, which do not require
> cross-convention converters:
> 1) Create separate rules for logical-physical node conversion, and for
> physical-physical optimization
> 2) A kind of hack: add fake AbstractConverter with logical traits after the
> physical node is created
>
> I hope one of them will work good enough for the production case. Most
> likely it would be p.1.
>
> Regards,
> Vladimir.
>
>
> пт, 15 нояб. 2019 г. в 11:04, Stamatis Zampetakis <zabe...@gmail.com>:
>
> > Hi again Vladimir,
> >
> > In my previous email, I mentioned three rules for performing logical to
> > physical conversions.
> > It's normal that if you add more operators we will end up with more
> rules.
> > Now that the example has a filter we have the following rule list:
> >
> > Rule 1: RootLogicalRel -> RootPhysicalRel
> > Rule 2: ProjectLogicalRel -> ProjectPhysicalRel
> > Rule 3: FilterLogicalRel -> FilterPhysicalRel
> > Rule 4: MapScanLogicalRel -> MapScanPhysicalRel
> >
> > If I understand well, your concern is that in order to get the optimal
> plan
> > you would have to introduce many additional rules for the Exchange
> operator
> > (e.g., ExchangeProjectTransposeRule, ExchangeFilterTransposeRue, etc.).
> > I was thinking that it is not necessary to introduce these kind of rules.
> > You could have only one rule, i.e., PhysicaExchangeRule, for enforcing a
> > distribution that would be applied when there is a requirement for
> > particular distribution.
> >
> > So for the initial plan
> > LogicalRoot [distribution=SINGLETON]
> > -> LogicalProject
> >  -> LogicalFilter
> >   -> LogicalScan
> >
> > The ruleset above should generate the following alternatives where the
> > PhysicalExchangeRule is applied 3 times.
> >
> > Alt1:
> > PhysicalRoot
> > -> SingletonExchange
> >  -> PhysicalProject
> >   -> PhysicalFilter
> >    -> PhysicalScan
> >
> > Alt2:
> > PhysicalRoot
> > -> PhysicalProject
> >  -> SingletonExchange
> >   -> PhysicalFilter
> >    -> PhysicalScan
> >
> > Alt3:
> > PhysicalRoot
> > -> PhysicalProject
> >  -> PhysicalFilter
> >   -> SingletonExchange
> >    -> PhysicalScan
> >
> > This is still in the same spirit of the previous example where the
> physical
> > property (distribution) is either satisfied immediately or passed down to
> > the next level.
> >
> > Best,
> > Stamatis
> >
> > On Thu, Nov 14, 2019 at 11:14 AM Vladimir Ozerov <ppoze...@gmail.com>
> > wrote:
> >
> > > search place* = search space
> > >
> > > чт, 14 нояб. 2019 г. в 13:10, Vladimir Ozerov <ppoze...@gmail.com>:
> > >
> > > > Hi Haisheng,
> > > >
> > > > I double-checked the code. My original version returned false for
> some
> > > > cases, but it didn't affect number of rules calls anyway, so I
> changed
> > it
> > > > to always return true. Please note that if I change the code as you
> > > > suggested, the test started failing, because bottom-up propagation of
> > > rule
> > > > calls no longer work: when the child is converted to physical form,
> the
> > > > parent logical node is not notified. This is the very problem I
> address
> > > > with that weird physical-to-logical conversions: they do not make
> > sense,
> > > > and converter expansion does not produce any new rels, but their
> > > existence
> > > > allow for logical rule re-trigger which ultimately allow the plan to
> > > > compile.
> > > >
> > > > Regarding two conventions - I agree that it may look strange, but I
> do
> > > not
> > > > see any problems from the correctness perspective. Separation of
> > logical
> > > > and physical planning helps me avoid excessive expansion of the
> search
> > > > place: I do not want my physical rules to produce new rels from not
> > > > optimized logical rels. Do you see any problems with that approach?
> > > >
> > > > Regards,
> > > > Vladimir
> > > >
> > > > ср, 6 нояб. 2019 г. в 03:52, Haisheng Yuan <h.y...@alibaba-inc.com>:
> > > >
> > > >> Hi Vladimir,
> > > >>
> > > >> The code in PHYSICAL convention L44 looks weird, I think it always
> > > >> returns true.
> > > >>
> > > >>
> > >
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/HazelcastConventions.java#L44
> > > >>
> > > >> Try this:
> > > >>
> > > >> fromTraits.containsIfApplicable(Convention.PHYSICAL)
> > > >>     && toTraits.containsIfApplicable(Convention.PHYSICAL);
> > > >>
> > > >>
> > > >> Adding a AbstractConverter on logical operators is meaningless.
> > Calcite
> > > >> is mixing the concept of logical and physical together, which is
> sad.
> > > >>
> > > >> BTW, using 2 conventions is not appropriate and wrong.
> > > >>
> > > >> - Haisheng
> > > >>
> > > >> ------------------------------------------------------------------
> > > >> 发件人:Vladimir Ozerov<ppoze...@gmail.com>
> > > >> 日 期:2019年11月05日 18:02:15
> > > >> 收件人:Haisheng Yuan<h.y...@alibaba-inc.com>
> > > >> 抄 送:dev@calcite.apache.org (dev@calcite.apache.org)<
> > > >> dev@calcite.apache.org>
> > > >> 主 题:Re: Re: Problem with converters and possibly rule matching
> > > >>
> > > >> Hi Haisheng,
> > > >>
> > > >>  think I already tried something very similar to what you explained,
> > but
> > > >> it gave not an optimal plan. Please let me describe what I did. I
> > would
> > > >> appreciate your feedback.
> > > >>
> > > >> 1) We start with a simple operator tree Root <- Project <- Scan,
> where
> > > >> the root is a final aggregator in the distributed query engine:
> > > >> -> LogicalRoot
> > > >>  -> LogicalProject
> > > >>   -> LogicalScan
> > > >>
> > > >> 2) First, we convert the Root and enforce SINGLETON distribution on
> a
> > > >> child:
> > > >> *-> PhysicalRoot[SINGLETON]*
> > > >> * -> Enforcer#1[SINGLETON]*
> > > >>   -> LogicalProject
> > > >>    -> LogicalScan
> > > >>
> > > >> 3) Then the project's rule is invoked. It doesn't know the
> > distribution
> > > >> of the input, so it requests ANY distribution. Note that we have to
> > set
> > > ANY
> > > >> to the project as well since we do not know the distribution of the
> > > input:
> > > >> -> PhysicalRoot[SINGLETON]
> > > >>  -> Enforcer#1[SINGLETON]
> > > >> *  -> PhysicalProject[ANY]*
> > > >> *   -> Enforcer#2[ANY]*
> > > >>     -> LogicalScan
> > > >>
> > > >> 4) Finally, the physical scan is created and its distribution is
> > > >> resolved. Suppose that it is REPLICATED, i.e. the whole result set
> is
> > > >> located on all nodes.
> > > >> -> PhysicalRoot[SINGLETON]
> > > >>  -> Enforcer#1[SINGLETON]
> > > >>   -> PhysicalProject[ANY]
> > > >>    -> Enforcer#2[ANY]
> > > >> *    -> PhysicalScan[REPLICATED]*
> > > >>
> > > >> 5) Now as all logical nodes are converted, we start resolving
> > enforcers.
> > > >> The second one is no-op, since REPLICATED satisfies ANY:
> > > >> -> PhysicalRoot[SINGLETON]
> > > >>  -> Enforcer#1[SINGLETON]
> > > >>   -> PhysicalProject[ANY]
> > > >>    -> PhysicalScan[REPLICATED]
> > > >>
> > > >> 6) But the first enforcer now requires an Exchange, since ANY
> doesn't
> > > >> satisfy SINGLETON!
> > > >> -> PhysicalRoot[SINGLETON]
> > > >> * -> SingletonExchange[SINGLETON]*
> > > >>   -> PhysicalProject[ANY] // <= unresolved!
> > > >>    -> PhysicalScan[REPLICATED]
> > > >>
> > > >> The resulting plan requires data movement only because we didn't
> know
> > > >> precise distribution of the PhysicalProject when it was created. But
> > > should
> > > >> I enable Convention.Impl.canConvertConvention, bottom-up propagation
> > > >> kicks in, and the correct plan is produced because now
> LogicalProject
> > > >> has a chance to be converted to PhysicalProject with the concrete
> > > >> distribution. The optimized plan looks like this (since REPLICATED
> > > >> satisfies SINGLETON):
> > > >> -> PhysicalRoot[SINGLETON]
> > > >>  -> PhysicalProject[REPLICATED]
> > > >>   -> PhysicalScan[REPLICATED]
> > > >>
> > > >> You may see this in action in my reproducer:
> > > >> 1) Test producing "bad" plan:
> > > >>
> > >
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/test/java/devozerov/OptimizerTest.java#L45
> > > >> 2) Root enforces SINGLETON on Project:
> > > >>
> > >
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/RootPhysicalRule.java#L45
> > > >> 3) Project enforces default (ANY) distribution on Scan:
> > > >>
> > >
> >
> https://github.com/devozerov/calcite-optimizer/blob/master/src/main/java/devozerov/physical/ProjectPhysicalRule.java#L49
> > > >>
> > > >> Please let me know if this flow is similar to what you meant.
> > > >>
> > > >> Regards,
> > > >> Vladimir.
> > > >>
> > > >> пн, 4 нояб. 2019 г. в 10:33, Haisheng Yuan <h.y...@alibaba-inc.com
> >:
> > > >>
> > > >>> Hi Vladimir,
> > > >>>
> > > >>> This is still can be done through top-down request approach.
> > > >>>
> > > >>> PhysicalFilter operator should request ANY distribution from child
> > > >>> operator, unless there is outer reference in the filter condition,
> in
> > > which
> > > >>> case, PhysicalFilter should request SINGLETON or BROADCAST
> > > distribution. So
> > > >>> in your case, PhysicalFilter request ANY, its required distribution
> > > will be
> > > >>> enforced on filter's output.
> > > >>>
> > > >>> Regarding index usage, you should have a FIlterTableScan2IndexGet
> > > >>> logical transformation rule, and a IndexGet2IndexScan physical
> > > >>> implementation rule. Note that IndexGet is a logical operator and
> > > IndexScan
> > > >>> is a physical operator, which are also used by SQL Server.
> > > >>>
> > > >>> - Haisheng
> > > >>>
> > > >>> ------------------------------------------------------------------
> > > >>> 发件人:Vladimir Ozerov<ppoze...@gmail.com>
> > > >>> 日 期:2019年11月01日 17:30:26
> > > >>> 收件人:<dev@calcite.apache.org>
> > > >>> 主 题:Re: Problem with converters and possibly rule matching
> > > >>>
> > > >>> Hi Stamatis,
> > > >>>
> > > >>> Thank you for your reply. I also thought that we may set the
> > > distribution
> > > >>> trait during logical planning because it is known in advance. And
> the
> > > >>> example I gave will work! :-) But unfortunately, it will work only
> > > >>> because
> > > >>> the tree is very simple, and the Project is adjacent to the Scan.
> > This
> > > is
> > > >>> how my reproducer will work in that case:
> > > >>> 1) Root: enforce "SINGLETON" on Project
> > > >>> 2) Project: check the logical Scan, infer already resolved
> > > distribution,
> > > >>> then convert to [PhyiscalProject <- PhysicalScan]
> > > >>> 3) Resolve Root enforcer, adding and Exchange if needed.
> > > >>>
> > > >>> But this stops working as soon as a plan becomes more complex so
> that
> > > it
> > > >>> is
> > > >>> impossible to infer the distribution from the child immediately.
> > E.g.:
> > > >>> LogicalRoot [distribution=SINGLETON]
> > > >>> -> LogicalProject // We are here new and cannot produce the
> physical
> > > >>> project
> > > >>> -> LogicalFilter[distribution=?]
> > > >>> -> LogicalScan[distribution=REPLICATED]
> > > >>>
> > > >>> This is where your suggestion with cascading enforcement may kick
> in.
> > > But
> > > >>> now consider that instead of having REPLICATED distribution, which
> > > >>> satisfies SINGLETON, we have a PARTITIONED distribution, It doesn't
> > > >>> satisfy
> > > >>> SINGLETON, so we have to insert the exchange.
> > > >>>
> > > >>> Before:
> > > >>> PhysicalRoot [SINGLETON]
> > > >>> -> PhysicalProject [SINGLETON]
> > > >>> -> PhysicalFilter [SINGLETON]
> > > >>> -> LogicalScan [PARTITIONED] // SINGLETON is enforced here
> > > >>>
> > > >>> After:
> > > >>> PhysicalRoot [SINGLETON]
> > > >>> -> PhysicalProject [SINGLETON]
> > > >>> -> PhysicalFilter [SINGLETON]
> > > >>> -> SingletonExchange [SINGLETON]
> > > >>> -> PhysicalScan [PARTITIONED]
> > > >>>
> > > >>> But unfortunately, this plan is not optimal. Since we perform
> Project
> > > and
> > > >>> Filter after the exchange, we now have to reshuffle the whole
> table.
> > > The
> > > >>> optimal plan would be to pushdown Project and Filter past exchange:
> > > >>> PhysicalRoot [SINGLETON]
> > > >>> -> SingletonExchange [SINGLETON]
> > > >>> -> PhysicalProject [PARTITIONED]
> > > >>> -> PhysicalFilter [PARTITIONED]
> > > >>> -> PhysicalScan [PARTITIONED]
> > > >>>
> > > >>> But in order to achieve that, now I need to introduce new rules
> which
> > > >>> will
> > > >>> attempt to transpose dozens of different operators past exchange,
> so
> > > most
> > > >>> likely the planning will never finish :-)
> > > >>>
> > > >>> This is why it seems that the bottom-up approach seems to be more
> > > natural
> > > >>> for such cases. Another example is index support. A table may have
> a
> > > >>> number
> > > >>> of access methods in addition to plain scan, and every such method
> > may
> > > >>> produce a physical scan with different collations. It is not
> > practical
> > > to
> > > >>> try to enforce something from the top. Instead, we'd better produce
> > > >>> alternatives from the bottom.
> > > >>>
> > > >>> Basically, Drill tries to mix two approaches. First, it tries to
> > derive
> > > >>> traits from the child. If it fails and no transformations were
> > created,
> > > >>> it
> > > >>> just stops trait propagation. As a result, an exchange is created
> on
> > > top
> > > >>> of
> > > >>> the operator where we stopped, which again leads to not optimal
> > plans.
> > > >>> You
> > > >>> may observe this in action if you run my reproducer. "testPass"
> > > produces
> > > >>> an
> > > >>> optimal plan with help of abstract converters. "testDrill" produces
> > not
> > > >>> optimal plan with the unnecessary exchange.
> > > >>>
> > > >>> All in all, I cannot get how we can employ distribution metadata
> and
> > > >>> index
> > > >>> metadata for optimal planning without using a pure bottom-up
> > approach.
> > > >>>
> > > >>> Regards,
> > > >>> Vladimir.
> > > >>>
> > > >>> пт, 1 нояб. 2019 г. в 11:42, Stamatis Zampetakis <
> zabe...@gmail.com
> > >:
> > > >>>
> > > >>> > The discussion is interesting thanks for the nice examples.
> > > >>> > I am replying in this thread since it has all the examples and
> the
> > > >>> history
> > > >>> > of the conversation.
> > > >>> >
> > > >>> > *Part I: Distribution physical or logical property*
> > > >>> >
> > > >>> > The Volcano paper writes the following regarding properties:
> > > >>> >
> > > >>> > "Logical properties can be derived from the logical algebra
> > > expression
> > > >>> and
> > > >>> > include schema, expected size, etc., while physical properties
> > depend
> > > >>> on
> > > >>> > algorithms, e.g., sort order, partitioning, etc."
> > > >>> >
> > > >>> > We could say that partitioning is not only a property which
> depends
> > > on
> > > >>> an
> > > >>> > algorithm but a property which depends on the schema. When you
> > have a
> > > >>> table
> > > >>> > you know in advance that the particular table is replicated.
> > > >>> >
> > > >>> > Basically, this means that the starting "logical" plan in your
> case
> > > >>> could
> > > >>> > be the following:
> > > >>> > RootLogicalRel[convention=LOGICAL, distribution=ANY]
> > > >>> > -> ProjectLogicalRel[convention=LOGICAL, distribution=ANY]
> > > >>> > -> MapScanLogicalRel[convention=LOGICAL, distribution=REPLICATED]
> > > >>> >
> > > >>> > When you are about to create your physical projection it is not
> > > >>> necessary
> > > >>> > to create three equivalent paths but the most promising by asking
> > the
> > > >>> input
> > > >>> > what can it offer as traits. Again this partially overlaps with
> the
> > > >>> > discussion about "On demand traitset" [1] where as I wrote there
> is
> > > >>> already
> > > >>> > a mechanism to derive traits from children (collation,
> > distribution,
> > > >>> etc.).
> > > >>> >
> > > >>> > Note that I am not saying that there is nothing more to be done
> to
> > > >>> improve
> > > >>> > the Calcite optimizer; I am just trying to provide a viable
> > > alternative
> > > >>> > given the current situation of the project.
> > > >>> >
> > > >>> > *Part II: Example with the original Volcano optimizer*
> > > >>> >
> > > >>> > Now let me try to show how the original (from the paper) Volcano
> > > >>> optimizer
> > > >>> > could handle your query.
> > > >>> >
> > > >>> > The optimization starts and we request two properties convention
> > and
> > > >>> > distribution.
> > > >>> >
> > > >>> > RootLogicalRel[convention=PHYSICAL, distribution=SINGLETON]
> > > >>> > -> ProjectLogicalRel
> > > >>> > -> MapScanLogicalRel
> > > >>> >
> > > >>> > Rule 1: RootLogicalRel -> RootPhysicalRel applies the algorithm
> but
> > > >>> cannot
> > > >>> > satisfy the SINGLETON property so it propagates the demand
> > > >>> >
> > > >>> > RootPhysicalRel
> > > >>> > -> ProjectLogicalRel [convention=PHYSICAL,
> distribution=SINGLETON]
> > > >>> > -> MapScanLogicalRel
> > > >>> >
> > > >>> > Rule 2: ProjectLogicalRel -> ProjectPhysicalRel applies the
> > algorithm
> > > >>> but
> > > >>> > cannot satisfy the SINGLETON property so it propagates the demand
> > > >>> >
> > > >>> > RootPhysicalRel
> > > >>> > -> ProjectPhysicalRel
> > > >>> > -> MapScanLogicalRel [convention=PHYSICAL,
> distribution=SINGLETON]
> > > >>> >
> > > >>> > Rule 3: MapScanLogicalRel -> MapScanPhysicalRel applies the
> > algorithm
> > > >>> and
> > > >>> > given that the table is replicated it satisfies the distribution
> > > >>> property.
> > > >>> >
> > > >>> > RootPhysicalRel
> > > >>> > -> ProjectPhysicalRel
> > > >>> > -> MapScanPhysicalRel
> > > >>> >
> > > >>> > So we end up with a complete plan that satisfies all properties
> and
> > > >>> does
> > > >>> > not have redundant exchange operators.
> > > >>> > Moreover we didn't require all possible distributions when we
> > applied
> > > >>> Rule
> > > >>> > 2 since we just want to satisfy the SINGLETON property requested
> by
> > > the
> > > >>> > parent.
> > > >>> > The example above does not demonstrate the additional
> optimization
> > > >>> paths
> > > >>> > which would be apply an enforcer (an Exchange operator to satisfy
> > the
> > > >>> > SINGLETON property) at a higher level than the scan.
> > > >>> >
> > > >>> > Would this be an acceptable approach? Does it still suffer from a
> > big
> > > >>> > search space?
> > > >>> >
> > > >>> > To make the analog with the actual Volcano optimizer in Calcite
> the
> > > >>> thing
> > > >>> > that may be missing is to be able to know what trait was
> requested
> > by
> > > >>> the
> > > >>> > parent during the application of Rule 2.
> > > >>> >
> > > >>> > Best,
> > > >>> > Stamatis
> > > >>> >
> > > >>> > [1]
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> https://lists.apache.org/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> > > >>> >
> > > >>> > On Wed, Oct 30, 2019 at 7:24 PM Vladimir Ozerov <
> > ppoze...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Hi Stamatis,
> > > >>> > >
> > > >>> > > The problem that the presented reproducer is a very simplified
> > > >>> version of
> > > >>> > > what is actually needed. In reality, there are several
> > distribution
> > > >>> > types -
> > > >>> > > PARTITIONED, REPLICATED, SINGLETON. To make things worse,
> > > >>> PARTITIONED may
> > > >>> > > or may not have concrete distribution fields. In theory, I can
> > > >>> create one
> > > >>> > > transformation per distribution type, but that would increase
> > plan
> > > >>> space
> > > >>> > > significantly. In my sample "Root <- Project <- Scan" plan,
> there
> > > is
> > > >>> no
> > > >>> > > room for optimization at all, we only need to convert the nodes
> > and
> > > >>> > > propagate traits. But if I follow the proposed approach, the
> > > planner
> > > >>> > would
> > > >>> > > create three equivalent paths. For complex queries, this may
> > > increase
> > > >>> > > optimization time significantly.
> > > >>> > >
> > > >>> > > What I need instead is to gradually convert and optimize nodes
> > > >>> > *bottom-up*,
> > > >>> > > instead of top-bottom. That is, create a physical scan first,
> > then
> > > >>> > create a
> > > >>> > > physical project on top of it, etc. But at the same time, some
> > > rules
> > > >>> > still
> > > >>> > > require the top-bottom approach. So essentially I need the
> > > optimizer
> > > >>> to
> > > >>> > do
> > > >>> > > both. Abstract converters help me establish bottom-up
> preparation
> > > >>> but do
> > > >>> > > this at the cost of considering too many trait pairs, and as a
> > > >>> result,
> > > >>> > also
> > > >>> > > pollute the search space.
> > > >>> > >
> > > >>> > > To the contrast, precise command "I transformed the node A to
> > node
> > > B,
> > > >>> > > please re-trigger the rules for A's parents" would allow us to
> > > >>> re-trigger
> > > >>> > > only required rules, without adding more nodes.
> > > >>> > >
> > > >>> > > Does it make sense?
> > > >>> > >
> > > >>> > > Regards,
> > > >>> > > Vladimir.
> > > >>> > >
> > > >>> > > ср, 30 окт. 2019 г. в 21:02, Stamatis Zampetakis <
> > > zabe...@gmail.com
> > > >>> >:
> > > >>> > >
> > > >>> > > > I admit that I didn't thoroughly read the exchanges so far
> but
> > > >>> forcing
> > > >>> > > the
> > > >>> > > > rules trigger does not seem the right approach.
> > > >>> > > >
> > > >>> > > > Other than that I would like to backtrack a bit to point 4.3
> > > >>> raised by
> > > >>> > > > Vladimir.
> > > >>> > > >
> > > >>> > > > "ProjectPhysicalRule [6] - transforms logical project to
> > physical
> > > >>> > > > project *ONLY* if there is an underlying physical input with
> > > >>> REPLICATED
> > > >>> > > or
> > > >>> > > > SINGLETON distribution"
> > > >>> > > >
> > > >>> > > > The rule could be modified to do the following two
> > > transformations:
> > > >>> > > > 1. Create a physical project and require the input to be
> > > >>> REPLICATED.
> > > >>> > > > 2. Create a physical project and require
> > > >>> > > > the input to be SINGLETON.
> > > >>> > > >
> > > >>> > > > I would assume that afterwards when your scan rule fires it
> > > should
> > > >>> go
> > > >>> > to
> > > >>> > > > the appropriate subset and give you back the desired plan. Is
> > > >>> there a
> > > >>> > > > problem with this approach?
> > > >>> > > >
> > > >>> > > > Best,
> > > >>> > > > Stamatis
> > > >>> > > >
> > > >>> > > > On Wed, Oct 30, 2019, 5:52 PM Seliverstov Igor <
> > > >>> gvvinbl...@gmail.com>
> > > >>> > > > wrote:
> > > >>> > > >
> > > >>> > > > > Unfortunately it requires package-private API usage.
> > > >>> > > > >
> > > >>> > > > > Best solution there if Calcite provide it for end users.
> > > >>> > > > >
> > > >>> > > > > ср, 30 окт. 2019 г., 18:48 Vladimir Ozerov <
> > ppoze...@gmail.com
> > > >:
> > > >>> > > > >
> > > >>> > > > > > “e pension” == “expand conversion” :)
> > > >>> > > > > >
> > > >>> > > > > > ср, 30 окт. 2019 г. в 18:46, Vladimir Ozerov <
> > > >>> ppoze...@gmail.com>:
> > > >>> > > > > >
> > > >>> > > > > > > Yes, that may work. Even if e pension rule is used, for
> > the
> > > >>> most
> > > >>> > > > cases
> > > >>> > > > > it
> > > >>> > > > > > > will not trigger any real conversions, since we are
> > moving
> > > >>> from
> > > >>> > > > > abstract
> > > >>> > > > > > > convention to physical, and created converters will
> have
> > > the
> > > >>> > > opposite
> > > >>> > > > > > trait
> > > >>> > > > > > > direction (from physical to abstract).
> > > >>> > > > > > >
> > > >>> > > > > > > But again - ideally we only need to re-trigger the
> rules
> > > for
> > > >>> a
> > > >>> > > > specific
> > > >>> > > > > > > node, no more than that. So API support like
> > > >>> > > > > > > “VolcanoPlanner.forceRules(RelNode)” would be very
> > > >>> convenient.
> > > >>> > > > > > >
> > > >>> > > > > > > What do you think?
> > > >>> > > > > > >
> > > >>> > > > > > > ср, 30 окт. 2019 г. в 17:56, Seliverstov Igor <
> > > >>> > > gvvinbl...@gmail.com
> > > >>> > > > >:
> > > >>> > > > > > >
> > > >>> > > > > > >> I considered manual rules calling too, for now we use
> > > >>> abstract
> > > >>> > > > > > converters
> > > >>> > > > > > >> +
> > > >>> > > > > > >> ExpandConversionRule for exchanges producing.
> > > >>> > > > > > >>
> > > >>> > > > > > >> You may create such converters manually (checking
> > > >>> appropriate
> > > >>> > > > subset)
> > > >>> > > > > > this
> > > >>> > > > > > >> case you may reduce created converters count, also, a
> > > >>> converter
> > > >>> > > is a
> > > >>> > > > > > quite
> > > >>> > > > > > >> special node, that does almost nothing (without
> > > >>> corresponding
> > > >>> > > rule)
> > > >>> > > > it
> > > >>> > > > > > may
> > > >>> > > > > > >> be used just as a rule trigger.
> > > >>> > > > > > >>
> > > >>> > > > > > >> Regards,
> > > >>> > > > > > >> Igor
> > > >>> > > > > > >>
> > > >>> > > > > > >> ср, 30 окт. 2019 г., 17:31 Vladimir Ozerov <
> > > >>> ppoze...@gmail.com
> > > >>> > >:
> > > >>> > > > > > >>
> > > >>> > > > > > >> > One funny hack which helped me is manual
> registration
> > > of a
> > > >>> > fake
> > > >>> > > > > > RelNode
> > > >>> > > > > > >> > with desired traits through
> VolcanoPlanner.register()
> > > >>> method.
> > > >>> > > But
> > > >>> > > > > > again,
> > > >>> > > > > > >> > this leads to trashing. What could really help is a
> > call
> > > >>> to
> > > >>> > > > > > >> > VolcanoPlanner.fireRules() with desired rel. But
> this
> > > >>> doesn't
> > > >>> > > work
> > > >>> > > > > out
> > > >>> > > > > > >> of
> > > >>> > > > > > >> > the box since some internals of the rule queue needs
> > to
> > > be
> > > >>> > > > adjusted.
> > > >>> > > > > > >> >
> > > >>> > > > > > >> > What does the community think about adding a method
> > > which
> > > >>> will
> > > >>> > > > > re-add
> > > >>> > > > > > >> rules
> > > >>> > > > > > >> > applicable to the specific RelNode to the rule
> queue?
> > > >>> > > > > > >> >
> > > >>> > > > > > >> > ср, 30 окт. 2019 г. в 17:00, Vladimir Ozerov <
> > > >>> > > ppoze...@gmail.com
> > > >>> > > > >:
> > > >>> > > > > > >> >
> > > >>> > > > > > >> > > Hi Igor,
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > Yes, I came to the same conclusion, thank you.
> This
> > is
> > > >>> how
> > > >>> > it
> > > >>> > > > > > >> basically
> > > >>> > > > > > >> > > happens when converters are disabled:
> > > >>> > > > > > >> > > 1) We start with initial tree: [LogicalProject] <-
> > > >>> > > [LogicalScan]
> > > >>> > > > > > >> > > 2) Then we convert LogicalScan to PhysicalScan, so
> > it
> > > is
> > > >>> > added
> > > >>> > > > to
> > > >>> > > > > > the
> > > >>> > > > > > >> > > set: [LogicalProject] <- [LogicalScan,
> PhysicalScan]
> > > >>> > > > > > >> > > 3) Finally, when it is time to fire a rule for
> > > >>> PhysicalScan,
> > > >>> > > we
> > > >>> > > > > try
> > > >>> > > > > > to
> > > >>> > > > > > >> > get
> > > >>> > > > > > >> > > parents of that scan set with traits of the
> > > >>> PhysicalScan.
> > > >>> > > Since
> > > >>> > > > > > there
> > > >>> > > > > > >> are
> > > >>> > > > > > >> > > no such parents (we skipped it intentionally), the
> > > rule
> > > >>> is
> > > >>> > not
> > > >>> > > > > > queued.
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > But when converters are enabled, a converter rel
> is
> > > >>> created:
> > > >>> > > > > > >> > [LogicalProject]
> > > >>> > > > > > >> > > <- [LogicalScan, PhysicalScan,
> > > >>> > > ConverterFromPhysicalToLogical].
> > > >>> > > > No
> > > >>> > > > > > >> rules
> > > >>> > > > > > >> > > are fired for PhysicalScan again, but they are
> fired
> > > for
> > > >>> > > > converter
> > > >>> > > > > > >> since
> > > >>> > > > > > >> > > it has the necessary LOGICAL trait.
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > It makes sense, that converters essentially allow
> > > >>> forcing
> > > >>> > rule
> > > >>> > > > > > >> invocation
> > > >>> > > > > > >> > > on parents, even if the child was created with
> > > different
> > > >>> > > traits.
> > > >>> > > > > But
> > > >>> > > > > > >> it
> > > >>> > > > > > >> > > seems that this mechanism may be too heavy for
> > complex
> > > >>> > queries
> > > >>> > > > > > >> because it
> > > >>> > > > > > >> > > literally creates hundreds of new converter rels
> and
> > > >>> > triggers
> > > >>> > > > > rules
> > > >>> > > > > > >> over
> > > >>> > > > > > >> > > and over again.
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > We need some fine-grained alternative. Basically,
> > what
> > > >>> would
> > > >>> > > be
> > > >>> > > > > > enough
> > > >>> > > > > > >> > for
> > > >>> > > > > > >> > > me is to let the planner know somehow: "I created
> > that
> > > >>> rel,
> > > >>> > > and
> > > >>> > > > I
> > > >>> > > > > > want
> > > >>> > > > > > >> > you
> > > >>> > > > > > >> > > to execute parent rules not only using its trait
> but
> > > >>> also on
> > > >>> > > > this
> > > >>> > > > > > and
> > > >>> > > > > > >> > those
> > > >>> > > > > > >> > > traits."
> > > >>> > > > > > >> > > Is there any API in Calcite which allows doing
> this
> > > >>> without
> > > >>> > > > > > creating a
> > > >>> > > > > > >> > new
> > > >>> > > > > > >> > > rel node?
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > Regards,
> > > >>> > > > > > >> > > Vladimir.
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > > ср, 30 окт. 2019 г. в 09:25, Seliverstov Igor <
> > > >>> > > > > gvvinbl...@gmail.com
> > > >>> > > > > > >:
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> > >> Vladimir,
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >> Probably it'll help you:
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >> Seems the cause of issue in
> > RelSubset.getParentRels()
> > > >>> The
> > > >>> > > > check
> > > >>> > > > > > used
> > > >>> > > > > > >> > when
> > > >>> > > > > > >> > >> the planner schedules newly matched rules after
> > > >>> successful
> > > >>> > > > > > >> > transformation
> > > >>> > > > > > >> > >> (see VolcanoRuleCall.matchRecurse), it prevents
> the
> > > >>> parent
> > > >>> > > rule
> > > >>> > > > > be
> > > >>> > > > > > >> > applied
> > > >>> > > > > > >> > >> once again (here your logical project with an
> input
> > > >>> having
> > > >>> > > ANY
> > > >>> > > > > > >> > >> distribution
> > > >>> > > > > > >> > >> doesn't satisfy a transformed input traits).
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >> In our case we use another workaround, so there
> are
> > > >>> also
> > > >>> > much
> > > >>> > > > > more
> > > >>> > > > > > >> > >> transformations than we wanted, so the desired
> rule
> > > is
> > > >>> > > > triggered.
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >> вт, 29 окт. 2019 г., 14:46 Vladimir Ozerov <
> > > >>> > > ppoze...@gmail.com
> > > >>> > > > >:
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >> > Hi Vladimir,
> > > >>> > > > > > >> > >> >
> > > >>> > > > > > >> > >> > I am sorry. Pushed, it works now.
> > > >>> > > > > > >> > >> >
> > > >>> > > > > > >> > >> > вт, 29 окт. 2019 г. в 14:41, Vladimir Sitnikov
> <
> > > >>> > > > > > >> > >> > sitnikov.vladi...@gmail.com
> > > >>> > > > > > >> > >> > >:
> > > >>> > > > > > >> > >> >
> > > >>> > > > > > >> > >> > > > mvn clean test
> > > >>> > > > > > >> > >> > >
> > > >>> > > > > > >> > >> > > [ERROR] The goal you specified requires a
> > project
> > > >>> to
> > > >>> > > > execute
> > > >>> > > > > > but
> > > >>> > > > > > >> > >> there is
> > > >>> > > > > > >> > >> > > no POM in this directory
> > > >>> > > > > > >> > >> > >
> > > >>> > > > > > >> > >> > > Vladimir, please push missing files
> > > >>> > > > > > >> > >> > >
> > > >>> > > > > > >> > >> > > Vladimir
> > > >>> > > > > > >> > >> > >
> > > >>> > > > > > >> > >> >
> > > >>> > > > > > >> > >>
> > > >>> > > > > > >> > >
> > > >>> > > > > > >> >
> > > >>> > > > > > >>
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>>
> > > >>
> > >
> >
>

Reply via email to