Jinfeng Ni's  proposal +1, enhance method satisfies maybe more reasonable.


Danny Chan's thoughts +1

Regards!

Aron Tao


XING JIN <jinxing.co...@gmail.com> 于2019年11月8日周五 下午8:09写道:

> Hi Haisheng,
>
> Thanks a lot for sharing this great proposal ~
> For short I understand your idea as below:
> 1. Derive the distributions/collations that children COULD/MIGHT offer
> 2. Decide the best distributions/collations by first point and computing
> logic of operator, say gropuings in Aggregate;
>
> It comes to me that another important part is that children operator should
> also provide the corresponding COST for the POSSIBLE
> distribution/collation.
> The COST is not for the final plan, but a hypothesis.
>
> Take below example
> SELECT DISTINCT c, b FROM
>   ( SELECT R.c c, S.b b FROM R, S
>         WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> Aggregate
> +--- InnerJoin
>      |--- TableScan on R
>      +--- TableScan on S
>
> InnerJoin should deliver that its possible collations and corresponding
> costs at the same time.
> - If ordered by (c, b, a) my cost is ...
> - If ordered by (b, c, a) my cost is ...
> - If ordered by (a, b, c) my cost is ...
> By which Aggregate decide the 'best' required collation.
> By this way we can better limit the searching space and also target the
> relatively optimized (if not best) plan.
>
> Also when you say "I didn't say adding to RelNode, but a new API/interface
> for physical operator only.", I'm not so clear;
> Currently the physical operators in Calcite like EnumerableHashJoin,
> EnumerableMergeJoin, when created, their physical behavior(like real
> collations) are determined.
> So I belive you intend to add new API at upper layer, but there's no
> physical optimizing phase in Calcite at this moment. Where do you want to
> add the new API, can you specify ?
>
> Thanks,
> Jin
>
> Jinfeng Ni <j...@apache.org> 于2019年11月6日周三 上午1:56写道:
>
> > @Haisheng, @Xiening,
> >
> > Thanks for pointing that previous email out.  Overall, I agree that
> > the physical trait enforcement should be done in the engine, not in
> > the rule. For the rule, it should only specify the request, and the
> > corresponding transformation, and let the engine to explore the search
> > space. It will be great if we can revamp the Volcano optimizer
> > framework, to do that way.
> >
> > In terms of search space, it's always a tradeoff between the space
> > searched and the optimality of the plan found. I think it's fine for
> > the engine to explore a potential big search space, as long as it has
> > effective "bound-and-prune" strategy. In the original Volcano paper,
> > there is a way to prune the search space based on the best plan found
> > so far, using the parameter "limit".  When an implementable plan is
> > found, a "real" cost is obtained, which could be used to prune
> > un-necessary search space.  That's actually the advantage of Volcano's
> > "top-down" approach. However,  seems to me that Calcite's Volcano did
> > not apply that approach effectively, because of the existence of
> > AbstractConverter.
> >
> >
> > On Sun, Nov 3, 2019 at 10:12 PM Haisheng Yuan <h.y...@alibaba-inc.com>
> > wrote:
> > >
> > > Hi Jinfeng,
> > >
> > > I think you might have missed the email about proposed API for physical
> > operators I sent out previously in [1].
> > >
> > > We don't need request all the permutation, which is also impossible in
> > practice, the search space is going to explode.
> > >
> > > In the example in email [1], I already talked about your concen on
> > passing down parent request into children may lead to less optimal plan.
> > Besically join operator can send 2 collation optimization requests, one
> is
> > to pass request down, the other one is ignore the parent's request.
> > >
> > > Using AbstractConverter to enforce properties is inapporpriate, which
> > handles all the optimization work to physical operator providers, meaning
> > there is almost no physical level optimization mechanism in Calcite. SQL
> > Server and Greenplum's optimizer, which are Cascades framework based,
> > implemented the property enforcement in the core optimizer engine, not
> > through AbstractConverter and rules, physical operators just need to
> > implement those methods (or similar) I mentioned in email [1]. My goal is
> > completely abolishing AbstractConverter.
> > >
> > > [1]
> >
> http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e
> > >
> > > - Haisheng
> > >
> > > ------------------------------------------------------------------
> > > 发件人:Jinfeng Ni<j...@apache.org>
> > > 日 期:2019年11月01日 14:10:30
> > > 收件人:<dev@calcite.apache.org>
> > > 主 题:Re: [DISCUSS] On-demand traitset request
> > >
> > > Hi Xiening,
> > >
> > > "Let say if R and S doesn’t have sorting properties at all. In your
> > > case, we would end up adding enforcers for LHS and RHS to get
> > > collation (a, b, c). Then we would need another enforcer to get
> > > collation (b, c). This is a sub optimal plan as we could have use (b,
> > > c, a) for join."
> > >
> > > In such case, for step 2 when MergeJoin request a permutation match of
> > > (a, b,c) on both it's input, it is not necessary to end up with
> > > collation (a, b, c) only. Since it request "permutation", MJ could ask
> > > all possible satisfying collations, which include (b, c, a). In other
> > > words, the steps I described did not exclude such plan.
> > >
> > > You may argue it would increase the search space. However, by
> > > limiting the search space, without explore all possible choice, we may
> > > lose the chance getting 'optimal' plan we want. For instance, in the
> > > above example, the idea of passing "on demand" trait request (b,c)
> > > from Agg to MJ is to avoid unnecessary sort (b,c). In cases where the
> > > join condition has good filtering, and such sort of join output could
> > > be quite cheap. Yet in the plan enumeration, since we use "on demand"
> > > trait request from parent to guide the actions of MJ, I'm not sure if
> > > we may restrict the choices we consider in the legs of join, whose
> > > cardinality could be larger and play a bigger role in the overall
> > > cost.
> > >
> > > In other words, by using "on demand" trait request, we may restrict
> > > the choices of plan, possibly in the some operators with larger data
> > > size.
> > >
> > > In the current implementation of VolcanoPlanner, I feel the root issue
> > > of long planning time is not to explore all possible satisfying trait.
> > > It is actually the unnecessary of AbstractConverter, added to the
> > > equivalence class.
> > >
> > >
> > > On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai <xndai....@gmail.com>
> > wrote:
> > > >
> > > > Thanks for the sharing. I like the way you model this problem,
> Jinfeng.
> > > >
> > > > There’s one minor issue with your example. Let say if R and S doesn’t
> > have sorting properties at all. In your case, we would end up adding
> > enforcers for LHS and RHS to get collation (a, b, c). Then we would need
> > another enforcer to get collation (b, c). This is a sub optimal plan as
> we
> > could have use (b, c, a) for join.
> > > >
> > > > I think in step #2, the join operator would need to take the agg
> trait
> > requirement into account. Then it would have two options -
> > > >
> > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to
> > guarantee the join output would deliver the collation agg needs.
> > > > 2) require permutation match of (a, b, c); in such case, an enforcer
> > might be needed for aggregation.
> > > >
> > > > Eventually the cost model decides who is the winner.
> > > >
> > > > There’s a fundamental difference between your model and Haisheng’s
> > proposal. In Haisheng’s case, a rel node not only looks at its parent’s
> > requirement, but also tries to get the potential traits its input could
> > deliver. It would try to align them to eliminate unnecessary
> alternatives.
> > > >
> > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to
> > implement option 1), we would generate two alternatives -
> > > >
> > > > MergeJoin (b, c, a)
> > > > TableScan R
> > > > Sort(b, c, a)
> > > > TableScan S
> > > >
> > > > MergeJoin(c, b, a)
> > > > Sort(c, b, a)
> > > > TableScan R
> > > > Sort(c, b, a)
> > > > TableScan S
> > > >
> > > > But if we look at the input traits and has the insight that R already
> > delivers (b, c, a), we could decide to require (b, c, a) only and avoid
> > generating the 2nd plan, which is definitely worse, and reduce the search
> > space.
> > > >
> > > >
> > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <j...@apache.org> wrote:
> > > > >
> > > > > A little bit of history. In Drill, when we first implemented
> > > > > Distribution trait's definition, we allows both exact match and
> > > > > partial match in satisfy() method. This works fine for single-input
> > > > > operator such aggregation, however it leads to incorrect plan for
> > join
> > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that
> > > > > time, we removed partial match, and use exact match only. Yet this
> > > > > changes leads to unnecessary additional exchange. To mitigate this
> > > > > problem, in join physical operator, for a join key (a, b, c), we
> > > > > enumerate different distribution requests, yet this lead to more
> > space
> > > > > to explore and significantly increase planning time (which is
> > probably
> > > > > what Haisheng also experienced). When I look back, I feel probably
> > > > > what we miss is the "coordination" step in the join operator,
> because
> > > > > if we relax the requirement of satisfy(), for multi-input
> operators,
> > > > > we have to enforce some "coordination", to make sure multiple
> input's
> > > > > trait could work together properly.
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <j...@apache.org> wrote:
> > > > >>
> > > > >> This is an interesting topic. Thanks for bringing up this issue.
> > > > >>
> > > > >> My understanding of Volcano planner is it works in a top-down
> search
> > > > >> mode (the parent asks for certain trait of its child), while the
> > trait
> > > > >> propagates in a bottom-up way, as Stamatis explained.
> > > > >>
> > > > >> IMHO, the issue comes down to the definition of RelTrait, how to
> > > > >> determine if a trait A could satisfy a request asking for trait B,
> > > > >> that is, how RelTrait.satisfies() method is implemented.
> > > > >>
> > > > >> Let's first clarify different situations, using collation as
> > example.
> > > > >> 1) The collation is requested by query's outmost ORDER BY clause.
> > > > >> - The generated plan has to have "exact match", i.e same collation
> > > > >> (same column sequence), or "super match" .
> > > > >> exact match: (a, b) satisfy (a, b)
> > > > >> super match: (a, b, c) satisfy (a, b)
> > > > >>
> > > > >> 2) The collation is requested by operand with single input, such
> as
> > > > >> sort-based Aggregation.
> > > > >> - In such case, a "permutation match" is sufficient.
> > > > >> For instance, for Aggregation (b,c), input with collation (c, b)
> > > > >> could satisfy the requirement.
> > > > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c, b)
> > > > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy (c,
> > b)
> > > > >>
> > > > >> 3) The collation is requested by operand with >= 2 inputs, such as
> > > > >> sort-based MergeJoin.
> > > > >> - A permutation match is sufficient for each input
> > > > >> - MergeJoin has to do coordination, after input's trait propagates
> > > > >> upwards. In other words, ensure both inputs's permutation match
> are
> > > > >> actually same sequence. Otherwise, enforcer could be inserted upon
> > > > >> each input, and the planner generates two plans and let the cost
> > > > >> decide.
> > > > >>
> > > > >> For the first case, this is how today's RelCollation's satisfy()
> > > > >> method is implemented.
> > > > >>
> > > > >> For the second / third cases, use Haisheng's example,
> > > > >>
> > > > >> SELECT DISTINCT c, b FROM
> > > > >> ( SELECT R.c c, S.b b FROM R, S
> > > > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > >>
> > > > >> Aggregate . (c, b)
> > > > >> +--- MergeJoin . (a, b, c)
> > > > >> |--- TableScan on R
> > > > >> +--- TableScan on S
> > > > >>
> > > > >> Here is the steps that might take place in the planner:
> > > > >>
> > > > >> 1) Aggregate request permutation match collation (c, b)
> > > > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's
> > input
> > > > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's
> > LHS requirement
> > > > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins'
> > RHS requirement
> > > > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two
> > possible plans
> > > > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has
> > > > >> collation of (c, b, a)
> > > > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has
> > > > >> collation of (b, c, a)
> > > > >> 6) MJ1 and MJ2 could both satisfy permutation match request in
> step
> > > > >> 1, leading to two possible plans:
> > > > >> Agg1: with input of MJ1
> > > > >> Agg2: with input of MJ2
> > > > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2.
> > > > >>
> > > > >> I should point that the enforcer sort inserted in step 5 could
> help
> > > > >> remove redundant sort in its input, if the input's collation is
> > > > >> obtained from sort, by invoking Calcite's SortRemove Rule.
> > > > >>
> > > > >> The above only considers the column sequence. The DESC/ASC, NULL
> > > > >> FIRST/LAST will add more complexity, but we probably use similar
> > idea.
> > > > >>
> > > > >> In summary, we need :
> > > > >> 1) redefine collation trait's satisfy() policy, exact match, super
> > > > >> match, permutation match,
> > > > >> 2) different physical operator applies different trait matching
> > > > >> policy, depending on operator's # of inputs, and algorithm
> > > > >> implementation.
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan <
> > h.y...@alibaba-inc.com> wrote:
> > > > >>>
> > > > >>> Hi Stamatis,
> > > > >>>
> > > > >>> Thanks for your comment. I think my example didn't make it clear.
> > > > >>>
> > > > >>> When a logical operator is created, it doesn't have any physical,
> > > > >>> propertyand it shouldn't have. When a physical operator is
> created,
> > > > >>> e.g. in Enumerable convention, it only creates an intuitive
> > traitset
> > > > >>> with it, and requests it children the corresponding ones.
> > > > >>>
> > > > >>> For operators such as Join, Aggregate, Window, which may deliver
> > > > >>> multiple different traitsets, when the parent operator is created
> > and
> > > > >>> request its traitset, it might be good to know what are the
> > poosible
> > > > >>> traitset that the child operator can deliver. e.g.
> > > > >>>
> > > > >>> SELECT DISTINCT c, b FROM
> > > > >>> ( SELECT R.c c, S.b b FROM R, S
> > > > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t;
> > > > >>>
> > > > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a).
> > > > >>> Here is the logical plan:
> > > > >>> Aggregate
> > > > >>> +--- InnerJoin
> > > > >>> |--- TableScan on R
> > > > >>> +--- TableScan on S
> > > > >>>
> > > > >>> When we create a physical merge join for the inner join, it may
> > just
> > > > >>> have collation sorted on a,b,c. Then the aggreate on top of join
> > will
> > > > >>> request another sort on c,b, thus we miss the best plan. What we
> > > > >>> can do is requesting all the order combinations, which is n!,
> like
> > > > >>> how the Values operator does. But that is too much.
> > > > >>>
> > > > >>> If we can provide an approach that can minimize the possiple
> > traitset
> > > > >>> that the child operator may deliver, we can reduce the chance of
> > missing
> > > > >>> good plans. For the above query, the Aggregate operator can
> derive
> > > > >>> possible traitsets that its child operator join can deliver, in
> > which case,
> > > > >>> the possiple traitsets of join is
> > > > >>> 1. collation on (a,b,c) based on join condition,
> > > > >>> 2. collation on (c,b,a) based on left child,
> > > > >>> 3. collation on (b,c,a) based on right child
> > > > >>> So we can request Aggregate sorted by (c,b) and Join sorted by
> > (c,b,a).
> > > > >>> The number of traiset requests and plan alternatives can be
> > reduced.
> > > > >>> The DerivedTraitSets can be used to derive the possible traitsets
> > from
> > > > >>> Join, and pass through Project, Filter etc...
> > > > >>>
> > > > >>> This is just an example of non-distributed system, for
> distributed
> > system,
> > > > >>> it can save much more by considering the possible distribution
> > delivered
> > > > >>> by child operators.
> > > > >>>
> > > > >>> One thing that concerns me is it highly relies on the traiset
> > system of the
> > > > >>> underlying physical system. Like Enumerable doesn't consider
> > distribution,
> > > > >>> because it is single-node system, but Hive/Flink are distributed
> > system.
> > > > >>> - Haisheng
> > > > >>>
> > > > >>>
> ------------------------------------------------------------------
> > > > >>> 发件人:Stamatis Zampetakis<zabe...@gmail.com>
> > > > >>> 日 期:2019年10月18日 14:53:41
> > > > >>> 收件人:<dev@calcite.apache.org>
> > > > >>> 主 题:Re: [DISCUSS] On-demand traitset request
> > > > >>>
> > > > >>> Hi Haisheng,
> > > > >>>
> > > > >>> This is an interesting topic but somehow in my mind I thought
> that
> > this
> > > > >>> mechanism is already in place.
> > > > >>>
> > > > >>> When an operator (logical or physical) is created its traitset is
> > > > >>> determined in bottom-up fashion using the create
> > > > >>> static factory method present in almost all operators. In my mind
> > this is
> > > > >>> in some sense the applicability function
> > > > >>> mentioned in [1].
> > > > >>>
> > > > >>> Now during optimization we proceed in top-down manner and we
> > request
> > > > >>> certain traitsets from the operators.
> > > > >>> If it happens and they contain already the requested traits
> > nothing needs
> > > > >>> to be done.
> > > > >>>
> > > > >>> In your example when we are about to create the sort-merge join
> we
> > can
> > > > >>> check what traitsets are present in the inputs
> > > > >>> and if possible request those. Can you elaborate a bit more why
> do
> > we need
> > > > >>> a new type of metadata?
> > > > >>>
> > > > >>> Anyway if we cannot do it at the moment it makes sense to
> complete
> > the
> > > > >>> missing bits since what you are describing
> > > > >>> was already mentioned in the original design of the Volcano
> > optimizer [1].
> > > > >>>
> > > > >>> "If a move to be pursued is the exploration of a normal query
> > processing
> > > > >>> algorithm such as merge-join, its cost is calculated by the
> > algorithm's
> > > > >>> cost function. The algorithm's applicability function determines
> > the
> > > > >>> physical properly vectors for the algorithms inputs, and their
> > costs and
> > > > >>> optimal plans are found by invoking FindBestPlan for the inputs.
> > For some
> > > > >>> binary operators, the actual physical properties of the inputs
> are
> > not as
> > > > >>> important as the consistency of physical properties among the
> > inputs. For
> > > > >>> example, for a sort-based implementation of intersection, i.e.,
> an
> > > > >>> algorithm very similar to merge-join, any sort order of the two
> > inputs will
> > > > >>> suffice as long as the two inputs are sorted in the same way.
> > Similarly,
> > > > >>> for a parallel join, any partitioning of join inputs across
> > multiple
> > > > >>> processing nodes is acceptable if both inputs are partitioned
> using
> > > > >>> Compatible partitioning rules. For these cases, the search engine
> > permits
> > > > >>> the optimizer implementor to specify a number of physical
> property
> > vectors
> > > > >>> to be tried. For example, for the intersection of two inputs R
> and
> > S with
> > > > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is
> > sorted on
> > > > >>> (B,A,C), both these sort orders can be specified by the optimizer
> > > > >>> implementor and will be optimized by the generated optimizer,
> > while other
> > > > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1]
> > > > >>>
> > > > >>> Best,
> > > > >>> Stamatis
> > > > >>>
> > > > >>> [1]
> > > > >>>
> >
> https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
> > > > >>>
> > > > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan <
> > h.y...@alibaba-inc.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> TL;DR
> > > > >>>> Both top-down physical TraitSet request and bottom-up TraitSet
> > > > >>>> derivation have their strongth and weakness, we propose
> > > > >>>> on-demand TraitSet request to combine the above two, to reduce
> > > > >>>> the number of plan alternatives that are genereated, especially
> > > > >>>> in distributed system.
> > > > >>>>
> > > > >>>> e.g.
> > > > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3;
> > > > >>>>
> > > > >>>> In non-distributed system, we can generate a sort merge join,
> > > > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3.
> > > > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the
> > > > >>>> chance of making use of the delivered ordering of foo. Because
> > > > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to
> > > > >>>> sort on foo anymore. There are so many choices, n!, not even
> > > > >>>> considering asc/desc and null direction. We can't request all
> > > > >>>> the possible traitsets in top-down way, and can't derive all the
> > > > >>>> possible traitsets in bottom-up way either.
> > > > >>>>
> > > > >>>> We propose on-demand traitset request by adding a new type
> > > > >>>> of metadata DerivedTraitSets into the built-in metadata system.
> > > > >>>>
> > > > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery)
> > > > >>>>
> > > > >>>> In this metadata, every operator returns several possbile
> > traitsets
> > > > >>>> that may be derived from this operator.
> > > > >>>>
> > > > >>>> Using above query as an example, the tablescan on foo should
> > > > >>>> return traiset with collation on f3, f2, f1.
> > > > >>>>
> > > > >>>> In physical implementation rules, e.g. the SortMergeJoinRule,
> > > > >>>> it gets possible traitsets from both child operators, uses the
> > join
> > > > >>>> keys to eliminate useless traitsets, leaves out usefull
> traitsets,
> > > > >>>> and requests corresponding traitset on the other child.
> > > > >>>>
> > > > >>>> This relies on the feature of AbstractConverter, which is turned
> > > > >>>> off by default, due to performance issue [1].
> > > > >>>>
> > > > >>>> Thoughts?
> > > > >>>>
> > > > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > > > >>>>
> > > > >>>> Haisheng
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>

Reply via email to