Hi Haisheng, Thanks a lot for sharing this great proposal ~ For short I understand your idea as below: 1. Derive the distributions/collations that children COULD/MIGHT offer 2. Decide the best distributions/collations by first point and computing logic of operator, say gropuings in Aggregate;
It comes to me that another important part is that children operator should also provide the corresponding COST for the POSSIBLE distribution/collation. The COST is not for the final plan, but a hypothesis. Take below example SELECT DISTINCT c, b FROM ( SELECT R.c c, S.b b FROM R, S WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a). Aggregate +--- InnerJoin |--- TableScan on R +--- TableScan on S InnerJoin should deliver that its possible collations and corresponding costs at the same time. - If ordered by (c, b, a) my cost is ... - If ordered by (b, c, a) my cost is ... - If ordered by (a, b, c) my cost is ... By which Aggregate decide the 'best' required collation. By this way we can better limit the searching space and also target the relatively optimized (if not best) plan. Also when you say "I didn't say adding to RelNode, but a new API/interface for physical operator only.", I'm not so clear; Currently the physical operators in Calcite like EnumerableHashJoin, EnumerableMergeJoin, when created, their physical behavior(like real collations) are determined. So I belive you intend to add new API at upper layer, but there's no physical optimizing phase in Calcite at this moment. Where do you want to add the new API, can you specify ? Thanks, Jin Jinfeng Ni <j...@apache.org> 于2019年11月6日周三 上午1:56写道: > @Haisheng, @Xiening, > > Thanks for pointing that previous email out. Overall, I agree that > the physical trait enforcement should be done in the engine, not in > the rule. For the rule, it should only specify the request, and the > corresponding transformation, and let the engine to explore the search > space. It will be great if we can revamp the Volcano optimizer > framework, to do that way. > > In terms of search space, it's always a tradeoff between the space > searched and the optimality of the plan found. I think it's fine for > the engine to explore a potential big search space, as long as it has > effective "bound-and-prune" strategy. In the original Volcano paper, > there is a way to prune the search space based on the best plan found > so far, using the parameter "limit". When an implementable plan is > found, a "real" cost is obtained, which could be used to prune > un-necessary search space. That's actually the advantage of Volcano's > "top-down" approach. However, seems to me that Calcite's Volcano did > not apply that approach effectively, because of the existence of > AbstractConverter. > > > On Sun, Nov 3, 2019 at 10:12 PM Haisheng Yuan <h.y...@alibaba-inc.com> > wrote: > > > > Hi Jinfeng, > > > > I think you might have missed the email about proposed API for physical > operators I sent out previously in [1]. > > > > We don't need request all the permutation, which is also impossible in > practice, the search space is going to explode. > > > > In the example in email [1], I already talked about your concen on > passing down parent request into children may lead to less optimal plan. > Besically join operator can send 2 collation optimization requests, one is > to pass request down, the other one is ignore the parent's request. > > > > Using AbstractConverter to enforce properties is inapporpriate, which > handles all the optimization work to physical operator providers, meaning > there is almost no physical level optimization mechanism in Calcite. SQL > Server and Greenplum's optimizer, which are Cascades framework based, > implemented the property enforcement in the core optimizer engine, not > through AbstractConverter and rules, physical operators just need to > implement those methods (or similar) I mentioned in email [1]. My goal is > completely abolishing AbstractConverter. > > > > [1] > http://mail-archives.apache.org/mod_mbox/calcite-dev/201910.mbox/%3cd75b20f4-542a-4a73-897e-66ab426494c1.h.y...@alibaba-inc.com%3e > > > > - Haisheng > > > > ------------------------------------------------------------------ > > 发件人:Jinfeng Ni<j...@apache.org> > > 日 期:2019年11月01日 14:10:30 > > 收件人:<dev@calcite.apache.org> > > 主 题:Re: [DISCUSS] On-demand traitset request > > > > Hi Xiening, > > > > "Let say if R and S doesn’t have sorting properties at all. In your > > case, we would end up adding enforcers for LHS and RHS to get > > collation (a, b, c). Then we would need another enforcer to get > > collation (b, c). This is a sub optimal plan as we could have use (b, > > c, a) for join." > > > > In such case, for step 2 when MergeJoin request a permutation match of > > (a, b,c) on both it's input, it is not necessary to end up with > > collation (a, b, c) only. Since it request "permutation", MJ could ask > > all possible satisfying collations, which include (b, c, a). In other > > words, the steps I described did not exclude such plan. > > > > You may argue it would increase the search space. However, by > > limiting the search space, without explore all possible choice, we may > > lose the chance getting 'optimal' plan we want. For instance, in the > > above example, the idea of passing "on demand" trait request (b,c) > > from Agg to MJ is to avoid unnecessary sort (b,c). In cases where the > > join condition has good filtering, and such sort of join output could > > be quite cheap. Yet in the plan enumeration, since we use "on demand" > > trait request from parent to guide the actions of MJ, I'm not sure if > > we may restrict the choices we consider in the legs of join, whose > > cardinality could be larger and play a bigger role in the overall > > cost. > > > > In other words, by using "on demand" trait request, we may restrict > > the choices of plan, possibly in the some operators with larger data > > size. > > > > In the current implementation of VolcanoPlanner, I feel the root issue > > of long planning time is not to explore all possible satisfying trait. > > It is actually the unnecessary of AbstractConverter, added to the > > equivalence class. > > > > > > On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai <xndai....@gmail.com> > wrote: > > > > > > Thanks for the sharing. I like the way you model this problem, Jinfeng. > > > > > > There’s one minor issue with your example. Let say if R and S doesn’t > have sorting properties at all. In your case, we would end up adding > enforcers for LHS and RHS to get collation (a, b, c). Then we would need > another enforcer to get collation (b, c). This is a sub optimal plan as we > could have use (b, c, a) for join. > > > > > > I think in step #2, the join operator would need to take the agg trait > requirement into account. Then it would have two options - > > > > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to > guarantee the join output would deliver the collation agg needs. > > > 2) require permutation match of (a, b, c); in such case, an enforcer > might be needed for aggregation. > > > > > > Eventually the cost model decides who is the winner. > > > > > > There’s a fundamental difference between your model and Haisheng’s > proposal. In Haisheng’s case, a rel node not only looks at its parent’s > requirement, but also tries to get the potential traits its input could > deliver. It would try to align them to eliminate unnecessary alternatives. > > > > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to > implement option 1), we would generate two alternatives - > > > > > > MergeJoin (b, c, a) > > > TableScan R > > > Sort(b, c, a) > > > TableScan S > > > > > > MergeJoin(c, b, a) > > > Sort(c, b, a) > > > TableScan R > > > Sort(c, b, a) > > > TableScan S > > > > > > But if we look at the input traits and has the insight that R already > delivers (b, c, a), we could decide to require (b, c, a) only and avoid > generating the 2nd plan, which is definitely worse, and reduce the search > space. > > > > > > > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <j...@apache.org> wrote: > > > > > > > > A little bit of history. In Drill, when we first implemented > > > > Distribution trait's definition, we allows both exact match and > > > > partial match in satisfy() method. This works fine for single-input > > > > operator such aggregation, however it leads to incorrect plan for > join > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that > > > > time, we removed partial match, and use exact match only. Yet this > > > > changes leads to unnecessary additional exchange. To mitigate this > > > > problem, in join physical operator, for a join key (a, b, c), we > > > > enumerate different distribution requests, yet this lead to more > space > > > > to explore and significantly increase planning time (which is > probably > > > > what Haisheng also experienced). When I look back, I feel probably > > > > what we miss is the "coordination" step in the join operator, because > > > > if we relax the requirement of satisfy(), for multi-input operators, > > > > we have to enforce some "coordination", to make sure multiple input's > > > > trait could work together properly. > > > > > > > > > > > > > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <j...@apache.org> wrote: > > > >> > > > >> This is an interesting topic. Thanks for bringing up this issue. > > > >> > > > >> My understanding of Volcano planner is it works in a top-down search > > > >> mode (the parent asks for certain trait of its child), while the > trait > > > >> propagates in a bottom-up way, as Stamatis explained. > > > >> > > > >> IMHO, the issue comes down to the definition of RelTrait, how to > > > >> determine if a trait A could satisfy a request asking for trait B, > > > >> that is, how RelTrait.satisfies() method is implemented. > > > >> > > > >> Let's first clarify different situations, using collation as > example. > > > >> 1) The collation is requested by query's outmost ORDER BY clause. > > > >> - The generated plan has to have "exact match", i.e same collation > > > >> (same column sequence), or "super match" . > > > >> exact match: (a, b) satisfy (a, b) > > > >> super match: (a, b, c) satisfy (a, b) > > > >> > > > >> 2) The collation is requested by operand with single input, such as > > > >> sort-based Aggregation. > > > >> - In such case, a "permutation match" is sufficient. > > > >> For instance, for Aggregation (b,c), input with collation (c, b) > > > >> could satisfy the requirement. > > > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c, b) > > > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy (c, > b) > > > >> > > > >> 3) The collation is requested by operand with >= 2 inputs, such as > > > >> sort-based MergeJoin. > > > >> - A permutation match is sufficient for each input > > > >> - MergeJoin has to do coordination, after input's trait propagates > > > >> upwards. In other words, ensure both inputs's permutation match are > > > >> actually same sequence. Otherwise, enforcer could be inserted upon > > > >> each input, and the planner generates two plans and let the cost > > > >> decide. > > > >> > > > >> For the first case, this is how today's RelCollation's satisfy() > > > >> method is implemented. > > > >> > > > >> For the second / third cases, use Haisheng's example, > > > >> > > > >> SELECT DISTINCT c, b FROM > > > >> ( SELECT R.c c, S.b b FROM R, S > > > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; > > > >> > > > >> Aggregate . (c, b) > > > >> +--- MergeJoin . (a, b, c) > > > >> |--- TableScan on R > > > >> +--- TableScan on S > > > >> > > > >> Here is the steps that might take place in the planner: > > > >> > > > >> 1) Aggregate request permutation match collation (c, b) > > > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's > input > > > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's > LHS requirement > > > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' > RHS requirement > > > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two > possible plans > > > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has > > > >> collation of (c, b, a) > > > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has > > > >> collation of (b, c, a) > > > >> 6) MJ1 and MJ2 could both satisfy permutation match request in step > > > >> 1, leading to two possible plans: > > > >> Agg1: with input of MJ1 > > > >> Agg2: with input of MJ2 > > > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2. > > > >> > > > >> I should point that the enforcer sort inserted in step 5 could help > > > >> remove redundant sort in its input, if the input's collation is > > > >> obtained from sort, by invoking Calcite's SortRemove Rule. > > > >> > > > >> The above only considers the column sequence. The DESC/ASC, NULL > > > >> FIRST/LAST will add more complexity, but we probably use similar > idea. > > > >> > > > >> In summary, we need : > > > >> 1) redefine collation trait's satisfy() policy, exact match, super > > > >> match, permutation match, > > > >> 2) different physical operator applies different trait matching > > > >> policy, depending on operator's # of inputs, and algorithm > > > >> implementation. > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan < > h.y...@alibaba-inc.com> wrote: > > > >>> > > > >>> Hi Stamatis, > > > >>> > > > >>> Thanks for your comment. I think my example didn't make it clear. > > > >>> > > > >>> When a logical operator is created, it doesn't have any physical, > > > >>> propertyand it shouldn't have. When a physical operator is created, > > > >>> e.g. in Enumerable convention, it only creates an intuitive > traitset > > > >>> with it, and requests it children the corresponding ones. > > > >>> > > > >>> For operators such as Join, Aggregate, Window, which may deliver > > > >>> multiple different traitsets, when the parent operator is created > and > > > >>> request its traitset, it might be good to know what are the > poosible > > > >>> traitset that the child operator can deliver. e.g. > > > >>> > > > >>> SELECT DISTINCT c, b FROM > > > >>> ( SELECT R.c c, S.b b FROM R, S > > > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; > > > >>> > > > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a). > > > >>> Here is the logical plan: > > > >>> Aggregate > > > >>> +--- InnerJoin > > > >>> |--- TableScan on R > > > >>> +--- TableScan on S > > > >>> > > > >>> When we create a physical merge join for the inner join, it may > just > > > >>> have collation sorted on a,b,c. Then the aggreate on top of join > will > > > >>> request another sort on c,b, thus we miss the best plan. What we > > > >>> can do is requesting all the order combinations, which is n!, like > > > >>> how the Values operator does. But that is too much. > > > >>> > > > >>> If we can provide an approach that can minimize the possiple > traitset > > > >>> that the child operator may deliver, we can reduce the chance of > missing > > > >>> good plans. For the above query, the Aggregate operator can derive > > > >>> possible traitsets that its child operator join can deliver, in > which case, > > > >>> the possiple traitsets of join is > > > >>> 1. collation on (a,b,c) based on join condition, > > > >>> 2. collation on (c,b,a) based on left child, > > > >>> 3. collation on (b,c,a) based on right child > > > >>> So we can request Aggregate sorted by (c,b) and Join sorted by > (c,b,a). > > > >>> The number of traiset requests and plan alternatives can be > reduced. > > > >>> The DerivedTraitSets can be used to derive the possible traitsets > from > > > >>> Join, and pass through Project, Filter etc... > > > >>> > > > >>> This is just an example of non-distributed system, for distributed > system, > > > >>> it can save much more by considering the possible distribution > delivered > > > >>> by child operators. > > > >>> > > > >>> One thing that concerns me is it highly relies on the traiset > system of the > > > >>> underlying physical system. Like Enumerable doesn't consider > distribution, > > > >>> because it is single-node system, but Hive/Flink are distributed > system. > > > >>> - Haisheng > > > >>> > > > >>> ------------------------------------------------------------------ > > > >>> 发件人:Stamatis Zampetakis<zabe...@gmail.com> > > > >>> 日 期:2019年10月18日 14:53:41 > > > >>> 收件人:<dev@calcite.apache.org> > > > >>> 主 题:Re: [DISCUSS] On-demand traitset request > > > >>> > > > >>> Hi Haisheng, > > > >>> > > > >>> This is an interesting topic but somehow in my mind I thought that > this > > > >>> mechanism is already in place. > > > >>> > > > >>> When an operator (logical or physical) is created its traitset is > > > >>> determined in bottom-up fashion using the create > > > >>> static factory method present in almost all operators. In my mind > this is > > > >>> in some sense the applicability function > > > >>> mentioned in [1]. > > > >>> > > > >>> Now during optimization we proceed in top-down manner and we > request > > > >>> certain traitsets from the operators. > > > >>> If it happens and they contain already the requested traits > nothing needs > > > >>> to be done. > > > >>> > > > >>> In your example when we are about to create the sort-merge join we > can > > > >>> check what traitsets are present in the inputs > > > >>> and if possible request those. Can you elaborate a bit more why do > we need > > > >>> a new type of metadata? > > > >>> > > > >>> Anyway if we cannot do it at the moment it makes sense to complete > the > > > >>> missing bits since what you are describing > > > >>> was already mentioned in the original design of the Volcano > optimizer [1]. > > > >>> > > > >>> "If a move to be pursued is the exploration of a normal query > processing > > > >>> algorithm such as merge-join, its cost is calculated by the > algorithm's > > > >>> cost function. The algorithm's applicability function determines > the > > > >>> physical properly vectors for the algorithms inputs, and their > costs and > > > >>> optimal plans are found by invoking FindBestPlan for the inputs. > For some > > > >>> binary operators, the actual physical properties of the inputs are > not as > > > >>> important as the consistency of physical properties among the > inputs. For > > > >>> example, for a sort-based implementation of intersection, i.e., an > > > >>> algorithm very similar to merge-join, any sort order of the two > inputs will > > > >>> suffice as long as the two inputs are sorted in the same way. > Similarly, > > > >>> for a parallel join, any partitioning of join inputs across > multiple > > > >>> processing nodes is acceptable if both inputs are partitioned using > > > >>> Compatible partitioning rules. For these cases, the search engine > permits > > > >>> the optimizer implementor to specify a number of physical property > vectors > > > >>> to be tried. For example, for the intersection of two inputs R and > S with > > > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is > sorted on > > > >>> (B,A,C), both these sort orders can be specified by the optimizer > > > >>> implementor and will be optimized by the generated optimizer, > while other > > > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1] > > > >>> > > > >>> Best, > > > >>> Stamatis > > > >>> > > > >>> [1] > > > >>> > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf > > > >>> > > > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan < > h.y...@alibaba-inc.com> > > > >>> wrote: > > > >>> > > > >>>> TL;DR > > > >>>> Both top-down physical TraitSet request and bottom-up TraitSet > > > >>>> derivation have their strongth and weakness, we propose > > > >>>> on-demand TraitSet request to combine the above two, to reduce > > > >>>> the number of plan alternatives that are genereated, especially > > > >>>> in distributed system. > > > >>>> > > > >>>> e.g. > > > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3; > > > >>>> > > > >>>> In non-distributed system, we can generate a sort merge join, > > > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3. > > > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the > > > >>>> chance of making use of the delivered ordering of foo. Because > > > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to > > > >>>> sort on foo anymore. There are so many choices, n!, not even > > > >>>> considering asc/desc and null direction. We can't request all > > > >>>> the possible traitsets in top-down way, and can't derive all the > > > >>>> possible traitsets in bottom-up way either. > > > >>>> > > > >>>> We propose on-demand traitset request by adding a new type > > > >>>> of metadata DerivedTraitSets into the built-in metadata system. > > > >>>> > > > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery) > > > >>>> > > > >>>> In this metadata, every operator returns several possbile > traitsets > > > >>>> that may be derived from this operator. > > > >>>> > > > >>>> Using above query as an example, the tablescan on foo should > > > >>>> return traiset with collation on f3, f2, f1. > > > >>>> > > > >>>> In physical implementation rules, e.g. the SortMergeJoinRule, > > > >>>> it gets possible traitsets from both child operators, uses the > join > > > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets, > > > >>>> and requests corresponding traitset on the other child. > > > >>>> > > > >>>> This relies on the feature of AbstractConverter, which is turned > > > >>>> off by default, due to performance issue [1]. > > > >>>> > > > >>>> Thoughts? > > > >>>> > > > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970 > > > >>>> > > > >>>> Haisheng > > > >>>> > > > >>>> > > > >>> > > > > > >