It can be extended to other traits easily. The APIs for distribution and collation are for convenience, as all the databases have these traits, for single-node database, distribution can just be ANY. public <T extends RelTrait> T requiredTrait(RelTraitDef<T> traitDef, RelTrait required, int child, int optReqId) public <T extends RelTrait> T derivedTrait(RelTraitDef<T> traitDef) - Haisheng
------------------------------------------------------------------ 发件人:Stamatis Zampetakis<zabe...@gmail.com> 日 期:2019年10月23日 14:53:38 收件人:<dev@calcite.apache.org> 主 题:Re: Re: [DISCUSS] On-demand traitset request Overall, I agree that better encapsulation of propagation and derivation of traits would be beneficial for our system. Regarding the API proposed by Haisheng, I have to think a bit more on it. At first glance, adding such methods directly in the RelNode API does not appear an ideal solution since I don't see how easily it can be extended to support other kinds of traits. Best, Stamatis On Mon, Oct 21, 2019 at 7:31 AM Haisheng Yuan <h.y...@alibaba-inc.com> wrote: > To Stamatis, > Not exactly. My initial thought was giving the physical operator the > abiity to customize and fully control physical property derivation > strategy, thus can further help the purpose driven trait request. But since > we agree to think more high-level API to support on-demand traitset > request, I will illustrate what API is expected from implentator's > perspective. > > Jingfeng gave us basic steps on how the plan might be generated using > top-down purpose driven only manner, I think differently with the first > several steps. > > SELECT DISTINCT c, b FROM > ( SELECT R.c c, S.b b FROM R, S > WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; > > Aggregate . (c, b) > +--- MergeJoin . (a, b, c) > |--- TableScan on R > +-- TableScan on S > > 1. Aggreate require collation (c,b) from its child, not permutation. > 2. MergeJoin's parent require (c,b), it has 2 options. Pass it down, or > ignore it. > a) Pass down. it has join condition on (a,b,c), the required columns > can be coverd by join condition columns, so MergeJoin will try to deliver > (c,b,a), and both children must exact match. Then we will have sort on both > children of MergeJoin. > b) Ignore it. Require its first child collation on (a,b,c), but > matching type is subset. R delivers (c,b,a). Then using the first child's > derived collation trait to require its second child to exact match. Thus we > have a sort on S, and a sort on top of MergeJoin. > > Both plan might be good or bad. If R, S are large, but the join result is > small, plan b) might be better, otherwise plan a) might be better. > > Anyway, I hope the physical operators can have full control the physical > properties requests and derivation, in physical operator class itself, not > rules, not other places. > > Per our experience, we have spent too much time on writing code for > dealing with all kinds of property requirement and derivation. But in fact, > life should be easier. I would like to the physical operator provides the > following API, and the 3rd party implementator just need to > override/implement them, no more need to be taken care. > > 1. void setDistributionRequests(int numReq) > Each operator can specify how many optimzation requests on some trait it > want to do. e.g. HashJoin may request the following distribution on both > children: > - (hash distribution on key1, hash distribution on key1) > - (hash distribution on key2, hash distribution on key2) > - (hash distribution on all keys, hash distribution on all keys) > - (Any, Broadcast) > - (Gather, Gather) > > 2. RelDistribution requiredDistribution(RelDistribution required, int > child) //same for collation > Given the required distribution from parent operator, returns the required > distribution for its nth child. > > 3. RelDistribution derivedDistribution() //same for collation > Derive the distribution of the operator itelf from child operators. > > 4. MatchType distributionMatchType(int child) //same for collation > Returns the distribution match type for its nth child, how does it match > the other children. > Similar with Jinfeng's point, I think there should be 3 types of matching: > exact, satisfy, subset. > e.g. > R is distributed by (a), S is distributed by (a,b) > select * from R join S using a,b,c > If we have plan > HashJoin > |-- TableScan on R > +-- TableScan on S > We may require the match type on S to be satisfy. (a,b) satisfies required > distribution (a,b,c). > Fot the outer child R, we require it to be exact match with inner. > > 5. ExecOrder getExecOrder() > Returns how the operator's children is executed, left to right, or right > to left. Typically, hash join is right to left. We might use this as the > optimization order. To make sure we have correct plans, we have to optimize > child and enforce properties in the order that is specific to the physical > operator. > All the other dirty work should be done by the optimization engine, but > not through rules, I believe. However, I havn't got any clear plan on how > to achieve it inside the engine. > > Haisheng > > ------------------------------------------------------------------ > 发件人:Jacques Nadeau<jacq...@apache.org> > 日 期:2019年10月21日 11:04:19 > 收件人:<dev@calcite.apache.org> > 主 题:Re: [DISCUSS] On-demand traitset request > > Definitely agree that this has been a long time missing. I've been > challenged by this absence since before Calcite was Calcite. I also > remember the trials and tribulations around this that Jinfeng references > above. > > In general, I think the first thing one might want to before actually doing > this is to make trait derivation internally defined based on the impact > that a rel node has on traits. I've always found the externally provided > rel traits to be problematic and a potential place for hidden bugs (row > type has the same problem) . It means that trait derivation of a relnode is > based on the rules that do transformation as opposed to the "physical" > impact of the relnode. (It also leads to derivation behavior for a relnode > being scattered in many different rules.) If moved to the rel node, it also > provides a second benefit, once you encapsulate this propagation logic, you > could also expose this as a trait derivation function that the planner > could use to seek out derivation paths. > > At Dremio we toyed last year with the idea of adding a heuristic cycle on > top of the existing volano planner and relset state. In this model a > RelNode would have two additional methods: it would expose a trait > propagation function (as described above) and optionally expose one or more > specific traits this node desired. When the planner arrived at a > conclusion, you'd run the heuristic cycle to further propagate desired > traits (if possible) and then restart the planning cycle based on any new > transformations done during the heuristic stage. You'd then repeat this > volcano/trait prop cycle until you arrive at a "completed" state. > > We never actually got to implementation but I'm super supportive of someone > picking this up. > > > > On Sat, Oct 19, 2019 at 12:25 AM Stamatis Zampetakis <zabe...@gmail.com> > wrote: > > > Thanks all for the very interesting usecases and helpful examples. > > > > I would like to stay a bit on the fact that logical operators do not have > > physical traits. Calcite's logical operators do have at least one > physical > > trait which is Convention.NONE. Other logical operators such as: > > > > LogicalTableScan [1] > > LogicalFilter [2] > > LogicalProject [3] > > LogicalWindow [4] > > > > have additional traits regarding collation and distribution. There is > > already some sort of trait derivation so to some extend it is possible to > > check the traitset of the child (logical) operator before requesting some > > other traitset when creating the parent (physical). > > > > I see that this mechanism of adding explicitly traits to logical > operators > > may be confusing and may also lead to planning problems. Replacing it by > > metadata might be a good idea and it is closer to the idea of > > "applicability function" mentioned in the Volcano paper. Assuming that we > > follow this approach I would assume that the traitset of logical > operators > > from now on should be always empty. > > > > Is this what you have in mind Haisheng? > > > > Best, > > Stamatis > > > > [1] > > > > > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalTableScan.java#L95 > > [2] > > > > > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalFilter.java#L105 > > [3] > > > > > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalProject.java#L104 > > [4] > > > > > https://github.com/apache/calcite/blob/cd24cae77072e56e4333d10114bf380be79709f1/core/src/main/java/org/apache/calcite/rel/logical/LogicalWindow.java#L95 > > > > On Sat, Oct 19, 2019 at 7:39 AM Xiening Dai <xndai....@gmail.com> wrote: > > > > > Thanks for the sharing. I like the way you model this problem, Jinfeng. > > > > > > There’s one minor issue with your example. Let say if R and S doesn’t > > have > > > sorting properties at all. In your case, we would end up adding > enforcers > > > for LHS and RHS to get collation (a, b, c). Then we would need another > > > enforcer to get collation (b, c). This is a sub optimal plan as we > could > > > have use (b, c, a) for join. > > > > > > I think in step #2, the join operator would need to take the agg trait > > > requirement into account. Then it would have two options - > > > > > > 1) require *exact/super* match of (b, c, a) or (c, b, a); this is to > > > guarantee the join output would deliver the collation agg needs. > > > 2) require permutation match of (a, b, c); in such case, an enforcer > > might > > > be needed for aggregation. > > > > > > Eventually the cost model decides who is the winner. > > > > > > There’s a fundamental difference between your model and Haisheng’s > > > proposal. In Haisheng’s case, a rel node not only looks at its parent’s > > > requirement, but also tries to get the potential traits its input could > > > deliver. It would try to align them to eliminate unnecessary > > alternatives. > > > > > > In above example, assuming R is (b, c, a) and S is (a, b, c), to > > implement > > > option 1), we would generate two alternatives - > > > > > > MergeJoin (b, c, a) > > > TableScan R > > > Sort(b, c, a) > > > TableScan S > > > > > > MergeJoin(c, b, a) > > > Sort(c, b, a) > > > TableScan R > > > Sort(c, b, a) > > > TableScan S > > > > > > But if we look at the input traits and has the insight that R already > > > delivers (b, c, a), we could decide to require (b, c, a) only and avoid > > > generating the 2nd plan, which is definitely worse, and reduce the > search > > > space. > > > > > > > > > > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni <j...@apache.org> wrote: > > > > > > > > A little bit of history. In Drill, when we first implemented > > > > Distribution trait's definition, we allows both exact match and > > > > partial match in satisfy() method. This works fine for single-input > > > > operator such aggregation, however it leads to incorrect plan for > join > > > > query, i.e LHS shuffle with (a, b), RHS shuffle with (a) . At that > > > > time, we removed partial match, and use exact match only. Yet this > > > > changes leads to unnecessary additional exchange. To mitigate this > > > > problem, in join physical operator, for a join key (a, b, c), we > > > > enumerate different distribution requests, yet this lead to more > space > > > > to explore and significantly increase planning time (which is > probably > > > > what Haisheng also experienced). When I look back, I feel probably > > > > what we miss is the "coordination" step in the join operator, because > > > > if we relax the requirement of satisfy(), for multi-input operators, > > > > we have to enforce some "coordination", to make sure multiple input's > > > > trait could work together properly. > > > > > > > > > > > > > > > > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni <j...@apache.org> wrote: > > > >> > > > >> This is an interesting topic. Thanks for bringing up this issue. > > > >> > > > >> My understanding of Volcano planner is it works in a top-down search > > > >> mode (the parent asks for certain trait of its child), while the > trait > > > >> propagates in a bottom-up way, as Stamatis explained. > > > >> > > > >> IMHO, the issue comes down to the definition of RelTrait, how to > > > >> determine if a trait A could satisfy a request asking for trait B, > > > >> that is, how RelTrait.satisfies() method is implemented. > > > >> > > > >> Let's first clarify different situations, using collation as > example. > > > >> 1) The collation is requested by query's outmost ORDER BY clause. > > > >> - The generated plan has to have "exact match", i.e same collation > > > >> (same column sequence), or "super match" . > > > >> exact match: (a, b) satisfy (a, b) > > > >> super match: (a, b, c) satisfy (a, b) > > > >> > > > >> 2) The collation is requested by operand with single input, such as > > > >> sort-based Aggregation. > > > >> - In such case, a "permutation match" is sufficient. > > > >> For instance, for Aggregation (b,c), input with collation (c, b) > > > >> could satisfy the requirement. > > > >> permutation match: (b, c) satisfy (c, b). (c, b) satisfy (c, > > b) > > > >> permutation match: (b, c, a) satisfy (c, b). (c, b, a) satisfy > > (c, > > > b) > > > >> > > > >> 3) The collation is requested by operand with >= 2 inputs, such as > > > >> sort-based MergeJoin. > > > >> - A permutation match is sufficient for each input > > > >> - MergeJoin has to do coordination, after input's trait propagates > > > >> upwards. In other words, ensure both inputs's permutation match are > > > >> actually same sequence. Otherwise, enforcer could be inserted upon > > > >> each input, and the planner generates two plans and let the cost > > > >> decide. > > > >> > > > >> For the first case, this is how today's RelCollation's satisfy() > > > >> method is implemented. > > > >> > > > >> For the second / third cases, use Haisheng's example, > > > >> > > > >> SELECT DISTINCT c, b FROM > > > >> ( SELECT R.c c, S.b b FROM R, S > > > >> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; > > > >> > > > >> Aggregate . (c, b) > > > >> +--- MergeJoin . (a, b, c) > > > >> |--- TableScan on R > > > >> +--- TableScan on S > > > >> > > > >> Here is the steps that might take place in the planner: > > > >> > > > >> 1) Aggregate request permutation match collation (c, b) > > > >> 2) MergeJoin request a permutation match of (a, b,c) on both it's > > input > > > >> 3) R respond with collation (c, b, a), which satisfy MergeJoin's LHS > > > requirement > > > >> 4) S respond with collation (b, c, a), which satisfy MergeJoins' RHS > > > requirement > > > >> 5) MergeJoin do a coordination o LHS, RHS, and generate two possible > > > plans > > > >> MJ1: Insert a sort of (c, b, a) on RHS. This MJ operator now has > > > >> collation of (c, b, a) > > > >> MJ2: Insert a sort of (b, c, a) on LHS. This MJ operator now has > > > >> collation of (b, c, a) > > > >> 6) MJ1 and MJ2 could both satisfy permutation match request in step > > > >> 1, leading to two possible plans: > > > >> Agg1: with input of MJ1 > > > >> Agg2: with input of MJ2 > > > >> 7) planner chooses a best plan based on cost of Agg1 and Agg2. > > > >> > > > >> I should point that the enforcer sort inserted in step 5 could help > > > >> remove redundant sort in its input, if the input's collation is > > > >> obtained from sort, by invoking Calcite's SortRemove Rule. > > > >> > > > >> The above only considers the column sequence. The DESC/ASC, NULL > > > >> FIRST/LAST will add more complexity, but we probably use similar > idea. > > > >> > > > >> In summary, we need : > > > >> 1) redefine collation trait's satisfy() policy, exact match, super > > > >> match, permutation match, > > > >> 2) different physical operator applies different trait matching > > > >> policy, depending on operator's # of inputs, and algorithm > > > >> implementation. > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Fri, Oct 18, 2019 at 2:51 PM Haisheng Yuan < > h.y...@alibaba-inc.com > > > > > > wrote: > > > >>> > > > >>> Hi Stamatis, > > > >>> > > > >>> Thanks for your comment. I think my example didn't make it clear. > > > >>> > > > >>> When a logical operator is created, it doesn't have any physical, > > > >>> propertyand it shouldn't have. When a physical operator is created, > > > >>> e.g. in Enumerable convention, it only creates an intuitive > traitset > > > >>> with it, and requests it children the corresponding ones. > > > >>> > > > >>> For operators such as Join, Aggregate, Window, which may deliver > > > >>> multiple different traitsets, when the parent operator is created > and > > > >>> request its traitset, it might be good to know what are the > poosible > > > >>> traitset that the child operator can deliver. e.g. > > > >>> > > > >>> SELECT DISTINCT c, b FROM > > > >>> ( SELECT R.c c, S.b b FROM R, S > > > >>> WHERE R.a=S.a and R.b=S.b and R.c=S.c) t; > > > >>> > > > >>> Suppose R is ordered by (c, b, a), and S is ordered by (b, c, a). > > > >>> Here is the logical plan: > > > >>> Aggregate > > > >>> +--- InnerJoin > > > >>> |--- TableScan on R > > > >>> +--- TableScan on S > > > >>> > > > >>> When we create a physical merge join for the inner join, it may > just > > > >>> have collation sorted on a,b,c. Then the aggreate on top of join > will > > > >>> request another sort on c,b, thus we miss the best plan. What we > > > >>> can do is requesting all the order combinations, which is n!, like > > > >>> how the Values operator does. But that is too much. > > > >>> > > > >>> If we can provide an approach that can minimize the possiple > traitset > > > >>> that the child operator may deliver, we can reduce the chance of > > > missing > > > >>> good plans. For the above query, the Aggregate operator can derive > > > >>> possible traitsets that its child operator join can deliver, in > which > > > case, > > > >>> the possiple traitsets of join is > > > >>> 1. collation on (a,b,c) based on join condition, > > > >>> 2. collation on (c,b,a) based on left child, > > > >>> 3. collation on (b,c,a) based on right child > > > >>> So we can request Aggregate sorted by (c,b) and Join sorted by > > (c,b,a). > > > >>> The number of traiset requests and plan alternatives can be > reduced. > > > >>> The DerivedTraitSets can be used to derive the possible traitsets > > from > > > >>> Join, and pass through Project, Filter etc... > > > >>> > > > >>> This is just an example of non-distributed system, for distributed > > > system, > > > >>> it can save much more by considering the possible distribution > > > delivered > > > >>> by child operators. > > > >>> > > > >>> One thing that concerns me is it highly relies on the traiset > system > > > of the > > > >>> underlying physical system. Like Enumerable doesn't consider > > > distribution, > > > >>> because it is single-node system, but Hive/Flink are distributed > > > system. > > > >>> - Haisheng > > > >>> > > > >>> ------------------------------------------------------------------ > > > >>> 发件人:Stamatis Zampetakis<zabe...@gmail.com> > > > >>> 日 期:2019年10月18日 14:53:41 > > > >>> 收件人:<dev@calcite.apache.org> > > > >>> 主 题:Re: [DISCUSS] On-demand traitset request > > > >>> > > > >>> Hi Haisheng, > > > >>> > > > >>> This is an interesting topic but somehow in my mind I thought that > > this > > > >>> mechanism is already in place. > > > >>> > > > >>> When an operator (logical or physical) is created its traitset is > > > >>> determined in bottom-up fashion using the create > > > >>> static factory method present in almost all operators. In my mind > > this > > > is > > > >>> in some sense the applicability function > > > >>> mentioned in [1]. > > > >>> > > > >>> Now during optimization we proceed in top-down manner and we > request > > > >>> certain traitsets from the operators. > > > >>> If it happens and they contain already the requested traits nothing > > > needs > > > >>> to be done. > > > >>> > > > >>> In your example when we are about to create the sort-merge join we > > can > > > >>> check what traitsets are present in the inputs > > > >>> and if possible request those. Can you elaborate a bit more why do > we > > > need > > > >>> a new type of metadata? > > > >>> > > > >>> Anyway if we cannot do it at the moment it makes sense to complete > > the > > > >>> missing bits since what you are describing > > > >>> was already mentioned in the original design of the Volcano > optimizer > > > [1]. > > > >>> > > > >>> "If a move to be pursued is the exploration of a normal query > > > processing > > > >>> algorithm such as merge-join, its cost is calculated by the > > algorithm's > > > >>> cost function. The algorithm's applicability function determines > the > > > >>> physical properly vectors for the algorithms inputs, and their > costs > > > and > > > >>> optimal plans are found by invoking FindBestPlan for the inputs. > For > > > some > > > >>> binary operators, the actual physical properties of the inputs are > > not > > > as > > > >>> important as the consistency of physical properties among the > inputs. > > > For > > > >>> example, for a sort-based implementation of intersection, i.e., an > > > >>> algorithm very similar to merge-join, any sort order of the two > > inputs > > > will > > > >>> suffice as long as the two inputs are sorted in the same way. > > > Similarly, > > > >>> for a parallel join, any partitioning of join inputs across > multiple > > > >>> processing nodes is acceptable if both inputs are partitioned using > > > >>> Compatible partitioning rules. For these cases, the search engine > > > permits > > > >>> the optimizer implementor to specify a number of physical property > > > vectors > > > >>> to be tried. For example, for the intersection of two inputs R and > S > > > with > > > >>> attributes A, B, and C where R is sorted on (A,B,C) and S is sorted > > on > > > >>> (B,A,C), both these sort orders can be specified by the optimizer > > > >>> implementor and will be optimized by the generated optimizer, while > > > other > > > >>> possible sort orders, e.g., (C,B,A), will be ignored. " [1] > > > >>> > > > >>> Best, > > > >>> Stamatis > > > >>> > > > >>> [1] > > > >>> > > > > > > https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf > > > >>> > > > >>> On Fri, Oct 18, 2019 at 4:56 AM Haisheng Yuan < > > h.y...@alibaba-inc.com> > > > >>> wrote: > > > >>> > > > >>>> TL;DR > > > >>>> Both top-down physical TraitSet request and bottom-up TraitSet > > > >>>> derivation have their strongth and weakness, we propose > > > >>>> on-demand TraitSet request to combine the above two, to reduce > > > >>>> the number of plan alternatives that are genereated, especially > > > >>>> in distributed system. > > > >>>> > > > >>>> e.g. > > > >>>> select * from foo join bar on f1=b1 and f2=b2 and f3=b3; > > > >>>> > > > >>>> In non-distributed system, we can generate a sort merge join, > > > >>>> requesting foo sorted by f1,f2,f3 and bar sorted by b1,b2,b3. > > > >>>> But if foo happens to be sorted by f3,f2,f1, we may miss the > > > >>>> chance of making use of the delivered ordering of foo. Because > > > >>>> if we require bar to be sorted by b3,b2,b1, we don't need to > > > >>>> sort on foo anymore. There are so many choices, n!, not even > > > >>>> considering asc/desc and null direction. We can't request all > > > >>>> the possible traitsets in top-down way, and can't derive all the > > > >>>> possible traitsets in bottom-up way either. > > > >>>> > > > >>>> We propose on-demand traitset request by adding a new type > > > >>>> of metadata DerivedTraitSets into the built-in metadata system. > > > >>>> > > > >>>> List<RelTraitSet> deriveTraitSets(RelNode, RelMetadataQuery) > > > >>>> > > > >>>> In this metadata, every operator returns several possbile > traitsets > > > >>>> that may be derived from this operator. > > > >>>> > > > >>>> Using above query as an example, the tablescan on foo should > > > >>>> return traiset with collation on f3, f2, f1. > > > >>>> > > > >>>> In physical implementation rules, e.g. the SortMergeJoinRule, > > > >>>> it gets possible traitsets from both child operators, uses the > join > > > >>>> keys to eliminate useless traitsets, leaves out usefull traitsets, > > > >>>> and requests corresponding traitset on the other child. > > > >>>> > > > >>>> This relies on the feature of AbstractConverter, which is turned > > > >>>> off by default, due to performance issue [1]. > > > >>>> > > > >>>> Thoughts? > > > >>>> > > > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970 > > > >>>> > > > >>>> Haisheng > > > >>>> > > > >>>> > > > >>> > > > > > > > > > >