Hi Haisheng,

Great explanation, thanks.

One thing I'd like to cover in advance is trait propagation process (I need
it for Apache Ignite SQL engine implementation).

For example:

There are two nodes: Rel X and its child node Rel Y

Both nodes are in Optimized state, and there is a Logical rule for Rel Y in
a rules queue matched,

After logical rule is applied, there is a logical rel Rel Y' and containing
it set moves into Exploring state (since there is a logical node which has
to be implemented)

After whole process Exploring -> ... -> Optimized we need to force parent
Rel X set to switch its state from Optimized to Optimizing to peek the
newly implemented child and (possible) produce a new Rel X' physical rel on
the basis of previously requested from the Rel X set traits.

If a new Rel X' is produced, a Rel X' parent should move its state
Optimized -> Optimizing and repeat described above operations.

Does it look like true?

Regards,
Igor


вс, 19 апр. 2020 г., 6:52 Haisheng Yuan <h.y...@alibaba-inc.com>:

> Hi,
>
> In the past few months, we have discussed a lot about Cascades style
> top-down optimization, including on-demand trait derivation/request, rule
> apply, branch and bound space pruning. Now we think it is time to move
> towards these targets.
>
> We will separate it into several small issues, and each one can be
> integrated as a standalone, independent feature, and most importantly,
> meanwhile keep backward compatibility.
>
> 1. Top-down trait request
> In other words, pass traits requirements from parent nodes to child nodes.
> The trait requests happens after all the logical transformation rules and
> physical implementation rules are done, in a top-down manner, driven from
> root set. e.g.:
> SELECT a, sum(c) FROM
>     (SELECT * FROM R JOIN S USING (a, b)) t
> GROUP BY a;
>
> Suppose we have the following plan tree in the MEMO, and let's only
> consider distribution for simplicity, each group represents a RelSet in the
> MEMO.
>
>    Group 1:     Agg on [a]
>    Group 2:           +-- MergeJoin on [a, b]
>    Group 3:                       |--- TableScan R
>    Group 4:                       +--- TableScan S
>    | Group No | Operator    | Derived Traits | Required Traits |
>    | ----------- | ------------- | --------------- | --------------- |
>    | Group 1  | Aggregate    | Hash[a]         | N/A                |
>    | Group 2  | MergeJoin    | Hash[a,b]      | Hash[a]          |
>    | Group 3  | TableScan R | None            | Hash[a,b]       |
>    | Group 4  | TableScan S | None            | Hash[a,b]       |
>
> We will add new interface PhysicalNode (or extending RelNode) with
> methods:
>
> - Pair<RelTraitSet,List<RelTraitSet>> requireTraits(RelTraitSet required);
>   pair.left is the current operator's new traitset, pair.right is the list
> of children's required traitset.
>
> - RelNode passThrough(RelTraitSet required);
>   Default implementation will call above method requireTraits() and
> RelNode.copy() to create new RelNode. Available to be overriden for
> physical operators to customize the logic.
>
> The planner will call above method on MergeJoin operator to pass the
> required traits (Hash[a]) to Mergejoin's child operators.
>
> We will get a new MergeJoin:
>  MergeJoin hash[a]
>           |---- TableScan R hash[a] (RelSubset)
>           +---- TableScan S hash[a] (RelSubset)
>
> Now the MEMO group looks like:
>    | Group No | Operator    | Derived Traits       | Required Traits      |
>    | ---------- | -------- ----- | -------------------- |
> --------------------- |
>    | Group1   | Aggregate   | Hash[a]                 | N/A
>         |
>    | Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> |
>    | Group3   | TableScan R | None                    | Hash[a,b], Hash[a]
> |
>    | Group4   | TableScan S | None                    | Hash[a,b], Hash[a]
> |
>
> Calcite user may choose to ignore / not implement the interface to keep
> the original behavior. Each physical operator, according to its own logic,
> decides whether passThrough() should pass traits down or not by returning:
> - a non-null RelNode, which means it can pass down
> - null object, which means can't pass down
>
> 2. Provide option to disable AbstractConverter
> Once the plan can request traits in top-down way in the framework, many
> system don't need AbstractConverter anymore, since it is just a
> intermediate operator to generate physical sort / exchange. For those, we
> can provide option to disable AbstractConverter, generate physical enforcer
> directly by adding a method to interface Convention:
> - RelNode enforce(RelNode node, RelTraitSet traits);
>
> The default implementation may just calling changeTraitsUsingConverters(),
> but people may choose to override it if the system has special needs, like
> several traits must implement together, or the position of collation in
> RelTraitSet is before distribution.
>
> 3. Top-down driven, on-demand rule apply
> For every RelNode in a RelSet, rule is matched and applied sequentially,
> newly generated RelNodes are added to the end of RelNode list in the RelSet
> waiting for rule apply. RuleQueue and DeferringRuleCall is not needed
> anymore. This will make space pruning and rule mutual exclusivity check
> possible.
>
> There are 3 stages for each RelSet:
> a). Exploration: logical transformation, yields logical nodes
> b). Implementation: physical transformation, yields physical nodes
> c). Optimization: trait request, enforcement
>
> The general process looks like:
> - optimize RelSet X:
> implement RelSet X
>     for each physical relnode in RelSet X:
>         // pass down trait requests to child RelSets
>         for each child RelSet Y of current relnode:
>             optimize RelSet Y
>
> - implement RelSet X:
>     if X has been implemented:
>         return
> explore RelSet X
>     for each relnode in RelSet X:
>         apply physical rules
> - explore RelSet X:
>      if X has been explored
>         return
> for each relnode in RelSet X:
>         // ensure each child RelSet of current relnode is explored
> for each child RelSet Y of current relnode:
>            explore RelSet Y
>         apply logical rules on current relnode
>
> Basically it is a state machine with several states: Initialized,
> Explored, Exploring, Implemented, Implementing, Optimized, Optimizing and
> several transition methods: exploreRelSet, exploreRelNode, implementRelSet,
> implementRelNode, optimizeRelSet, optimizeRelNode...
>
> To achieve this, we need to mark the rules either logical rule or physical
> rule.
> To keep backward compatibility, all the un-marked rules will be treated as
> logical rules, except rules that uses AbstractConverter as rule operand,
> these rules still need to applied top-down, or random order.
>
>
> 4. On-demand, bottom-up trait derivation
> It is called bottom-up, but actually driven by top-down, happens same time
> as top-down trait request, in optimization stage mentioned above. Many
> Calcite based bigdata system only propagate traits on Project and Filter by
> writing rules, which is very limited. In fact, we can extend trait
> propagation/derivation to all the operators, without rules, by adding
> interface PhysicalNode (or extending RelNode) with method:
> - RelNode derive(RelTraitSet traits, int childId);
>
> Given the following plan (only consider distribution for simplicity):
>    Agg [a,b]
>       +-- MergeJoin [a]
>                  |---- TableScan R
>                  +--- TableScan S
>
> Hash[a] won't satisfy Hash[a,b] without special treatment, because there
> isn't a mechanism to coordinate traits between children.
>
> Now we call derive method on Agg [a,b] node, derive(Hash[a], 0), we get
> the new node:
> Agg [a]
>   +-- MergeJoin [a] (RelSubset)
>
> We will provide different matching type, so each operator can specify what
> kind of matching type it requires its children:
> - MatchType getMatchType(RelTrait trait, int childId);
>
> a) Exact: Hash[a,b] exact match Hash[a,b], aka, satisfy
> b) Partial: Hash[a] partial match Hash[a,b]
> c) Permuted: Sort[a,b,c] permuted match Sort[c,b,a]
>
> In addition, optimization order is provided for each opertor to specify:
> a) left to right
> b) right to left
> c) both
>
> For example, for query SELECT * FROM R join S on R.a=S.a and R.b=S.b and
> R.c=S.c:
> Suppose R is distributed by a,b, and S is distributed by c.
>  MergeJoin [a,b,c]
>        |--- TableScan R [a,b]
>        +-- TableScan S [c]
> a) left to right, call derive(Hash[a,b], 0), we get MergeJoin [a,b]
> b) right to left, call derive(Hash[c], 1), we get MergeJoin [c], most
> likely a bad plan
> c) both, get above 2 plans.
>
> For system that doesn't have a fine-tuned stats and cost model, it may not
> be able to make a right decision based purely on cost. Probably we need to
> provide the MergeJoin with both children's derived traitset list.
> - List<RelNode> derive(List<List<RelTraitSet>>);
>
> Of course, all above methods are optional to implement for those who
> doesn't need this feature.
>
> 5. Branch and Bound Space Pruning
> After we implement on-demand, top-down trait enforcement and rule-apply,
> we can pass the cost limit at the time of passing down required traits, as
> described in the classical Cascades paper. Right now, Calcite doesn't
> provide group level logical properties, including stats info, each operator
> in the same group has its own logical property and the stats may vary, so
> we can only do limited space pruning for trait enforcement, still good. But
> if we agree to add option to share group level stats between relnodes in a
> RelSet, we will be able to do more aggresive space pruning, which will help
> boost the performance of join reorder planning.
>
>
> With all that being said, how do we move forward?
>
> There are 2 ways:
> a) Modify on current VolcanoPlanner.
>   Pros: code reuse, existing numerous test cases and infrastructure, fast
> integration
>   Cons: changing code always brings risk
>
> b) Add a new XXXXPlanner
>   Pros: no risk, no tech debt, no need to worry about backward
> compatability
>   Cons: separate test cases for new planner, one more planner to maintain
>
> We'd like to hear the community's thoughts and advices.
>
> Thanks.
> - Haisheng
>
>
>
>
>
>
>

Reply via email to