Re: [DISCUSS] Towards Cascades Optimizer

Haisheng Yuan Mon, 20 Apr 2020 18:34:40 -0700

Hi Hanumath,

The trait in the example is for distribution only for brevity, not including 
collation. No matter it is hash join or merge join or nestedloop join, the same 
distribution applied.


> Are you planning to use the same interface as that of VolcanoPlanner?
Yes, not only for compatibility, but also because the design of RelNode and 
Rule plays a huge impact on how planner can be refactored. They are integrated 
components, not really separable. I have no choice.

> The concern about the optimizer can take up huge time for optimizing large 
> queries is a genuine. 
Because very few efforts were spent on improving speed of the core framework 
for the last 10 years. Most of queries(MV putting side), as long as there is no 
dynamic programming style join reordering, should be optimized in a reasonable 
span of time, otherwise it is a **bug**.  Alibaba MaxCompute processed millions 
of queries per day, they are mission-critical and time-critical, all have 
optimization and execution time limit. We spent a lot of efforts on improving 
optimization latency, all the changes we made on Calcite will be battlefield 
tested. The inefficiency issue caused by inefficient search strategy and lack 
of trait propagation mechanism, that is experienced by other Calcite based 
systems, will be resolved by this proposal. If you genuinely want 
iteration-based optimization, you will be disappointed. Because even Cascades' 
first adopter SQL Server, doesn't provide iteration-based optimization, instead 
it has a phase with very limited optimization rules for transactional 
processing. You can't have your cake and eat it.

I am inclined to modify on VolcanoPlanner. A separate new planner won't make it 
move fast and lower online production risk. No body knows the pain on 
maintaining 2 optimizers more than me. Ask Greenplum database developers how 
they feel about working on Orca and Postres planner.

Haisheng

On 2020/04/20 18:07:06, hanu mapr <hanu.m...@gmail.com> wrote: 
> Hello Haisheng,
> 
> 
> Thanks for the detailed analysis on the support for cascades framework. I
> am quite interested to be part of the new optimization framework. I believe
> this a very important infrastructural work to make calcite a robust query
> optimizer.
> 
> 
> I like your approach on the trait propagation and derivation of the traits
> from child nodes. I have a question though in the example you provided.
> Shouldn't HashJoin make more sense in place of MergeJoin as it can
> passThrough the traits like Hash whereas MergeJoin needs the sorted
> traits.(Please correct me if I am missing anything here).
> 
> 
> How are you imagining the new Optimizer interface from user point of view.
> Are you planning to use the same interface as that of VolcanoPlanner?
> 
> 
> The concern about the optimizer can take up huge time for optimizing large
> queries is a genuine. At the time when I was part of writing an optimizer
> for a federated engine, I also dealt with it by making the search
> time-bound. As mentioned even iteration based approach might be viable but
> I think a little more thought might be needed to finalize.
> 
> 
> I am of the opinion that option b add a new XXXXPlanner might be the right
> approach. We can continuously work on the new optimizer and make it robust
> so that switching to the new optimizer can be seamless and can be
> controlled by the user. From my experience any bug in the optimizer is
> quite subtle and can have high magnitude degradation in performance, so
> tweaking the existing VolcanoOptimizer can be risky.
> 
> 
> Thanks,
> 
> -Hanumath Maduri
> 
> On Mon, Apr 20, 2020 at 11:04 AM Xiening Dai <xndai....@gmail.com> wrote:
> 
> > Hi Jinpeng,
> >
> > Regarding this comment - I believe there are ways to calculate the logical
> > cost, but I think it’s not that simple as  "cardinality * unit_copy_cost.”,
> > would  you provide more details of other different ways? Just the algorithm
> > description or pseudo code would help us understand. Thanks.
> >
> > Regarding the approach of creating new planner, I don’t think a new
> > planner would lower the risk. We don’t know what we don’t know. If we
> > introduced an issue while modifying the planner, most likely we would do
> > the same with new planner class. A new planner doesn’t necessarily prevent
> > the issue from happening, but just delay its surfacing, which is worse IMO.
> >
> > There’s one obvious benefit with new planner is that we can provide some
> > sort of isolation so the change won’t cause test baseline updates, which
> > could be painful at times.  We should see if we could use planner config to
> > achieve the same if we decide to just modify the current planner.
> >
> >
> > > On Apr 20, 2020, at 8:32 AM, Haisheng Yuan <hy...@apache.org> wrote:
> > >
> > > Igor,
> > >
> > > That's great.
> > >
> > > On 2020/04/20 11:17:49, Seliverstov Igor <gvvinbl...@gmail.com> wrote:
> > >> Haisheng, Xiening,
> > >>
> > >> Ok, Now I see how it should work.
> > >>
> > >> Thanks for your replies.
> > >>
> > >> Regards,
> > >> Igor
> > >>
> > >>> 20 апр. 2020 г., в 09:56, Seliverstov Igor <gvvinbl...@gmail.com>
> > написал(а):
> > >>>
> > >>> Haisheng, Xiening,
> > >>>
> > >>> Thanks for clarifying.
> > >>>
> > >>> In this proposal, we are not trying to split logical and physical
> > planning entirely. - actually I was in doubt about an idea of entire
> > splitting logical and physical phases, if you aren't going to, I have no
> > objections.
> > >>>
> > >>> But it returns me to my first question: how we will propagate traits
> > in bottom-up manner using proposed approach. (See [DISCUSS] Proposal to add
> > API to force rules matching specific rels for details)
> > >>>
> > >>> One of inconveniences of current VolcanoPlanner implementation is
> > amount of tricks that we need to get desired behaviour. It would be great
> > if some of issues (or all of them) were solved in the new approach.
> > >>>
> > >>> Regards,
> > >>> Igor
> > >>>
> > >>> пн, 20 апр. 2020 г., 7:02 Xiening Dai <xndai....@gmail.com <mailto:
> > xndai....@gmail.com>>:
> > >>> Hi Igor,
> > >>>
> > >>> Your comment - "because actual cost may be calculated correctly using
> > physical operators only. So won't be able to implement Branch and Bound
> > Space Pruning.“ is actually not true. In Cascade’s lower bound / upper
> > bound pruning algorithm, you can get cost lower bound of input RelNode
> > using cardinality * unit_copy_cost. The unit_copy_cost is a constant, which
> > stands for the minimal cost for processing a tuple from the input. So this
> > will be the minimal cost the input RelNode can achieve, and if this is
> > indeed larger than the current best cost, this input node can be pruned.
> > >>>
> > >>> In this proposal, we are not trying to split logical and physical
> > planning entirely. But for any given equivalent set, we would need to
> > finish exploration before implementation. But for the entire memo tree,
> > each set could be in different planning stage, and an equivalent set can be
> > pruned even before it’s implemented. A text book example of aforementioned
> > “lower bound / upper bound pruning” would be the join ordering case.
> > >>>
> > >>> Regarding #3, I think we can still achieve that (partially) through
> > this proposal. Remember every time when we start the optimization, we pass
> > down an upper bound limit. Initially this upper bound for root is infinite
> > - meaning that no feasible plan is available. Every time when we find a
> > physical plan we update this upper bound, then start the search again. We
> > could stop the search when the cost is less than a pre-defined threshold -
> > which gives you a “good enough” plan with early termination. Still this
> > wouldn't avoid the logical exploration. For that, you would probably
> > archive through rule configurations, and avoid some expansive
> > transformation to keep the cost down.
> > >>>
> > >>>
> > >>>> On Apr 19, 2020, at 7:30 PM, Haisheng Yuan <hy...@apache.org <mailto:
> > hy...@apache.org>> wrote:
> > >>>>
> > >>>> Igor,
> > >>>>
> > >>>> a) Given current Calcite's stats derivation strategy, mixing logical
> > and physical planning won't make it better. I hope you went through my
> > email to the end, currently operators inside a memo group don't share stats
> > info, each operator's stats may differ with the other one, and the
> > RelSubset's best node and stats may vary during optimization. So logical
> > and physical rule pruning are not safe at the moment, otherwise it almost
> > implies changing downstream project's cost model silently.
> > >>>>
> > >>>> On the other hand, ensuring child group exploration task finish first
> > will make rule mutual exclusivity check possible, like the result of
> > ReduceExpressionRule won't need trigger the same rule again, The join
> > result of JoinCommuteRule won't need trigger JoinCommuteRule and
> > ReduceExpressionRule again.
> > >>>>
> > >>>> More importantly, most if not all the the long planning queries in
> > Calcite are not caused by too many alternatives, but mutual triggering, or
> > cyclic triggering, which can be avoided by customizing the rules instead of
> > using the default one. Unless you use dynamic programming (Calcite use
> > heuristic) on join reordering (left-deep, right-deep, bushy), space pruning
> > won't help make long / infinite running query faster.
> > >>>>
> > >>>> b) No evidence shows current version of Calcite will return the most
> > promising plan in first planning iteration. Instead of praying for getting
> > good enough plan in the first iteration, why not focus on fixing rules that
> > causes the issue?
> > >>>>
> > >>>> c) That is not the goal.
> > >>>>
> > >>>> On 2020/04/19 15:14:57, Seliverstov Igor <gvvinbl...@gmail.com
> > <mailto:gvvinbl...@gmail.com>> wrote:
> > >>>>> Haisheng,
> > >>>>>
> > >>>>> From my point of view splitting logical and physical planning parts
> > isn’t a good idea.
> > >>>>>
> > >>>>> I think so because actual cost may be calculated correctly using
> > physical operators only. So that we:
> > >>>>> a) won't be able to implement Branch and Bound Space Pruning (as far
> > as I understand, at exploring time there are no physical operators, no
> > correct costs, but assumptions only, I don’t think we should rely on them)
> > >>>>> b) won’t be able to get good enough plan (without Branch and Bound
> > Space Pruning it’s non-trivial task to get right search direction and most
> > promising plans in first planning iterations)
> > >>>>> c) won’t be able to tune the good enough plan during several similar
> > queries are executed
> > >>>>>
> > >>>>> Regards,
> > >>>>> Igor
> > >>>>>
> > >>>>>
> > >>>>>> 19 апр. 2020 г., в 17:37, Haisheng Yuan <hy...@apache.org <mailto:
> > hy...@apache.org>> написал(а):
> > >>>>>>
> > >>>>>> Hi Igor,
> > >>>>>>
> > >>>>>> You can't have your cake and eat it.
> > >>>>>> But one feasible work item definitely we can do is that once
> > timeout, stop exploring, use the first available physical operator in each
> > group and optimize it.
> > >>>>>>
> > >>>>>> Because most, if not all, of the long / infinite running
> > optimizations are caused by project related transpose, join reordering
> > (join commute + associative), constant reduction for large expression (see
> > CALCITE-3460), all of which are logical transformations rules and many of
> > which have corresponding JIRAs. So another alternative is, let's fix these
> > bugs to improve Calcite to make timeout option less usable.
> > >>>>>>
> > >>>>>> Another thing worth noting is that sql server optimizer timeout
> > mechanism is not based on clock time, but the number of optimization tasks
> > it has done [1].
> > >>>>>>
> > >>>>>> [1]
> > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> > <
> > https://techcommunity.microsoft.com/t5/sql-server-support/understanding-optimizer-timeout-and-how-complex-queries-can-be/ba-p/319188
> > >
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Haisheng
> > >>>>>>
> > >>>>>> On 2020/04/19 11:31:27, Seliverstov Igor <gvvinbl...@gmail.com
> > <mailto:gvvinbl...@gmail.com>> wrote:
> > >>>>>>> Haisheng,
> > >>>>>>>
> > >>>>>>> Ok, then such notification isn't needed
> > >>>>>>>
> > >>>>>>> But in this case we don't have any control over how long planning
> > takes
> > >>>>>>>
> > >>>>>>> For some systems it's necessary to get good enough plan right now
> > instead
> > >>>>>>> of best one after while
> > >>>>>>>
> > >>>>>>> For example we've been considering a case when a query is
> > optimised several
> > >>>>>>> times in short iterations in case it's impossible to get the best
> > plan in
> > >>>>>>> reasonable period of time (for example there is an SLA for
> > response time)
> > >>>>>>>
> > >>>>>>> This mean we need all needed physical implementations after each
> > logical
> > >>>>>>> transformation is applied.
> > >>>>>>>
> > >>>>>>> Regards,
> > >>>>>>> Igor
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> вс, 19 апр. 2020 г., 13:55 Haisheng Yuan <hy...@apache.org
> > <mailto:hy...@apache.org>>:
> > >>>>>>>
> > >>>>>>>> Hi Igor,
> > >>>>>>>>
> > >>>>>>>> There will be no rule queue anymore. Y will be fully explored
> > (logical
> > >>>>>>>> rules are matched and applied) before it can be implemented and
> > optimized.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Haisheng
> > >>>>>>>>
> > >>>>>>>> On 2020/04/19 10:11:45, Seliverstov Igor <gvvinbl...@gmail.com
> > <mailto:gvvinbl...@gmail.com>> wrote:
> > >>>>>>>>> Hi Haisheng,
> > >>>>>>>>>
> > >>>>>>>>> Great explanation, thanks.
> > >>>>>>>>>
> > >>>>>>>>> One thing I'd like to cover in advance is trait propagation
> > process (I
> > >>>>>>>> need
> > >>>>>>>>> it for Apache Ignite SQL engine implementation).
> > >>>>>>>>>
> > >>>>>>>>> For example:
> > >>>>>>>>>
> > >>>>>>>>> There are two nodes: Rel X and its child node Rel Y
> > >>>>>>>>>
> > >>>>>>>>> Both nodes are in Optimized state, and there is a Logical rule
> > for Rel Y
> > >>>>>>>> in
> > >>>>>>>>> a rules queue matched,
> > >>>>>>>>>
> > >>>>>>>>> After logical rule is applied, there is a logical rel Rel Y' and
> > >>>>>>>> containing
> > >>>>>>>>> it set moves into Exploring state (since there is a logical node
> > which
> > >>>>>>>> has
> > >>>>>>>>> to be implemented)
> > >>>>>>>>>
> > >>>>>>>>> After whole process Exploring -> ... -> Optimized we need to
> > force parent
> > >>>>>>>>> Rel X set to switch its state from Optimized to Optimizing to
> > peek the
> > >>>>>>>>> newly implemented child and (possible) produce a new Rel X'
> > physical rel
> > >>>>>>>> on
> > >>>>>>>>> the basis of previously requested from the Rel X set traits.
> > >>>>>>>>>
> > >>>>>>>>> If a new Rel X' is produced, a Rel X' parent should move its
> > state
> > >>>>>>>>> Optimized -> Optimizing and repeat described above operations.
> > >>>>>>>>>
> > >>>>>>>>> Does it look like true?
> > >>>>>>>>>
> > >>>>>>>>> Regards,
> > >>>>>>>>> Igor
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> вс, 19 апр. 2020 г., 6:52 Haisheng Yuan <h.y...@alibaba-inc.com
> > <mailto:h.y...@alibaba-inc.com>>:
> > >>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> In the past few months, we have discussed a lot about Cascades
> > style
> > >>>>>>>>>> top-down optimization, including on-demand trait
> > derivation/request,
> > >>>>>>>> rule
> > >>>>>>>>>> apply, branch and bound space pruning. Now we think it is time
> > to move
> > >>>>>>>>>> towards these targets.
> > >>>>>>>>>>
> > >>>>>>>>>> We will separate it into several small issues, and each one can
> > be
> > >>>>>>>>>> integrated as a standalone, independent feature, and most
> > importantly,
> > >>>>>>>>>> meanwhile keep backward compatibility.
> > >>>>>>>>>>
> > >>>>>>>>>> 1. Top-down trait request
> > >>>>>>>>>> In other words, pass traits requirements from parent nodes to
> > child
> > >>>>>>>> nodes.
> > >>>>>>>>>> The trait requests happens after all the logical transformation
> > rules
> > >>>>>>>> and
> > >>>>>>>>>> physical implementation rules are done, in a top-down manner,
> > driven
> > >>>>>>>> from
> > >>>>>>>>>> root set. e.g.:
> > >>>>>>>>>> SELECT a, sum(c) FROM
> > >>>>>>>>>>  (SELECT * FROM R JOIN S USING (a, b)) t
> > >>>>>>>>>> GROUP BY a;
> > >>>>>>>>>>
> > >>>>>>>>>> Suppose we have the following plan tree in the MEMO, and let's
> > only
> > >>>>>>>>>> consider distribution for simplicity, each group represents a
> > RelSet
> > >>>>>>>> in the
> > >>>>>>>>>> MEMO.
> > >>>>>>>>>>
> > >>>>>>>>>> Group 1:     Agg on [a]
> > >>>>>>>>>> Group 2:           +-- MergeJoin on [a, b]
> > >>>>>>>>>> Group 3:                       |--- TableScan R
> > >>>>>>>>>> Group 4:                       +--- TableScan S
> > >>>>>>>>>> | Group No | Operator    | Derived Traits | Required Traits |
> > >>>>>>>>>> | ----------- | ------------- | --------------- |
> > --------------- |
> > >>>>>>>>>> | Group 1  | Aggregate    | Hash[a]         | N/A
> >   |
> > >>>>>>>>>> | Group 2  | MergeJoin    | Hash[a,b]      | Hash[a]          |
> > >>>>>>>>>> | Group 3  | TableScan R | None            | Hash[a,b]       |
> > >>>>>>>>>> | Group 4  | TableScan S | None            | Hash[a,b]       |
> > >>>>>>>>>>
> > >>>>>>>>>> We will add new interface PhysicalNode (or extending RelNode)
> > with
> > >>>>>>>>>> methods:
> > >>>>>>>>>>
> > >>>>>>>>>> - Pair<RelTraitSet,List<RelTraitSet>> requireTraits(RelTraitSet
> > >>>>>>>> required);
> > >>>>>>>>>> pair.left is the current operator's new traitset, pair.right is
> > the
> > >>>>>>>> list
> > >>>>>>>>>> of children's required traitset.
> > >>>>>>>>>>
> > >>>>>>>>>> - RelNode passThrough(RelTraitSet required);
> > >>>>>>>>>> Default implementation will call above method requireTraits()
> > and
> > >>>>>>>>>> RelNode.copy() to create new RelNode. Available to be overriden
> > for
> > >>>>>>>>>> physical operators to customize the logic.
> > >>>>>>>>>>
> > >>>>>>>>>> The planner will call above method on MergeJoin operator to
> > pass the
> > >>>>>>>>>> required traits (Hash[a]) to Mergejoin's child operators.
> > >>>>>>>>>>
> > >>>>>>>>>> We will get a new MergeJoin:
> > >>>>>>>>>> MergeJoin hash[a]
> > >>>>>>>>>>        |---- TableScan R hash[a] (RelSubset)
> > >>>>>>>>>>        +---- TableScan S hash[a] (RelSubset)
> > >>>>>>>>>>
> > >>>>>>>>>> Now the MEMO group looks like:
> > >>>>>>>>>> | Group No | Operator    | Derived Traits       | Required
> > Traits
> > >>>>>>>>  |
> > >>>>>>>>>> | ---------- | -------- ----- | -------------------- |
> > >>>>>>>>>> --------------------- |
> > >>>>>>>>>> | Group1   | Aggregate   | Hash[a]                 | N/A
> > >>>>>>>>>>      |
> > >>>>>>>>>> | Group2   | MergeJoin   | Hash[a,b], Hash[a]| Hash[a]
> > >>>>>>>>>> |
> > >>>>>>>>>> | Group3   | TableScan R | None                    | Hash[a,b],
> > >>>>>>>> Hash[a]
> > >>>>>>>>>> |
> > >>>>>>>>>> | Group4   | TableScan S | None                    | Hash[a,b],
> > >>>>>>>> Hash[a]
> > >>>>>>>>>> |
> > >>>>>>>>>>
> > >>>>>>>>>> Calcite user may choose to ignore / not implement the interface
> > to keep
> > >>>>>>>>>> the original behavior. Each physical operator, according to its
> > own
> > >>>>>>>> logic,
> > >>>>>>>>>> decides whether passThrough() should pass traits down or not by
> > >>>>>>>> returning:
> > >>>>>>>>>> - a non-null RelNode, which means it can pass down
> > >>>>>>>>>> - null object, which means can't pass down
> > >>>>>>>>>>
> > >>>>>>>>>> 2. Provide option to disable AbstractConverter
> > >>>>>>>>>> Once the plan can request traits in top-down way in the
> > framework, many
> > >>>>>>>>>> system don't need AbstractConverter anymore, since it is just a
> > >>>>>>>>>> intermediate operator to generate physical sort / exchange. For
> > those,
> > >>>>>>>> we
> > >>>>>>>>>> can provide option to disable AbstractConverter, generate
> > physical
> > >>>>>>>> enforcer
> > >>>>>>>>>> directly by adding a method to interface Convention:
> > >>>>>>>>>> - RelNode enforce(RelNode node, RelTraitSet traits);
> > >>>>>>>>>>
> > >>>>>>>>>> The default implementation may just calling
> > >>>>>>>> changeTraitsUsingConverters(),
> > >>>>>>>>>> but people may choose to override it if the system has special
> > needs,
> > >>>>>>>> like
> > >>>>>>>>>> several traits must implement together, or the position of
> > collation in
> > >>>>>>>>>> RelTraitSet is before distribution.
> > >>>>>>>>>>
> > >>>>>>>>>> 3. Top-down driven, on-demand rule apply
> > >>>>>>>>>> For every RelNode in a RelSet, rule is matched and applied
> > >>>>>>>> sequentially,
> > >>>>>>>>>> newly generated RelNodes are added to the end of RelNode list
> > in the
> > >>>>>>>> RelSet
> > >>>>>>>>>> waiting for rule apply. RuleQueue and DeferringRuleCall is not
> > needed
> > >>>>>>>>>> anymore. This will make space pruning and rule mutual
> > exclusivity check
> > >>>>>>>>>> possible.
> > >>>>>>>>>>
> > >>>>>>>>>> There are 3 stages for each RelSet:
> > >>>>>>>>>> a). Exploration: logical transformation, yields logical nodes
> > >>>>>>>>>> b). Implementation: physical transformation, yields physical
> > nodes
> > >>>>>>>>>> c). Optimization: trait request, enforcement
> > >>>>>>>>>>
> > >>>>>>>>>> The general process looks like:
> > >>>>>>>>>> - optimize RelSet X:
> > >>>>>>>>>> implement RelSet X
> > >>>>>>>>>>  for each physical relnode in RelSet X:
> > >>>>>>>>>>      // pass down trait requests to child RelSets
> > >>>>>>>>>>      for each child RelSet Y of current relnode:
> > >>>>>>>>>>          optimize RelSet Y
> > >>>>>>>>>>
> > >>>>>>>>>> - implement RelSet X:
> > >>>>>>>>>>  if X has been implemented:
> > >>>>>>>>>>      return
> > >>>>>>>>>> explore RelSet X
> > >>>>>>>>>>  for each relnode in RelSet X:
> > >>>>>>>>>>      apply physical rules
> > >>>>>>>>>> - explore RelSet X:
> > >>>>>>>>>>   if X has been explored
> > >>>>>>>>>>      return
> > >>>>>>>>>> for each relnode in RelSet X:
> > >>>>>>>>>>      // ensure each child RelSet of current relnode is explored
> > >>>>>>>>>> for each child RelSet Y of current relnode:
> > >>>>>>>>>>         explore RelSet Y
> > >>>>>>>>>>      apply logical rules on current relnode
> > >>>>>>>>>>
> > >>>>>>>>>> Basically it is a state machine with several states:
> > Initialized,
> > >>>>>>>>>> Explored, Exploring, Implemented, Implementing, Optimized,
> > Optimizing
> > >>>>>>>> and
> > >>>>>>>>>> several transition methods: exploreRelSet, exploreRelNode,
> > >>>>>>>> implementRelSet,
> > >>>>>>>>>> implementRelNode, optimizeRelSet, optimizeRelNode...
> > >>>>>>>>>>
> > >>>>>>>>>> To achieve this, we need to mark the rules either logical rule
> > or
> > >>>>>>>> physical
> > >>>>>>>>>> rule.
> > >>>>>>>>>> To keep backward compatibility, all the un-marked rules will be
> > >>>>>>>> treated as
> > >>>>>>>>>> logical rules, except rules that uses AbstractConverter as rule
> > >>>>>>>> operand,
> > >>>>>>>>>> these rules still need to applied top-down, or random order.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> 4. On-demand, bottom-up trait derivation
> > >>>>>>>>>> It is called bottom-up, but actually driven by top-down,
> > happens same
> > >>>>>>>> time
> > >>>>>>>>>> as top-down trait request, in optimization stage mentioned
> > above. Many
> > >>>>>>>>>> Calcite based bigdata system only propagate traits on Project
> > and
> > >>>>>>>> Filter by
> > >>>>>>>>>> writing rules, which is very limited. In fact, we can extend
> > trait
> > >>>>>>>>>> propagation/derivation to all the operators, without rules, by
> > adding
> > >>>>>>>>>> interface PhysicalNode (or extending RelNode) with method:
> > >>>>>>>>>> - RelNode derive(RelTraitSet traits, int childId);
> > >>>>>>>>>>
> > >>>>>>>>>> Given the following plan (only consider distribution for
> > simplicity):
> > >>>>>>>>>> Agg [a,b]
> > >>>>>>>>>>    +-- MergeJoin [a]
> > >>>>>>>>>>               |---- TableScan R
> > >>>>>>>>>>               +--- TableScan S
> > >>>>>>>>>>
> > >>>>>>>>>> Hash[a] won't satisfy Hash[a,b] without special treatment,
> > because
> > >>>>>>>> there
> > >>>>>>>>>> isn't a mechanism to coordinate traits between children.
> > >>>>>>>>>>
> > >>>>>>>>>> Now we call derive method on Agg [a,b] node, derive(Hash[a],
> > 0), we get
> > >>>>>>>>>> the new node:
> > >>>>>>>>>> Agg [a]
> > >>>>>>>>>> +-- MergeJoin [a] (RelSubset)
> > >>>>>>>>>>
> > >>>>>>>>>> We will provide different matching type, so each operator can
> > specify
> > >>>>>>>> what
> > >>>>>>>>>> kind of matching type it requires its children:
> > >>>>>>>>>> - MatchType getMatchType(RelTrait trait, int childId);
> > >>>>>>>>>>
> > >>>>>>>>>> a) Exact: Hash[a,b] exact match Hash[a,b], aka, satisfy
> > >>>>>>>>>> b) Partial: Hash[a] partial match Hash[a,b]
> > >>>>>>>>>> c) Permuted: Sort[a,b,c] permuted match Sort[c,b,a]
> > >>>>>>>>>>
> > >>>>>>>>>> In addition, optimization order is provided for each opertor to
> > >>>>>>>> specify:
> > >>>>>>>>>> a) left to right
> > >>>>>>>>>> b) right to left
> > >>>>>>>>>> c) both
> > >>>>>>>>>>
> > >>>>>>>>>> For example, for query SELECT * FROM R join S on R.a=S.a and
> > R.b=S.b
> > >>>>>>>> and
> > >>>>>>>>>> R.c=S.c:
> > >>>>>>>>>> Suppose R is distributed by a,b, and S is distributed by c.
> > >>>>>>>>>> MergeJoin [a,b,c]
> > >>>>>>>>>>     |--- TableScan R [a,b]
> > >>>>>>>>>>     +-- TableScan S [c]
> > >>>>>>>>>> a) left to right, call derive(Hash[a,b], 0), we get MergeJoin
> > [a,b]
> > >>>>>>>>>> b) right to left, call derive(Hash[c], 1), we get MergeJoin
> > [c], most
> > >>>>>>>>>> likely a bad plan
> > >>>>>>>>>> c) both, get above 2 plans.
> > >>>>>>>>>>
> > >>>>>>>>>> For system that doesn't have a fine-tuned stats and cost model,
> > it may
> > >>>>>>>> not
> > >>>>>>>>>> be able to make a right decision based purely on cost. Probably
> > we
> > >>>>>>>> need to
> > >>>>>>>>>> provide the MergeJoin with both children's derived traitset
> > list.
> > >>>>>>>>>> - List<RelNode> derive(List<List<RelTraitSet>>);
> > >>>>>>>>>>
> > >>>>>>>>>> Of course, all above methods are optional to implement for
> > those who
> > >>>>>>>>>> doesn't need this feature.
> > >>>>>>>>>>
> > >>>>>>>>>> 5. Branch and Bound Space Pruning
> > >>>>>>>>>> After we implement on-demand, top-down trait enforcement and
> > >>>>>>>> rule-apply,
> > >>>>>>>>>> we can pass the cost limit at the time of passing down required
> > >>>>>>>> traits, as
> > >>>>>>>>>> described in the classical Cascades paper. Right now, Calcite
> > doesn't
> > >>>>>>>>>> provide group level logical properties, including stats info,
> > each
> > >>>>>>>> operator
> > >>>>>>>>>> in the same group has its own logical property and the stats
> > may vary,
> > >>>>>>>> so
> > >>>>>>>>>> we can only do limited space pruning for trait enforcement,
> > still
> > >>>>>>>> good. But
> > >>>>>>>>>> if we agree to add option to share group level stats between
> > relnodes
> > >>>>>>>> in a
> > >>>>>>>>>> RelSet, we will be able to do more aggresive space pruning,
> > which will
> > >>>>>>>> help
> > >>>>>>>>>> boost the performance of join reorder planning.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> With all that being said, how do we move forward?
> > >>>>>>>>>>
> > >>>>>>>>>> There are 2 ways:
> > >>>>>>>>>> a) Modify on current VolcanoPlanner.
> > >>>>>>>>>> Pros: code reuse, existing numerous test cases and
> > infrastructure,
> > >>>>>>>> fast
> > >>>>>>>>>> integration
> > >>>>>>>>>> Cons: changing code always brings risk
> > >>>>>>>>>>
> > >>>>>>>>>> b) Add a new XXXXPlanner
> > >>>>>>>>>> Pros: no risk, no tech debt, no need to worry about backward
> > >>>>>>>>>> compatability
> > >>>>>>>>>> Cons: separate test cases for new planner, one more planner to
> > >>>>>>>> maintain
> > >>>>>>>>>>
> > >>>>>>>>>> We'd like to hear the community's thoughts and advices.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks.
> > >>>>>>>>>> - Haisheng
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> >
> >
>

Re: [DISCUSS] Towards Cascades Optimizer

Reply via email to