Hello Haisheng,
Thanks for the detailed analysis on the support for the Cascades framework. I
am quite interested in being part of the new optimization framework. I believe
this is a very important piece of infrastructural work to make Calcite a
robust query optimizer.
I like your approach to trait propagation and the derivation of traits from
child nodes. I have a question, though, about the example you provided.
Wouldn't HashJoin make more sense in place of MergeJoin, since it can
passThrough traits like hash distribution, whereas MergeJoin needs the sorted
traits? (Please correct me if I am missing anything here.)
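To make the passThrough question concrete, here is a minimal, hypothetical sketch. It is not Calcite's actual API; the names Trait, hashJoinPassThrough, and mergeJoinPassThrough are invented for illustration. The idea is that a hash join can satisfy a hash-distribution requirement by delegating it to its children, while a merge join can only pass through a sort requirement:

```java
// Hypothetical sketch of trait pass-through, not Calcite's real API.
import java.util.Optional;

public class TraitPassThrough {
    // A simplified required trait, e.g. hash-distributed on the join keys.
    enum Trait { HASH_DISTRIBUTED, SORTED, ANY }

    // A HashJoin can satisfy a hash-distribution requirement by pushing
    // the same requirement down to its inputs.
    static Optional<Trait> hashJoinPassThrough(Trait required) {
        if (required == Trait.HASH_DISTRIBUTED || required == Trait.ANY) {
            return Optional.of(required); // delegate requirement to children
        }
        return Optional.empty(); // cannot satisfy, e.g. a sort requirement
    }

    // A MergeJoin instead requires sorted inputs; it can pass through a
    // sort requirement on the join keys but not a hash distribution.
    static Optional<Trait> mergeJoinPassThrough(Trait required) {
        if (required == Trait.SORTED || required == Trait.ANY) {
            return Optional.of(Trait.SORTED);
        }
        return Optional.empty();
    }
}
```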
How are you imagining the new optimizer interface from the user's point of
view? Are you planning to use the same interface as that of VolcanoPlanner?
The concern that the optimizer can take a huge amount of time optimizing large
queries is a genuine one. When I was part of a team writing an optimizer
for a federated engine, I dealt with this by making the search
time-bound. As mentioned, an iteration-based approach might also be viable,
but I think a little more thought is needed before finalizing it.
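As an illustration of the time-bound search idea, here is a hedged sketch; all names (Task, TimeBoundedSearch, run) are hypothetical and not from any existing planner. It shows a task loop that stops when either the task queue empties or a wall-clock budget expires, at which point the caller falls back to the best plan found so far:

```java
// Hypothetical sketch of a time-bound optimizer search loop.
import java.util.ArrayDeque;
import java.util.Deque;

public class TimeBoundedSearch {
    // An optimization task (explore a group, apply a rule, ...). Performing
    // a task may schedule further tasks onto the queue.
    interface Task { void perform(Deque<Task> queue); }

    // Runs tasks until the queue empties or budgetMillis elapses.
    // Returns the number of tasks actually executed; on timeout the
    // caller uses the best plan found so far.
    static int run(Deque<Task> queue, long budgetMillis) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        int executed = 0;
        while (!queue.isEmpty() && System.currentTimeMillis() < deadline) {
            queue.pop().perform(queue);
            executed++;
        }
        return executed;
    }
}
```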
I am of the opinion that option (b), adding a new Planner, might be the right
approach. We can continuously work on the new optimizer and make it robust,
so that switching to it is seamless and can be controlled by the user. In my
experience, bugs in an optimizer are quite subtle and can cause severe
performance degradation, so tweaking the existing VolcanoPlanner can be risky.
Thanks,
-Hanumath Maduri
On Mon, Apr 20, 2020 at 11:04 AM Xiening Dai wrote:
> Hi Jinpeng,
>
> Regarding this comment - I believe there are ways to calculate the logical
> cost, but I think it's not as simple as "cardinality * unit_copy_cost" -
> could you provide more details on those other ways? Just the algorithm
> description or pseudocode would help us understand. Thanks.
>
> Regarding the approach of creating new planner, I don’t think a new
> planner would lower the risk. We don’t know what we don’t know. If we
> introduced an issue while modifying the planner, most likely we would do
> the same with new planner class. A new planner doesn’t necessarily prevent
> the issue from happening, but just delay its surfacing, which is worse IMO.
>
> One obvious benefit of a new planner is that we can provide some
> sort of isolation so the change won't cause test baseline updates, which
> could be painful at times. We should see if we could use planner config to
> achieve the same if we decide to just modify the current planner.
>
>
> > On Apr 20, 2020, at 8:32 AM, Haisheng Yuan wrote:
> >
> > Igor,
> >
> > That's great.
> >
> > On 2020/04/20 11:17:49, Seliverstov Igor wrote:
> >> Haisheng, Xiening,
> >>
> >> Ok, Now I see how it should work.
> >>
> >> Thanks for your replies.
> >>
> >> Regards,
> >> Igor
> >>
> >>> On Apr 20, 2020, at 09:56, Seliverstov Igor
> wrote:
> >>>
> >>> Haisheng, Xiening,
> >>>
> >>> Thanks for clarifying.
> >>>
> >>> "In this proposal, we are not trying to split logical and physical
> planning entirely." - actually I was in doubt about the idea of entirely
> splitting the logical and physical phases; if you aren't going to, I have
> no objections.
> >>>
> >>> But that returns me to my first question: how will we propagate traits
> in a bottom-up manner using the proposed approach? (See "[DISCUSS] Proposal
> to add API to force rules matching specific rels" for details.)
> >>>
> >>> One of the inconveniences of the current VolcanoPlanner implementation
> is the number of tricks we need to get the desired behaviour. It would be
> great if some of these issues (or all of them) were solved in the new
> approach.
> >>>
> >>> Regards,
> >>> Igor
> >>>
> >>> On Mon, Apr 20, 2020 at 7:02, Xiening Dai <xndai@gmail.com> wrote:
> >>> Hi Igor,
> >>>
> >>> Your comment - "because actual cost may be calculated correctly using
> physical operators only. So won't be able to implement Branch and Bound
> Space Pruning." - is actually not true. In Cascades' lower bound / upper
> bound pruning algorithm, you can get the cost lower bound of an input
> RelNode using cardinality * unit_copy_cost. The unit_copy_cost is a
> constant, which stands for the minimal cost of processing a tuple from the
> input. So this will be the minimal cost the input RelNode can achieve, and
> if this is indeed larger than the current best cost, the input node can be
> pruned.
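To illustrate the pruning rule described in the quoted paragraph, here is a small sketch. The constant value and class names are illustrative, not from any actual implementation: the cheapest any input can possibly be is cardinality * unit_copy_cost, so if even that lower bound exceeds the current best plan's cost, the input can be pruned without being implemented.

```java
// Hypothetical sketch of lower-bound cost pruning; names and the
// constant are illustrative only.
public class LowerBoundPruning {
    static final double UNIT_COPY_COST = 1.0; // minimal cost per tuple

    // Lower bound on the cost of any physical plan for this input.
    static double costLowerBound(double cardinality) {
        return cardinality * UNIT_COPY_COST;
    }

    // Prune the input if even its lower bound is no better than the
    // cost of the current best complete plan.
    static boolean canPrune(double cardinality, double currentBestCost) {
        return costLowerBound(cardinality) >= currentBestCost;
    }
}
```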
> >>>
> >>> In this proposal, we are not trying to split logical and physical
> planning entirely. But for any given equivalence set, we would need to
> finish exploration before implementation. For the entire memo tree,
> each set could be in a different planning stage, and an equivalence set can
> be pruned even before it's implemented. A textbook example of the
> aforementioned "lower bound / upper bound pruning" is the join ordering
> case.
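The staged, per-group planning described in the quoted paragraph can be sketched as a small state machine. The names (GroupState, Stage) are illustrative, not an actual planner API: each equivalence set must finish exploration before implementation, different sets may sit in different stages at the same time, and a set can be pruned at any point before it is implemented.

```java
// Hypothetical sketch of per-equivalence-set planning stages.
public class GroupState {
    enum Stage { UNEXPLORED, EXPLORED, IMPLEMENTED, PRUNED }

    private Stage stage = Stage.UNEXPLORED;

    // Exploration (applying transformation rules) comes first.
    void explore() {
        if (stage == Stage.UNEXPLORED) stage = Stage.EXPLORED;
    }

    // Implementation is only legal once the group is fully explored.
    void implement() {
        if (stage == Stage.EXPLORED) stage = Stage.IMPLEMENTED;
    }

    // A group may be pruned any time before it is implemented,
    // e.g. via the lower-bound check above.
    void prune() {
        if (stage != Stage.IMPLEMENTED) stage = Stage.PRUNED;
    }

    Stage stage() { return stage; }
}
```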
> >>>
> >>> Regarding #3, I think we can still achieve that (partially) through
> this proposal. Remember that every time we start the optimization, we