Hi Vladimir,

You can get an idea of how the Volcano planner works by reading [1].
The Volcano implementation in Calcite differs from the paper in many ways,
but the main ideas are there.

Normally you do not need to set canConvertConvention to true; especially in
your case it doesn't seem necessary.

I think the main problem with your approach is that the Project rule does
not produce any transformation when it is first invoked.
The Volcano planner works mainly in a top-down fashion, so rules tend to
match from top to bottom.
For example, in [1], if there is no algorithm, transformation rule, or
enforcer to apply to a particular operator at a given step, the planning
process stops and you would not even see the transformation of the scan.
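
To make the top-down behaviour concrete, here is a toy sketch in plain Java.
It is not Calcite code and not the algorithm from [1]; the class
TopDownSketch, the Node type and the plan method are all hypothetical names
made up for illustration. It only shows that when an operator cannot be
implemented, nothing below it is visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy top-down planner; not Calcite code. All names here are hypothetical. */
final class TopDownSketch {

  /** A logical operator with at most one input. */
  static final class Node {
    final String name;
    final Node input; // null for a leaf such as a scan

    Node(String name, Node input) {
      this.name = name;
      this.input = input;
    }
  }

  /**
   * Walks the tree from the root. An operator is implemented only if its name
   * is in {@code implementable}; the search descends to the input only after
   * the parent was implemented, so a stuck parent hides everything below it.
   */
  static List<String> plan(Node root, Set<String> implementable) {
    List<String> physical = new ArrayList<>();
    for (Node n = root; n != null; n = n.input) {
      if (!implementable.contains(n.name)) {
        break; // no algorithm, transformation, or enforcer: the search stops
      }
      physical.add("Physical" + n.name);
    }
    return physical;
  }

  public static void main(String[] args) {
    Node tree = new Node("Root", new Node("Project", new Node("Scan", null)));

    // The Project rule declines to fire, so the Scan is never reached.
    System.out.println(plan(tree, new HashSet<>(Arrays.asList("Root", "Scan"))));

    // Every operator has a rule, so the whole tree is implemented.
    System.out.println(
        plan(tree, new HashSet<>(Arrays.asList("Root", "Project", "Scan"))));
  }
}
```

With the Project missing from the implementable set the result contains only
the physical root, even though the Scan has a rule of its own.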

When the Project rule matches, it should transform the logical project to an
HZPhysicalProject right away and require that the child operator has certain
traits (the necessary distribution).
The rule should look like the EnumerableProjectRule [2], but instead of
requiring only the convention it should also pass the requirement for the
distribution of the scan.
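
The idea of converting first and stating a requirement on the input can be
sketched in plain Java as follows. This is a toy model, not Calcite code and
not the body of EnumerableProjectRule; the class TraitRequestSketch, its
methods, and the distribution names PARTITIONED, SINGLETON and ANY are
hypothetical and chosen only for illustration. In a real rule the request
would be expressed through the trait set passed for the input.

```java
/** Toy model of a trait request; not Calcite code. All names are hypothetical. */
final class TraitRequestSketch {

  /** Physical scan: the leaf knows the distribution of its own data. */
  static String implementScan(String actualDistribution) {
    return "PhysicalScan[" + actualDistribution + "]";
  }

  /**
   * Satisfies a request for an input with the given distribution. If the scan
   * already delivers it, the scan is used as is; otherwise an enforcer
   * (an exchange) is placed on top of it.
   */
  static String satisfy(String required, String actualScanDistribution) {
    String scan = implementScan(actualScanDistribution);
    if (required.equals("ANY") || required.equals(actualScanDistribution)) {
      return scan;
    }
    return "Exchange[" + required + "](" + scan + ")";
  }

  /**
   * Project rule: it converts on the first match and states what it needs from
   * its input, instead of waiting until the input's distribution is known.
   */
  static String convertProject(String required, String actualScanDistribution) {
    return "PhysicalProject(" + satisfy(required, actualScanDistribution) + ")";
  }

  public static void main(String[] args) {
    // The scan already has the requested distribution: no enforcer is needed.
    System.out.println(convertProject("PARTITIONED", "PARTITIONED"));

    // The distributions differ: an exchange is inserted under the project.
    System.out.println(convertProject("SINGLETON", "PARTITIONED"));
  }
}
```

The point of the sketch is that the project never has to inspect the scan
before it is converted: it only declares a requirement, and the planner (here
the satisfy method) is responsible for meeting it.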

Best,
Stamatis

[1]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableProjectRule.java



On Mon, Oct 28, 2019 at 10:24 PM Vladimir Ozerov <ppoze...@gmail.com> wrote:

> Hi colleagues,
>
> We are building a Calcite-based optimizer for Hazelcast, and I have some
> problems understanding Calcite's logic with respect to converters. Let me
> briefly explain the problem.
>
> We have an execution backend, so we do not need Bindable or Enumerable.
> Instead, we would like to use Calcite to convert original SQL to a tree
> with our own convention, then convert it to our internal representation,
> and finally, execute.
>
> We started with looking at other Calcite integrations and eventually came
> to a classical two-phase optimization approach. We have two internal
> conventions - LOGICAL and PHYSICAL. The goal is to optimize the tree as
> follows:
> 1) NONE -> LOGICAL - heuristic optimizations
> 2) LOGICAL -> PHYSICAL - cost-based planning
>
> Suppose that after the first phase I have the following tree of our own
> operators:
> HZLogicalRoot
> -> HZLogicalProject
>   -> HZLogicalScan
>
> For this specific case, there is not much to optimize, so we only need to
> transition to physical nodes and do some boilerplate with traits
> propagation:
> HZPhysicalRoot
> -> HZPhysicalProject
>   -> HZPhysicalScan
>
> In order to achieve this, I define three rules, which just do a conversion
> of relevant nodes. Volcano optimizer is used.
>
> Now, the problem - somehow it works only when I override
> Convention.Impl.canConvertConvention to true for our PHYSICAL convention,
> but that blows up the search space and the same rules are called many times. A
> lot of time is spent on endless PHYSICAL -> LOGICAL conversions, which are
> of no use.
>
> If I change canConvertConvention to false, then rules are called a sensible
> number of times, but cannot produce a complete PHYSICAL tree. Here is how
> it works:
> 1) "Root" rule is invoked, which converts "HZLogicalRoot" to
> "HZPhysicalRoot"
> 2) "Project" rule is invoked, but does not produce any transformations, since
> it needs Scan distribution, which is not known yet. This is the desired
> behavior at this point.
> 3) "Scan" rule is invoked, "HZLogicalScan" is converted to
> "HZPhysicalScan". Distribution is resolved
> 4) At this point, we have [LogicalRoot, PhysicalRoot] -> [LogicalProject]
> -> [LogicalScan, PhysicalScan] sets. I expect that since a new scan was
> installed, the "Project" rule will be fired again. This time we know the
> distribution, so the transformation is possible. But the rule is not called
> and we fail with an error.
>
> So my questions are:
> 1) What is the real role of converters in this process? For some reason,
> when unnecessary (from a logical standpoint) PHYSICAL -> LOGICAL conversion
> is allowed, even complex plans could be built. And Drill does it for some
> reason. But it costs multiple additional invocations of the same rules. Are
> there any docs or presentations explaining the mechanics behind this?
> 2) What are the minimum requirements that will allow a rule on the parent
> to be fired again after its child node has changed?
>
> I can provide any additional information, source code or even a working
> example of this problem if needed. I don't want to bother you with it at
> the moment, because it feels like I am missing something very simple.
>
> Would appreciate your help.
>
> Regards,
> Vladimir.
>
