Vladimir,

thank you for bringing this up. We are facing the same problems in the Apache Ignite project, and it would be great if the Apache Calcite community proposed a solution to this
issue.

From my point of view the approach with abstract converters looks more promising, but as you mentioned it suffers from polluting the search space. The latter can be mitigated by splitting the planning stage into several phases: rather than registering all the rules in the planner simultaneously, it looks better to run several iterations of the planning stage with different sets of rules, as Drill does.
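
A rough sketch of what such a split could look like with Calcite's Program API (the rule lists below are empty placeholders; the actual partitioning of rules is engine-specific):

  import org.apache.calcite.tools.Program;
  import org.apache.calcite.tools.Programs;

  class PhasedOptimization {
    // Each phase registers only the rules it needs, instead of putting every
    // logical and physical rule into a single Volcano run.
    static Program buildProgram() {
      return Programs.sequence(
          Programs.ofRules(/* logical rules: transpositions, merges, ... */),
          Programs.ofRules(/* physical implementation / converter rules */));
    }
  }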

Also I'd like to mention that decoupling logical planning from physical planning looks a bit odd to me because it violates the idea of the Cascades framework. Possibly this decoupling is a consequence of performance issues.


--
Kind Regards
Roman Kondakov

On 05.12.2019 14:24, Vladimir Ozerov wrote:
Hi,

As I mentioned before, we are building a distributed SQL engine that uses
Apache Calcite for query optimization. The key problem we faced is the
inability to pull the physical traits of child relations efficiently. I'd
like to outline my understanding of the problem (I guess it was already
discussed multiple times), ask the community to confirm or refute the
existence of that problem and its severity for the products which use
Apache Calcite, and ask for ideas on how it could be improved in the future.

I'll start with a simplified problem description and mention more
complex use cases afterwards. Consider that we have a logical tree and a set of
implementation rules. Our goal is to find the optimal physical tree by
applying these rules. The classical Cascades-based approach directs the
optimization process from the top to the bottom (hence "top-down").
However, the actual implementation of tree nodes still happens bottom-up.
For the tree L1 <- L2, we enter "optimize(L1)", which recursively delegates
to "optimize(L2)". We then implement the child node, yielding L1 <- [P2', P2''], and
return to the parent, which is now able to pick promising
implementations of the child nodes and reject bad ones with a
branch-and-bound approach. AFAIK Pivotal's Orca works this way.
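
To make that flow concrete, here is a schematic sketch of the recursion (not Calcite code; LogicalNode, PhysicalPlan and ImplementationRule are made-up types used only for illustration):

  // Schematic only: the parent implements its children first, then builds
  // itself on top of the real child traits and prunes with an upper cost bound.
  PhysicalPlan optimize(LogicalNode node, double upperBound) {
    PhysicalPlan best = null;
    for (ImplementationRule rule : rulesFor(node)) {
      List<PhysicalPlan> children = new ArrayList<>();
      for (LogicalNode input : node.inputs()) {
        PhysicalPlan child = optimize(input, upperBound);  // top-down recursion
        if (child == null) {
          break;                        // this child alone already exceeds the bound
        }
        children.add(child);
      }
      if (children.size() != node.inputs().size()) {
        continue;                       // skip this alternative
      }
      PhysicalPlan candidate = rule.implement(node, children);
      if (candidate != null && candidate.cost() < upperBound) {
        best = candidate;
        upperBound = candidate.cost();  // branch-and-bound: tighten the bound
      }
    }
    return best;
  }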

Apache Calcite is very different because it doesn't allow such recursion,
so we lose the context of which node requested the child
transformation. This loss of context leads to the following problems:
1) The parent node cannot deduce its physical properties during the
execution of the implementation rule, because Calcite expects the
transformation to be applied before the child nodes are implemented. That is,
if we are optimizing LogicalProject <- LogicalScan, we cannot set the proper
distribution and collation for the to-be-created PhysicalProject, because
they depend on the distribution and collation of the children, which are yet
to be resolved (see the sketch after this list).
2) The branch-and-bound cannot be used because it requires at least one
fully-built physical subtree.
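
Coming back to problem 1): for illustration, an implementation rule for the LogicalProject <- LogicalScan case might look like the sketch below (PhysicalProject and PHYSICAL_CONVENTION are hypothetical placeholders for an engine-specific operator and convention). Note that when onMatch fires, the input is still a LogicalScan, so any distribution or collation put on the new node is a guess:

  import org.apache.calcite.plan.RelOptRule;
  import org.apache.calcite.plan.RelOptRuleCall;
  import org.apache.calcite.plan.RelTraitSet;
  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.rel.logical.LogicalProject;

  public class ProjectImplementationRule extends RelOptRule {
    public ProjectImplementationRule() {
      super(operand(LogicalProject.class, any()), "ProjectImplementationRule");
    }

    @Override public void onMatch(RelOptRuleCall call) {
      LogicalProject project = call.rel(0);
      // The input is still logical at this point, so its distribution and
      // collation are unknown; the new node's traits can only be guessed.
      RelTraitSet traits = project.getTraitSet().replace(PHYSICAL_CONVENTION);
      RelNode input = convert(project.getInput(), traits);
      call.transformTo(new PhysicalProject(project.getCluster(), traits, input,
          project.getProjects(), project.getRowType()));
    }
  }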

As a result of this limitation, products that rely on Apache Calcite for
query optimization use one or several workarounds:

*1) Guess the physical properties of parent nodes before logical children
are implemented*
*Apache Flink* uses this strategy. The strategy is bad because the
number of trait combinations grows exponentially with the number of
attributes in the given RelNode, so you either explode the search space or
give up optimization opportunities. Consider the following tree:
LogicalSort[a ASC] <- LogicalFilter <- LogicalScan
The optimal implementation of the LogicalFilter is PhysicalFilter[collation=a
ASC] because it may eliminate the parent sort. But such an optimization should
happen only if we know that there is a physical implementation of the scan
that allows this sort order, e.g. PhysicalIndexScan[collation=a ASC]. That is,
we need to know the child's physical properties first. Otherwise we fall back
to speculative approaches. With the *optimistic* approach, we emit all
possible combinations of physical properties, in the hope that the child
will satisfy some of them, thus expanding the search space exponentially.
With the *pessimistic* approach, we just miss this optimization opportunity
even if the index exists. Apache Flink uses the pessimistic approach.
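
To illustrate the optimistic variant, the filter implementation rule would have to emit one physical node per guessed collation. This is a fragment of a hypothetical onMatch body, not Flink's or Calcite's actual code; PhysicalFilter and PHYSICAL_CONVENTION are placeholders:

  // "Optimistic" guessing: emit a PhysicalFilter for every collation the
  // child might provide and hope that one of them gets satisfied.
  List<RelCollation> guesses = Arrays.asList(
      RelCollations.EMPTY,                          // no particular ordering
      RelCollations.of(new RelFieldCollation(0)));  // [a ASC]
  for (RelCollation collation : guesses) {
    RelTraitSet traits = filter.getTraitSet()
        .replace(PHYSICAL_CONVENTION)
        .replace(collation);
    call.transformTo(new PhysicalFilter(filter.getCluster(), traits,
        convert(filter.getInput(), traits), filter.getCondition()));
  }
  // With more orderable columns (and distributions) the number of guesses
  // explodes; the pessimistic variant emits only the default traits instead.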

*2) Use AbstractConverters*
*Apache Drill* uses this strategy. The idea is to "glue" logical and
physical operators, so that the implementation of a physical child re-triggers
the implementation rule of the logical parent. The flow is as follows:
- Invoke the parent implementation rule - it either doesn't produce new
physical nodes or produces non-optimized physical nodes (like in the Apache
Flink case)
- Invoke the children's implementation rules and create physical children
- Then converters kick in and re-trigger the parent implementation rule through
the creation of an abstract converter
- Finally, the parent implementation rule is fired again, and now it
produces optimized node(s) since at least some of the physical
distributions of the child nodes are implemented.
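
For reference, this behaviour roughly corresponds to registering the physical trait definitions together with the converter expansion rule in the VolcanoPlanner. A minimal sketch with Calcite's built-in trait defs follows; a real engine such as Drill registers its own trait definitions instead:

  import org.apache.calcite.plan.ConventionTraitDef;
  import org.apache.calcite.plan.volcano.AbstractConverter;
  import org.apache.calcite.plan.volcano.VolcanoPlanner;
  import org.apache.calcite.rel.RelCollationTraitDef;
  import org.apache.calcite.rel.RelDistributionTraitDef;

  VolcanoPlanner planner = new VolcanoPlanner();
  planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
  planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
  planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
  // Expands the AbstractConverter nodes that the planner inserts between
  // subsets with different traits; this expansion is what re-fires the
  // parent rules once physical children appear.
  planner.addRule(AbstractConverter.ExpandConversionRule.INSTANCE);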

Note that this is essentially a hack to simulate the Cascades flow! The
problem is that AbstractConverters increase the complexity of planning
because they do not have any context, so parent rules are just re-triggered
blindly. Consider the optimization of the following tree:
LogicalJoin <- [LogicalScan1, LogicalScan2]
With the converter approach, the join implementation rule will be fired at
least 3 times, while in reality, one call should be sufficient. In our
experiments with TPC-H queries, the join rule implemented that way is
typically called 6-9 times more often than expected.

*3) Transformations (i.e. logical optimization) are decoupled from
implementation (i.e. physical optimization)*
Normally, you would like to mix both logical and physical rules in a single
optimization program, because this is required for proper planning. That is,
you should consider both (Ax(BxC)) and ((AxB)xC) join orderings during
physical optimization, because you do not know in advance which one will
produce the better plan.
But in some practical implementations of Calcite-based optimizers, this is
not the case, and join planning is performed as a separate HEP stage.
Examples are Apache Drill and Apache Flink.
I believe that the lack of a Cascades-style flow and branch-and-bound is among
the main reasons for this. At the very least for Apache Drill: since it
uses converters, additional logical permutations would exponentially
multiply the number of fired rules, which is already very large.
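
Such a decoupled join-planning phase typically looks something like the sketch below, using Calcite's heuristic planner and the multi-join rules (a minimal example, not Drill's or Flink's exact program; logicalPlan stands for the tree produced by earlier phases):

  import org.apache.calcite.plan.hep.HepPlanner;
  import org.apache.calcite.plan.hep.HepProgram;
  import org.apache.calcite.plan.hep.HepProgramBuilder;
  import org.apache.calcite.rel.RelNode;
  import org.apache.calcite.rel.rules.JoinToMultiJoinRule;
  import org.apache.calcite.rel.rules.LoptOptimizeJoinRule;

  // Heuristic join-ordering phase, run before the cost-based physical phase.
  HepProgram hepProgram = new HepProgramBuilder()
      .addRuleInstance(JoinToMultiJoinRule.INSTANCE)   // collapse a join tree into a MultiJoin
      .addRuleInstance(LoptOptimizeJoinRule.INSTANCE)  // pick one join order heuristically
      .build();
  HepPlanner hepPlanner = new HepPlanner(hepProgram);
  hepPlanner.setRoot(logicalPlan);
  RelNode reordered = hepPlanner.findBestExp();
  // Only the single chosen ordering reaches the cost-based physical stage,
  // so (Ax(BxC)) vs ((AxB)xC) is no longer decided by physical costs.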

Given all these problems, I'd like to ask the community to share current
thoughts and ideas on future improvements to the Calcite optimizer. One
of the ideas being discussed in the community is "Pull-up Traits", which
should allow the parent node to get the physical properties of its child
nodes. But in order to do this, you effectively need to implement the children,
which IMO makes this process indistinguishable from the classical recursive
Cascades algorithm.

Have you considered recursive transformations as an alternative solution to
that problem? That is, instead of trying to guess or pull up the physical
properties of non-existent physical nodes, go ahead and actually implement
the children directly from within the parent rule? This may resolve the problem
with trait pull-up, as well as allow for branch-and-bound optimization.
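
Very roughly, and purely as a sketch of the idea (optimizeChild is not existing Calcite API, and PhysicalProject is the same hypothetical operator as above), a rule could then look like this:

  @Override public void onMatch(RelOptRuleCall call) {
    LogicalProject project = call.rel(0);
    // Hypothetical entry point: recursively implement the child and get back
    // its physical alternatives together with their traits and costs.
    List<RelNode> physicalInputs = optimizeChild(call, project.getInput());
    for (RelNode input : physicalInputs) {
      // Derive the parent's traits from the now-known child traits instead of
      // guessing them in advance (simplified: a real rule would map the child's
      // collation/distribution through the projection). Bad alternatives could
      // be pruned here with a branch-and-bound check against the best plan so far.
      RelTraitSet derived = input.getTraitSet();
      call.transformTo(new PhysicalProject(project.getCluster(), derived, input,
          project.getProjects(), project.getRowType()));
    }
  }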

I would appreciate your feedback or any hints for future research.

Regards,
Vladimir.
