Thanks Stamatis. Sorry there have been a few threads going on in parallel. 

I’ve a PR currently under review. It’s based on suggestions from Julian and 
Haisheng. You can take a look if you are interested. Initial test result shows 
very promising improvement.

https://github.com/apache/calcite/pull/1884#pullrequestreview-402251279 
<https://github.com/apache/calcite/pull/1884#pullrequestreview-402251279>


> On Apr 28, 2020, at 3:34 PM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> 
> Hello,
> 
> I've seen that parts from this discussion have moved to another thread [1].
> I decided to reply in this thread in an attempt to not deviate a lot the
> discussion in [1], which I feel is more general than the problems mentioned
> here.
> 
> Getting things from the beginning, Xiening mentioned three drawbacks but I
> would say that we are mostly bothered by number 1.
> 
> For the second drawback, the fact that we use rules to transform a
> LogicalFilter to an EnumerableFilter is not a big issue from my point of
> view. If the plan is to register a special builder in the planner to
> replace these rules then it seems that we move the code from one place to
> another. Plus, if we cannot replace all the rules of Logical to Enumerable
> and we split the logic into two places then the code will be more difficult
> to understand.
> 
> For the third drawback, I would say that it is mostly a problem of the
> adapter. Every rule accepts a RelBuilderFactory so if the adapter wants to
> create physical nodes it should pass an appropriate factory to the
> respective rules. Recently in [1], and other times in the past, it was
> brought up as a problem that ProjectMergeRule (and other similar kind of
> rules) match Enumerable nodes and create Logical ones. I would say that
> this is mostly a bad configuration rather than a problem that should be
> resolved automatically by the planner. Avoiding the aforementioned behavior
> does not look very complicated; we need to instantiate the rule so that it
> matches EnumerableProject and uses a RelBuilderFactory that creates
> Enumerable operators.
> 
> From my last comment, I think it is evident that I am also leaning to the
> direction outlined by Julian, a RelBuilder capable of creating many
> physical operators.
> 
> It's been a while that I didn't check the progress in [2] so apologies if
> what I wrote here is obsolete.
> 
> Best,
> Stamatis
> 
> [1]
> https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E
> [2] https://github.com/apache/calcite/pull/1884
> 
> On Thu, Apr 9, 2020, 12:57 AM Julian Hyde <jh...@apache.org> wrote:
> 
>> It's challenging to support all physical operators, but we can and
>> should support many of them in RelBuilder.
>> 
>> As Haisheng points out, some physical operators have extra operands.
>> Maybe some of those operands can be added to the factory interfaces,
>> or shoe-horned in in some clever way. (Note how we represent ASC and
>> DESC keywords in RelBuilder.sort(RexNode...) as if they were function
>> calls, or made RelBuilder.AggCall a fluent API to accommodate the many
>> possible operands of an aggregate function call.)
>> 
>> Many physical operators do not have extra operands. They implement the
>> same contract as the logical operator, but with different physical
>> properties (traits) (e.g. in a different convention, or assuming the
>> input is sorted or partitioned in a particular way). The "operands" to
>> these operands are the physical properties that they consume and
>> produce.
>> 
>> But conversely, there are huge benefits to sharing the code involved
>> in creating RelNodes. A HashAggregate physical operator has a lot in
>> common with LogicalAggregate, so can benefit from the same code.
>> Without RelBuilder (or its embedded helper objects like RexSimplifier)
>> being the place to put that logic, it will end up being spread out
>> over, and duplicated in, many RelOptRule instances.
>> 
>> We should be ambitious, and aim to make RelBuilder useful for creating
>> most physical nodes. Perhaps make modest extensions to RelBuilder and
>> RelFactory APIs to achieve this. If we fail, people can still manually
>> create RelNodes using RelBuilder.push().
>> 
>> Julian
>> 
>> 
>> On Wed, Apr 8, 2020 at 10:07 AM Xiening Dai <xndai....@gmail.com> wrote:
>>> 
>>> Yes, this is a good question. I think it would be up to the builder
>> factory to decide. One can just create a logical join if it couldn’t decide
>> which join algorithm to use. Then the withConvention() syntax does not
>> provide guarantees. The caller will need an assert if they expect the
>> RelNode returned to have the requested convention.
>>> 
>>> I think eventually it comes down to how we define the role of
>> RelBuilder. Now it does look like it's doing some simple logical
>> transformation (see filter() and sort()).
>>> 
>>>> On Apr 7, 2020, at 5:43 PM, Haisheng Yuan <h.y...@alibaba-inc.com>
>> wrote:
>>>> 
>>>> Thanks Xiening for moving this to dev list.
>>>> 
>>>> Unlike logical operators, with which many systems share the same
>> structure, physical operators are pretty implementation dependent.
>> RelBuilder provides the common interface to create aggregate, join. But
>> what kind of physical aggregate? HashAgg, StreamAgg? How do I specify it as
>> a local aggregate, or global aggregate? In many cases, these physical
>> operator constructors have system dependent arguments, will the RelBuilder
>> support these? Shuffle / Exchange operator of different system may also
>> differ on their arguments.
>>>> 
>>>> The worst case is that only physical filter, project, sort can be
>> created using the physical RelBuilder.
>>>> 
>>>> - Haisheng
>>>> 
>>>> ------------------------------------------------------------------
>>>> 发件人:Xiening Dai<xndai....@gmail.com>
>>>> 日 期:2020年04月08日 07:36:43
>>>> 收件人:<dev@calcite.apache.org>
>>>> 主 题:[DICUSS] Support building physical RelNode in Calcite
>>>> 
>>>> Hi all,
>>>> 
>>>> In light of CALCITE-2970, I’d like to initiate a discussion.
>>>> 
>>>> Currently the framework itself does not have a way to create physical
>> RelNode (RelNode with a particular convention). We completely rely on
>> adapter rules to convert logical nodes into physical ones. There are a few
>> major drawbacks with this approach -
>>>> 
>>>> 1. During trait enforcement, we have to create logic node and then get
>> it implemented by converter rule, even though we know the target
>> convention. This results in the increase of memo size and planning time.
>> CALCITE-2970 is one good example.
>>>> 
>>>> 2. A lot of implementation rules today simply copy the inputs and
>> create a physical node (examples like EnumerableProjectRule,
>> EnumerableFilterRule, etc). If the framework was able to create physical
>> node, a lot of these trivial rules could be eliminated.
>>>> 
>>>> 3. There are a number of adapter rules today using RelBuilder and
>> creating logical node. This is not desirable since in a lot of cases what
>> they need are physical nodes instead. Creating logical nodes then getting
>> them implemented by converters again is inefficient.
>>>> 
>>>> To solve this problem, there have been two proposals so far -
>>>> 
>>>> a) Extend current RelBuilder interface to support withConvention()
>> syntax, which means you could build a physical node with the convention
>> specified. This can be done through extending the RelNode factory withheld
>> by the builder. Calcite user can register a particular factory for their
>> own convention.
>>>> 
>>>> b) Extend Convention interface and add something like "RelNode
>> enforce(RelNode input, RelTrait trait)". This would be used to address
>> issue #1. A convention could implement this method to create exact physical
>> node to be used for satisfying given trait.
>>>> 
>>>> I personally prefer a) since I believe this is a basic building block
>> and it’s not just for enforcement. Also extending RelBuild feels nature to
>> me - the intention was clearly there when this class was created (look at
>> the class comments). There have been discussions about whether or not
>> RelBuilder should support creating physical nodes. Although we have been
>> using it solely for logical nodes building today, but that doesn’t mean we
>> shouldn’t extend to physical ones. I haven't seen clear reasoning.
>>>> 
>>>> Would like to hear your thoughts.
>>> 
>> 

Reply via email to