Thanks Stamatis. Sorry there have been a few threads going on in parallel.
I’ve a PR currently under review. It’s based on suggestions from Julian and Haisheng. You can take a look if you are interested. Initial test result shows very promising improvement. https://github.com/apache/calcite/pull/1884#pullrequestreview-402251279 <https://github.com/apache/calcite/pull/1884#pullrequestreview-402251279> > On Apr 28, 2020, at 3:34 PM, Stamatis Zampetakis <zabe...@gmail.com> wrote: > > Hello, > > I've seen that parts from this discussion have moved to another thread [1]. > I decided to reply in this thread in an attempt to not deviate a lot the > discussion in [1], which I feel is more general than the problems mentioned > here. > > Getting things from the beginning, Xiening mentioned three drawbacks but I > would say that we are mostly bothered by number 1. > > For the second drawback, the fact that we use rules to transform a > LogicalFilter to an EnumerableFilter is not a big issue from my point of > view. If the plan is to register a special builder in the planner to > replace these rules then it seems that we move the code from one place to > another. Plus, if we cannot replace all the rules of Logical to Enumerable > and we split the logic into two places then the code will be more difficult > to understand. > > For the third drawback, I would say that it is mostly a problem of the > adapter. Every rule accepts a RelBuilderFactory so if the adapter wants to > create physical nodes it should pass an appropriate factory to the > respective rules. Recently in [1], and other times in the past, it was > brought up as a problem that ProjectMergeRule (and other similar kind of > rules) match Enumerable nodes and create Logical ones. I would say that > this is mostly a bad configuration rather than a problem that should be > resolved automatically by the planner. Avoiding the aforementioned behavior > does not look very complicated; we need to instantiate the rule so that it > matches EnumerableProject and uses a RelBuilderFactory that creates > Enumerable operators. > > From my last comment, I think it is evident that I am also leaning to the > direction outlined by Julian, a RelBuilder capable of creating many > physical operators. > > It's been a while that I didn't check the progress in [2] so apologies if > what I wrote here is obsolete. > > Best, > Stamatis > > [1] > https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E > [2] https://github.com/apache/calcite/pull/1884 > > On Thu, Apr 9, 2020, 12:57 AM Julian Hyde <jh...@apache.org> wrote: > >> It's challenging to support all physical operators, but we can and >> should support many of them in RelBuilder. >> >> As Haisheng points out, some physical operators have extra operands. >> Maybe some of those operands can be added to the factory interfaces, >> or shoe-horned in in some clever way. (Note how we represent ASC and >> DESC keywords in RelBuilder.sort(RexNode...) as if they were function >> calls, or made RelBuilder.AggCall a fluent API to accommodate the many >> possible operands of an aggregate function call.) >> >> Many physical operators do not have extra operands. They implement the >> same contract as the logical operator, but with different physical >> properties (traits) (e.g. in a different convention, or assuming the >> input is sorted or partitioned in a particular way). The "operands" to >> these operands are the physical properties that they consume and >> produce. >> >> But conversely, there are huge benefits to sharing the code involved >> in creating RelNodes. A HashAggregate physical operator has a lot in >> common with LogicalAggregate, so can benefit from the same code. >> Without RelBuilder (or its embedded helper objects like RexSimplifier) >> being the place to put that logic, it will end up being spread out >> over, and duplicated in, many RelOptRule instances. >> >> We should be ambitious, and aim to make RelBuilder useful for creating >> most physical nodes. Perhaps make modest extensions to RelBuilder and >> RelFactory APIs to achieve this. If we fail, people can still manually >> create RelNodes using RelBuilder.push(). >> >> Julian >> >> >> On Wed, Apr 8, 2020 at 10:07 AM Xiening Dai <xndai....@gmail.com> wrote: >>> >>> Yes, this is a good question. I think it would be up to the builder >> factory to decide. One can just create a logical join if it couldn’t decide >> which join algorithm to use. Then the withConvention() syntax does not >> provide guarantees. The caller will need an assert if they expect the >> RelNode returned to have the requested convention. >>> >>> I think eventually it comes down to how we define the role of >> RelBuilder. Now it does look like it's doing some simple logical >> transformation (see filter() and sort()). >>> >>>> On Apr 7, 2020, at 5:43 PM, Haisheng Yuan <h.y...@alibaba-inc.com> >> wrote: >>>> >>>> Thanks Xiening for moving this to dev list. >>>> >>>> Unlike logical operators, with which many systems share the same >> structure, physical operators are pretty implementation dependent. >> RelBuilder provides the common interface to create aggregate, join. But >> what kind of physical aggregate? HashAgg, StreamAgg? How do I specify it as >> a local aggregate, or global aggregate? In many cases, these physical >> operator constructors have system dependent arguments, will the RelBuilder >> support these? Shuffle / Exchange operator of different system may also >> differ on their arguments. >>>> >>>> The worst case is that only physical filter, project, sort can be >> created using the physical RelBuilder. >>>> >>>> - Haisheng >>>> >>>> ------------------------------------------------------------------ >>>> 发件人:Xiening Dai<xndai....@gmail.com> >>>> 日 期:2020年04月08日 07:36:43 >>>> 收件人:<dev@calcite.apache.org> >>>> 主 题:[DICUSS] Support building physical RelNode in Calcite >>>> >>>> Hi all, >>>> >>>> In light of CALCITE-2970, I’d like to initiate a discussion. >>>> >>>> Currently the framework itself does not have a way to create physical >> RelNode (RelNode with a particular convention). We completely rely on >> adapter rules to convert logical nodes into physical ones. There are a few >> major drawbacks with this approach - >>>> >>>> 1. During trait enforcement, we have to create logic node and then get >> it implemented by converter rule, even though we know the target >> convention. This results in the increase of memo size and planning time. >> CALCITE-2970 is one good example. >>>> >>>> 2. A lot of implementation rules today simply copy the inputs and >> create a physical node (examples like EnumerableProjectRule, >> EnumerableFilterRule, etc). If the framework was able to create physical >> node, a lot of these trivial rules could be eliminated. >>>> >>>> 3. There are a number of adapter rules today using RelBuilder and >> creating logical node. This is not desirable since in a lot of cases what >> they need are physical nodes instead. Creating logical nodes then getting >> them implemented by converters again is inefficient. >>>> >>>> To solve this problem, there have been two proposals so far - >>>> >>>> a) Extend current RelBuilder interface to support withConvention() >> syntax, which means you could build a physical node with the convention >> specified. This can be done through extending the RelNode factory withheld >> by the builder. Calcite user can register a particular factory for their >> own convention. >>>> >>>> b) Extend Convention interface and add something like "RelNode >> enforce(RelNode input, RelTrait trait)". This would be used to address >> issue #1. A convention could implement this method to create exact physical >> node to be used for satisfying given trait. >>>> >>>> I personally prefer a) since I believe this is a basic building block >> and it’s not just for enforcement. Also extending RelBuild feels nature to >> me - the intention was clearly there when this class was created (look at >> the class comments). There have been discussions about whether or not >> RelBuilder should support creating physical nodes. Although we have been >> using it solely for logical nodes building today, but that doesn’t mean we >> shouldn’t extend to physical ones. I haven't seen clear reasoning. >>>> >>>> Would like to hear your thoughts. >>> >>