Re: How to optimize repeated RelNode Structures? (CALCITE-3806)

Stamatis Zampetakis Fri, 21 Feb 2020 00:31:23 -0800

Hi Anjali,

For the first plan it seems that the Union is redundant since both inputs
are identical and the operator removes duplicates. I guess the plan could
be simplified to:


LogicalProject(EMPNO=[$0])
  LogicalFilter(condition=[>=($0, 7369)])
    LogicalTableScan(table=[[scott, EMP]])

I don't remember seeing a rule that does this so possibly we could add a
new one in PruneEmptyRules [1].

Apart from that and as others have mentioned there is a way to model the
materialisation of the common sub-expression using spools. The second plan
that you provided could look like below:

LogicalRepeatUnion(all=[true])
  LogicalTableSpool(readType=[LAZY], writeType=[EAGER], table=[[D]])
    LogicalProject(EMPNO=[$0])
      LogicalFilter(condition=[>=($0, 7369)])
        LogicalTableScan(table=[[scott, EMP]])
  LogicalFilter(condition=[>=($0, 8000)])
    LogicalTableScan(table=[[D]]) // This is a transient table

Currently there are no rules in Calcite that are going to perform such
transformation for using the spools.
Nevertheless there are rules/code who perform view based rewriting (e.g.,
in [2]) which might give you an idea of how can this be done.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/PruneEmptyRules.java
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/AbstractMaterializedViewRule.java

On Thu, Feb 20, 2020 at 3:54 PM Christian Beikov <christian.bei...@gmail.com>
wrote:

> I guess what you are looking for is something like the SubPlan node of
> PostgreSQL? Such a node would materialize the results to be used by
> multiple other nodes or in a nested loop context to avoid accessing a
> source relation multiple times.
>
> Would be cool if Calcite could create a SubPlan and let the union all node
> use that instead. Something like this
>
> Materialize(name=Sub1)
>
>   LogicalProject(EMPNO=[$0])
>
>      LogicalFilter(condition=[OR(>=($0, 7369), >=($0, 7369))])
>
>        LogicalTableScan(table=[[scott, EMP]])
>
> LogicalUnion(all=[true])
>
>    LogicalProject(EMPNO=[$0])
>
>      LogicalFilter(condition=[>=($0, 7369)])
>
>        LogicalTableScan(table=[[Sub1]])
>
>    LogicalProject(EMPNO=[$0])
>
>      LogicalFilter(condition=[>=($0, 7369)])
>
>        LogicalTableScan(table=[[Sub1]])
>
> Of course there are further optimizations that could be done like removing
> filters etc. but that's just a general example.
>
> Not sure if the Spool operator does that, but such a translation could also
> come in handy for distributed queries where every row access to a remote
> system has protocol overhead.
>
> Danny Chan <yuzhao....@gmail.com> schrieb am Do., 20. Feb. 2020, 15:27:
>
> > Calcite has a Spool operator, maybe you can check that.
> >
> > Anjali Shrishrimal <anjali.shrishri...@1eq.com.invalid>于2020年2月20日
> > 周四下午3:20写道：
> >
> > > Hi everybody,
> > >
> > > I would like to have your suggestions on CALCITE-3806.
> > >
> > > Asking it here as suggested by Julian.
> > >
> > >
> > >
> > >
> > >
> > > If RelNode tree contains a subtree whose result can be obtained by some
> > > other part of the same tree,
> > >
> > > can we optimize it ? and how to express it in plan ?
> > >
> > >
> > >
> > > For example,
> > >
> > > Let's say input structure looks like this :
> > >
> > >
> > >
> > > LogicalUnion(all=[true])
> > >
> > >   LogicalProject(EMPNO=[$0])
> > >
> > >     LogicalFilter(condition=[>=($0, 7369)])
> > >
> > >       LogicalTableScan(table=[[scott, EMP]])
> > >
> > >   LogicalProject(EMPNO=[$0])
> > >
> > >     LogicalFilter(condition=[>=($0, 7369)])
> > >
> > >       LogicalTableScan(table=[[scott, EMP]])
> > >
> > >
> > >
> > >
> > >
> > > In this case,
> > >
> > >
> > >
> > >   LogicalProject(EMPNO=[$0])
> > >
> > >     LogicalFilter(condition=[>=($0, 7369)])
> > >
> > >       LogicalTableScan(table=[[scott, EMP]])
> > >
> > >
> > >
> > > is repeated. It is going to fetch same data twice.
> > >
> > > Can we save one fetch? Can we somehow tell 2nd input of union to make
> use
> > > of union's 1st input. Is there any way to express that in plan?
> > >
> > >
> > >
> > > Also,
> > > If the structure was like this :
> > >
> > >
> > >
> > > LogicalUnion(all=[true])
> > >
> > >   LogicalProject(EMPNO=[$0])
> > >
> > >     LogicalFilter(condition=[>=($0, 7369)])
> > >
> > >       LogicalTableScan(table=[[scott, EMP]])
> > >
> > >   LogicalProject(EMPNO=[$0])
> > >
> > >     LogicalFilter(condition=[>=($0, 8000)])
> > >
> > >       LogicalTableScan(table=[[scott, EMP]])
> > >
> > >
> > >
> > > Second part of union can perform filtering on fetched data of 1st part.
> > > (As second's output is subset of first's output)
> > >
> > >
> > >
> > > Does calcite provide such kind of optimizations ?
> > >
> > > If not, what are the challenges to do so?
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Would love to hear your thoughts.
> > >
> > >
> > >
> > >
> > > Thank you,
> > > Anjali Shrishrimal
> > >
> >
>

Re: How to optimize repeated RelNode Structures? (CALCITE-3806)

Reply via email to