How to optimize repeated RelNode Structures? (CALCITE-3806)

Anjali Shrishrimal Wed, 19 Feb 2020 23:20:42 -0800

Hi everybody,

I would like to have your suggestions on CALCITE-3806.


Asking it here as suggested by Julian.





If a RelNode tree contains a subtree whose result can be obtained from some 
other part of the same tree, can we optimize it? And how would we express 
that in the plan?



For example,

Let's say the input structure looks like this:



LogicalUnion(all=[true])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[scott, EMP]])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[scott, EMP]])





In this case, the subtree

  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[scott, EMP]])

is repeated, so the same data is going to be fetched twice.

Can we save one fetch? Can we somehow tell the union's 2nd input to reuse the 
output of its 1st input? Is there any way to express that in the plan?
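
To make the question concrete, here is a minimal sketch in Python of how the 
repeated subtree could be detected. It is only a toy model, not Calcite's 
API: the Node class, its digest() method and repeated_subtrees() are 
hypothetical names made up for this illustration. The assumption is that two 
subtrees with the same operator, attributes and inputs produce the same 
result.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a relational operator (hypothetical, not Calcite's RelNode)."""
    op: str
    attrs: str = ""
    inputs: Tuple["Node", ...] = ()

    def digest(self) -> str:
        # Assumption: subtrees with equal digests compute the same result.
        children = ", ".join(child.digest() for child in self.inputs)
        return f"{self.op}({self.attrs})[{children}]"


def repeated_subtrees(root: Node) -> Dict[str, int]:
    """Return the digest of every subtree that occurs more than once, with its count."""
    counts: Dict[str, int] = defaultdict(int)

    def visit(node: Node) -> None:
        counts[node.digest()] += 1
        for child in node.inputs:
            visit(child)

    visit(root)
    return {digest: n for digest, n in counts.items() if n > 1}


def branch() -> Node:
    # One input of the union from the example plan above.
    scan = Node("TableScan", "table=[[scott, EMP]]")
    flt = Node("Filter", "condition=[>=($0, 7369)]", (scan,))
    return Node("Project", "EMPNO=[$0]", (flt,))


plan = Node("Union", "all=[true]", (branch(), branch()))
shared = repeated_subtrees(plan)
# The largest repeated subtree is the candidate to compute once and reuse.
largest = max(shared, key=len)
print(largest)
```

Detecting the repetition is the easy part of the sketch. What I do not know 
is how the plan should say "reuse the 1st input's result" once the repetition 
has been found.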



Also, if the structure were like this:



LogicalUnion(all=[true])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 7369)])
      LogicalTableScan(table=[[scott, EMP]])
  LogicalProject(EMPNO=[$0])
    LogicalFilter(condition=[>=($0, 8000)])
      LogicalTableScan(table=[[scott, EMP]])



The second input of the union could be computed by filtering the data already 
fetched for the first input, since the second's output is a subset of the 
first's output.
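
As a rough illustration of what I mean, here is a small Python sketch. It is 
not Calcite code: implies_ge() and union_all_with_reuse() are hypothetical 
helpers, the rows are made-up values, and it assumes the first input's output 
can be kept around (materialized) so that the second input can read it.

```python
from typing import List


def implies_ge(stronger: int, weaker: int) -> bool:
    """True if `x >= stronger` implies `x >= weaker` for every x."""
    return stronger >= weaker


def union_all_with_reuse(rows: List[int], bound_a: int, bound_b: int) -> List[int]:
    """Evaluate (x >= bound_a) UNION ALL (x >= bound_b).

    Scans `rows` only once when the second filter is subsumed by the first.
    """
    first = [x for x in rows if x >= bound_a]        # the only scan of the table
    if implies_ge(bound_b, bound_a):
        second = [x for x in first if x >= bound_b]  # refilter the kept result
    else:
        second = [x for x in rows if x >= bound_b]   # fall back to a second scan
    return first + second


# Illustrative EMPNO values (made up for this sketch, not real table contents).
empnos = [7000, 7369, 7900, 8000, 8100]
result = union_all_with_reuse(empnos, 7369, 8000)
print(result)
```

The sketch only handles two filters of the form >=($0, constant) on the same 
column. Deciding in general that one condition implies another looks like the 
harder problem.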



Does Calcite provide this kind of optimization?

If not, what are the challenges in doing so?







Would love to hear your thoughts.




Thank you,
Anjali Shrishrimal
