Re: [Discuss] CommonSubExpressions Optimize

Julian Hyde Fri, 22 Jul 2022 11:44:52 -0700

It would certainly be possible to write a planner rule that converts a Calc to 
a sequence of Project and an optional Filter.


However, it might not yield optimal performance. Consider this query:

  SELECT a AS x1,
    a + b AS x2,
    a + b + c AS x3,
    a + b + c + d AS x4,
    a + b + c + d + e AS x5
  FROM t

The depth of the DAG is 4, and therefore it would generate 4 Project operators: 
the first would compute “a + b” and project it as x2, the second would compute 
“a + b + c” using the intermediate result in x2 and project it as x3, and so 
forth.

In any practical DBMS the cost of returning a row from a Project operator will 
be much higher than the cost of a + operation, so this plan is not optimal.

To generate a plan with the fastest runtime performance, you will need to 
optimize with a utility function that knows the cost of each operator and also 
knows the cost of emitting a row from a Project. Or do as we did in Calcite, 
which is to have a physical implementation of Project that shares intermediate 
expressions.

Julian



> On Jul 16, 2022, at 6:24 PM, Benchao Li <libenc...@apache.org> wrote:
> 
> Hi Jiajun,
> 
> I'm thinking about using the optimization of Calc, but change CalcToSql.
> 
> Do you mean you want to translate the above query to this one?
> ```SQL
> SELECT "salary", EXPR$1, EXPR$0, EXPR$2, EXPR$2
> FROM (
>  SELECT "salary", EXPR$0, EXPR$1, EXPR$0 + EXPR$1 AS EXPR$2
>  FROM (
>    SELECT "salary", "salary" * 3 AS EXPR$0, "salary" * 12 AS EXPR$1
>    FROM "hr"."emps"
>  )
> )
> ```
> If this is your intended way, I think you may need to figure out whether
> Trino
> will merge these projections in the first place.
> Even Trino could leverage the nested projections, you may still meet some
> complex expression trees which are very deep, I don't think this is a good
> solution.
> Maybe the best way is to make Trino support this natively.
> 
> 
> Jiajun Xie <jiajunbernou...@gmail.com> 于2022年7月16日周六 16:51写道：
> 
>> Yes, you are both correct.
>> 
>> I got an expected plan by using ProjectToCalcRule
>> ```
>> PLAN=EnumerableCalc(expr#0..4=[{inputs}], expr#5=[12], expr#6=[*($t5,
>> $t3)], expr#7=[3], expr#8=[*($t7, $t3)], expr#9=[*($t3, $t5)],
>> expr#10=[*($t3, $t7)], expr#11=[+($t9, $t10)], salary=[$t3], EXPR$1=[$t6],
>> EXPR$2=[$t8], EXPR$3=[$t11], EXPR$4=[$t11])\n
>> EnumerableTableScan(table=[[hr, emps]])\n\n
>> ```
>> 
>> But I got an unexpected  query when using RelToSql. I hope to extract
>> CommonSubExpressions in SQL, because Trino does not support
>> CommonSubExpressions optimization. (Presto supports it since 0.245).
>> ```
>> SELECT "salary", 12 * "salary", 3 * "salary", "salary" * 12 + "salary" * 3,
>> "salary" * 12 + "salary" * 3
>> FROM "hr"."emps"
>> ```
>> 
>> I'm thinking about using the optimization of Calc, but change CalcToSql.
>> What do you think?
>> FYI: In my project, I convert HiveSQL to PrestoSQL, and use some
>> optimizations of Calcite.
>> 
>> Thanks for your reply!
>> 
>> On Fri, 15 Jul 2022 at 09:44, Benchao Li <libenc...@apache.org> wrote:
>> 
>>> Agree with Julian, RexProgram could do this work.
>>> 
>>> I did this before in physical codegen based on Calc's RexProgram, and it
>>> works very well.
>>> 
>>> Julian Hyde <jhyde.apa...@gmail.com> 于2022年7月14日周四 23:21写道：
>>> 
>>>> It seems that you are tackling common scalar expressions (RexNodes)
>>>> whereas, based on the name the other rule is dealing with common
>>> relational
>>>> expressions (RelNode).
>>>> 
>>>> For your rule, consider using RexProgram, RexProgramBuilder, and Calc.
>>>> They already convert a list of expressions to a DAG, so that no
>>> expression
>>>> is calculated more than once.
>>>> 
>>>> Julian
>>>> 
>>>>> On Jul 14, 2022, at 06:31, Jiajun Xie <jiajunbernou...@gmail.com>
>>> wrote:
>>>>> 
>>>>> Hello, all~
>>>>> I am writing an RelOptRule that is for CommonSubExpressions. Here is
>>> an
>>>>> example that I completed.
>>>>>   sql:  `select sal, sal * 12, sal * 3, sal * 12 + sal * 3 from emp`
>>>>>   planBefore: ```LogicalProject(SAL=[$5], EXPR$1=[*($5, 12)],
>>>>> EXPR$2=[*($5, 3)], EXPR$3=[+(*($5, 12), *($5, 3))])
>>>>> LogicalTableScan(table=[[CATALOG, SALES, EMP]])```
>>>>>   planAfter: ```LogicalProject(SAL=[$0], EXPR$1=[$1], EXPR$2=[$2],
>>>>> EXPR$3=[+($1, $2)])
>>>>> LogicalProject(SAL=[$5], CSE$0=[*($5, 12)], CSE$1=[*($5, 3)])
>>>>>  LogicalTableScan(table=[[CATALOG, SALES, EMP]])```
>>>>> I found a RelOptRule that is named as CommonRelSubExprRule, but it
>> is
>>>>> abstract. I wonder if anyone has implemented it? Is it the same as
>>> what I
>>>>> am doing?
>>>>> Thanks very much~
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> Best,
>>> Benchao Li
>>> 
>> 
> 
> 
> -- 
> 
> Best,
> Benchao Li

Re: [Discuss] CommonSubExpressions Optimize

Reply via email to