It would certainly be possible to write a planner rule that converts a Calc to a sequence of Project and an optional Filter.
However, it might not yield optimal performance. Consider this query: SELECT a AS x1, a + b AS x2, a + b + c AS x3, a + b + c + d AS x4, a + b + c + d + e AS x5 FROM t The depth of the DAG is 4, and therefore it would generate 4 Project operators: the first would compute “a + b” and project it as x2, the second would compute “a + b + c” using the intermediate result in x2 and project it as x3, and so forth. In any practical DBMS the cost of returning a row from a Project operator will be much higher than the cost of a + operation, so this plan is not optimal. To generate a plan with the fastest runtime performance, you will need to optimize with a utility function that knows the cost of each operator and also knows the cost of emitting a row from a Project. Or do as we did in Calcite, which is to have a physical implementation of Project that shares intermediate expressions. Julian > On Jul 16, 2022, at 6:24 PM, Benchao Li <libenc...@apache.org> wrote: > > Hi Jiajun, > > I'm thinking about using the optimization of Calc, but change CalcToSql. > > Do you mean you want to translate the above query to this one? > ```SQL > SELECT "salary", EXPR$1, EXPR$0, EXPR$2, EXPR$2 > FROM ( > SELECT "salary", EXPR$0, EXPR$1, EXPR$0 + EXPR$1 AS EXPR$2 > FROM ( > SELECT "salary", "salary" * 3 AS EXPR$0, "salary" * 12 AS EXPR$1 > FROM "hr"."emps" > ) > ) > ``` > If this is your intended way, I think you may need to figure out whether > Trino > will merge these projections in the first place. > Even Trino could leverage the nested projections, you may still meet some > complex expression trees which are very deep, I don't think this is a good > solution. > Maybe the best way is to make Trino support this natively. > > > Jiajun Xie <jiajunbernou...@gmail.com> 于2022年7月16日周六 16:51写道: > >> Yes, you are both correct. >> >> I got an expected plan by using ProjectToCalcRule >> ``` >> PLAN=EnumerableCalc(expr#0..4=[{inputs}], expr#5=[12], expr#6=[*($t5, >> $t3)], expr#7=[3], expr#8=[*($t7, $t3)], expr#9=[*($t3, $t5)], >> expr#10=[*($t3, $t7)], expr#11=[+($t9, $t10)], salary=[$t3], EXPR$1=[$t6], >> EXPR$2=[$t8], EXPR$3=[$t11], EXPR$4=[$t11])\n >> EnumerableTableScan(table=[[hr, emps]])\n\n >> ``` >> >> But I got an unexpected query when using RelToSql. I hope to extract >> CommonSubExpressions in SQL, because Trino does not support >> CommonSubExpressions optimization. (Presto supports it since 0.245). >> ``` >> SELECT "salary", 12 * "salary", 3 * "salary", "salary" * 12 + "salary" * 3, >> "salary" * 12 + "salary" * 3 >> FROM "hr"."emps" >> ``` >> >> I'm thinking about using the optimization of Calc, but change CalcToSql. >> What do you think? >> FYI: In my project, I convert HiveSQL to PrestoSQL, and use some >> optimizations of Calcite. >> >> Thanks for your reply! >> >> On Fri, 15 Jul 2022 at 09:44, Benchao Li <libenc...@apache.org> wrote: >> >>> Agree with Julian, RexProgram could do this work. >>> >>> I did this before in physical codegen based on Calc's RexProgram, and it >>> works very well. >>> >>> Julian Hyde <jhyde.apa...@gmail.com> 于2022年7月14日周四 23:21写道: >>> >>>> It seems that you are tackling common scalar expressions (RexNodes) >>>> whereas, based on the name the other rule is dealing with common >>> relational >>>> expressions (RelNode). >>>> >>>> For your rule, consider using RexProgram, RexProgramBuilder, and Calc. >>>> They already convert a list of expressions to a DAG, so that no >>> expression >>>> is calculated more than once. >>>> >>>> Julian >>>> >>>>> On Jul 14, 2022, at 06:31, Jiajun Xie <jiajunbernou...@gmail.com> >>> wrote: >>>>> >>>>> Hello, all~ >>>>> I am writing an RelOptRule that is for CommonSubExpressions. Here is >>> an >>>>> example that I completed. >>>>> sql: `select sal, sal * 12, sal * 3, sal * 12 + sal * 3 from emp` >>>>> planBefore: ```LogicalProject(SAL=[$5], EXPR$1=[*($5, 12)], >>>>> EXPR$2=[*($5, 3)], EXPR$3=[+(*($5, 12), *($5, 3))]) >>>>> LogicalTableScan(table=[[CATALOG, SALES, EMP]])``` >>>>> planAfter: ```LogicalProject(SAL=[$0], EXPR$1=[$1], EXPR$2=[$2], >>>>> EXPR$3=[+($1, $2)]) >>>>> LogicalProject(SAL=[$5], CSE$0=[*($5, 12)], CSE$1=[*($5, 3)]) >>>>> LogicalTableScan(table=[[CATALOG, SALES, EMP]])``` >>>>> I found a RelOptRule that is named as CommonRelSubExprRule, but it >> is >>>>> abstract. I wonder if anyone has implemented it? Is it the same as >>> what I >>>>> am doing? >>>>> Thanks very much~ >>>> >>> >>> >>> -- >>> >>> Best, >>> Benchao Li >>> >> > > > -- > > Best, > Benchao Li