[ https://issues.apache.org/jira/browse/FLINK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804093#comment-17804093 ]
Benchao Li commented on FLINK-33996:
------------------------------------

Instead of solving this in the optimization phase, I lean toward solving it in the codegen phase. The expressions are actually already shared in the optimization phase (see {{RexProgram}}), but Flink doesn't make use of that and expands all the expressions again before codegen. There was an earlier issue about expression reuse in codegen, see FLINK-21573.

> Support disabling project rewrite when multiple exprs in the project
> reference the same sub project field.
> ----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-33996
> URL: https://issues.apache.org/jira/browse/FLINK-33996
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Runtime
> Affects Versions: 1.18.0
> Reporter: Feng Jin
> Priority: Major
> Labels: pull-request-available
>
> When multiple top projects reference the same bottom project, project rewrite
> rules may cause complex expressions from the bottom project to be calculated multiple times.
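To make the duplication concrete, here is a minimal Java sketch of what merging two projections by substitution does. It uses a hypothetical, simplified expression tree (`Expr`, `Field`, `Lit`, `Call` and `ProjectMergeDemo` are illustrative names, not Calcite's `RexNode` API and not the actual code of the Flink merge rules): each reference `$i` in the top projection is replaced by the i-th bottom expression, so a bottom expression that is referenced twice ends up copied twice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified expression tree; NOT Calcite's RexNode API.
interface Expr {
    String show();
}

// Reference $i: an input field of the bottom project, or the i-th output of
// the bottom project when it appears in the top project.
final class Field implements Expr {
    final int index;
    Field(int index) { this.index = index; }
    public String show() { return "$" + index; }
}

// String literal.
final class Lit implements Expr {
    final String value;
    Lit(String value) { this.value = value; }
    public String show() { return "'" + value + "'"; }
}

// Function call such as REGEXP_REPLACE(...) or ||(...).
final class Call implements Expr {
    final String op;
    final List<Expr> args;
    Call(String op, Expr... args) { this.op = op; this.args = List.of(args); }
    public String show() {
        List<String> parts = new ArrayList<>();
        for (Expr a : args) parts.add(a.show());
        return op + "(" + String.join(", ", parts) + ")";
    }
}

final class ProjectMergeDemo {
    // Merge by substitution: every $i in a top expression becomes bottom.get(i).
    // The substituted bottom expression is not traversed again, so its own $0
    // keeps referring to the scan's field `a`.
    static Expr inline(Expr top, List<Expr> bottom) {
        if (top instanceof Field) {
            return bottom.get(((Field) top).index);
        }
        if (top instanceof Call) {
            Call call = (Call) top;
            Expr[] newArgs = new Expr[call.args.size()];
            for (int i = 0; i < newArgs.length; i++) {
                newArgs[i] = inline(call.args.get(i), bottom);
            }
            return new Call(call.op, newArgs);
        }
        return top; // literals are unchanged
    }

    // Number of occurrences of `op` in the tree, i.e. evaluations per row.
    static int count(Expr e, String op) {
        if (!(e instanceof Call)) {
            return 0;
        }
        Call call = (Call) e;
        int n = call.op.equals(op) ? 1 : 0;
        for (Expr a : call.args) {
            n += count(a, op);
        }
        return n;
    }

    // Bottom: REGEXP_REPLACE($0, 'aaa', 'bbb'); top: ||($0, 'a'), ||($0, 'b').
    static List<Expr> mergedExample() {
        List<Expr> bottom = List.of(
            new Call("REGEXP_REPLACE", new Field(0), new Lit("aaa"), new Lit("bbb")));
        List<Expr> top = List.of(
            new Call("||", new Field(0), new Lit("a")),
            new Call("||", new Field(0), new Lit("b")));
        List<Expr> merged = new ArrayList<>();
        for (Expr t : top) {
            merged.add(inline(t, bottom));
        }
        return merged;
    }

    public static void main(String[] args) {
        int total = 0;
        for (Expr e : mergedExample()) {
            System.out.println(e.show());
            total += count(e, "REGEXP_REPLACE");
        }
        System.out.println("REGEXP_REPLACE evaluations per row: " + total);
    }
}
```

Running `main` prints `||(REGEXP_REPLACE($0, 'aaa', 'bbb'), 'a')` and `||(REGEXP_REPLACE($0, 'aaa', 'bbb'), 'b')`, which mirror the single merged `Calc` in the optimized plan, followed by an evaluation count of 2.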
>
> Take the following SQL as an example:
> {code:sql}
> create table test_source(a varchar) with ('connector'='datagen');
> explain plan for select a || 'a' as a, a || 'b' as b FROM (select REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
> {code}
> The final SQL plan is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])
>
> == Optimized Physical Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
>
> == Optimized Execution Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a, ||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
> {code}
> It can be observed that after the project rewrite, REGEXP_REPLACE is calculated twice for every row.
> Generally speaking, regular expression matching is a time-consuming operation, and we usually do not
> want it to be calculated multiple times. Therefore, for this scenario, we can support disabling the project rewrite.
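Whether the duplicate work is avoided by keeping the two `Calc` nodes (as proposed in the issue) or by reusing shared expressions at codegen time (as suggested in the comment above), the per-row effect is the same: the shared sub-expression is computed once into a local and both outputs read it. The following is a minimal Java sketch under that assumption. `LocalRefProgram` and its slot layout are hypothetical names, loosely modeled on the idea of `RexProgram`'s expression list with local references; this is not Flink's generated code.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical model of a Calc program whose expressions live in one list and
// refer to earlier results by slot index, loosely like Calcite's RexProgram
// with RexLocalRef. Illustrative only; not Flink's or Calcite's API.
final class LocalRefProgram {
    // Counts evaluations of the expensive expression.
    static int regexCalls = 0;

    // Stand-in for REGEXP_REPLACE(a, 'aaa', 'bbb') that records each call.
    static String regexpReplace(String s) {
        regexCalls++;
        return s.replaceAll("aaa", "bbb");
    }

    // Slot 0 holds input field `a`; exprs.get(i) fills slot i + 1 and may read
    // any earlier slot.
    private final List<Function<String[], String>> exprs;
    // Slots that form the projected output row.
    private final int[] outputs;

    LocalRefProgram(List<Function<String[], String>> exprs, int[] outputs) {
        this.exprs = exprs;
        this.outputs = outputs;
    }

    // Evaluates each expression exactly once for the row, then reads the outputs.
    String[] evaluate(String a) {
        String[] slots = new String[exprs.size() + 1];
        slots[0] = a;
        for (int i = 0; i < exprs.size(); i++) {
            slots[i + 1] = exprs.get(i).apply(slots);
        }
        String[] row = new String[outputs.length];
        for (int i = 0; i < outputs.length; i++) {
            row[i] = slots[outputs[i]];
        }
        return row;
    }

    // Shared form: $1 = REGEXP_REPLACE($0), $2 = $1 || 'a', $3 = $1 || 'b'.
    static LocalRefProgram shared() {
        return new LocalRefProgram(
            List.of(
                s -> regexpReplace(s[0]),
                s -> s[1] + "a",
                s -> s[1] + "b"),
            new int[] {2, 3});
    }

    // Expanded form, as in the merged Calc: REGEXP_REPLACE is inlined twice.
    static LocalRefProgram expanded() {
        return new LocalRefProgram(
            List.of(
                s -> regexpReplace(s[0]) + "a",
                s -> regexpReplace(s[0]) + "b"),
            new int[] {1, 2});
    }

    public static void main(String[] args) {
        for (LocalRefProgram p : List.of(shared(), expanded())) {
            regexCalls = 0;
            String[] row = p.evaluate("xaaay");
            System.out.println(String.join(", ", row) + "  (REGEXP_REPLACE calls: " + regexCalls + ")");
        }
    }
}
```

Both programs produce the same row (`xbbbya, xbbbyb` for input `xaaay`), but the shared one calls `regexpReplace` once per row instead of twice. In this model the slot array stands in for the local variables that generated code could keep for shared sub-expressions.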
>
> After disabling some rules, the final plan we obtained is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])
>
> == Optimized Physical Plan ==
> Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
>
> == Optimized Execution Plan ==
> Calc(select=[||(a, 'a') AS a, ||(a, 'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, 'aaa', 'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
> {code}
> After testing, we probably need to modify the following rules:
> * org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule
> * org.apache.flink.table.planner.plan.rules.logical.FlinkCalcMergeRule
> * org.apache.flink.table.planner.plan.rules.logical.FlinkProjectCalcMergeRule

--
This message was sent by Atlassian Jira
(v8.20.10#820010)