[jira] [Updated] (FLINK-33996) Support disabling project rewrite when multiple exprs in the project reference the same sub project field.
[ https://issues.apache.org/jira/browse/FLINK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Jin updated FLINK-33996:
-----------------------------
    Description:
When multiple top projects reference the same bottom project, project rewrite rules may result in complex projects being calculated multiple times.

Take the following SQL as an example:
{code:sql}
create table test_source(a varchar) with ('connector'='datagen');

explain plan for select a || 'a' as a, a || 'b' as b FROM (select REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
{code}
The final SQL plan is as follows:
{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a, ||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
{code}
It can be observed that after project rewrite, REGEXP_REPLACE is calculated twice. Generally speaking, regular expression matching is a time-consuming operation and we usually do not want it to be calculated multiple times. Therefore, for this scenario, we can support disabling project rewrite.

After disabling some rules, the final plan we obtained is as follows:
{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
+- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
   +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(a, 'a') AS a, ||(a, 'b') AS b])
+- Calc(select=[REGEXP_REPLACE(a, 'aaa', 'bbb') AS a])
   +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
{code}
After testing, we probably need to modify these few rules:

org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule

org.apache.flink.table.planner.plan.rules.logical.FlinkCalcMergeRule

org.apache.flink.table.planner.plan.rules.logical.FlinkProjectCalcMergeRule

  was:
When multiple top projects reference the same bottom project, project rewrite rules may result in complex projects being calculated multiple times.

Take the following SQL as an example:
{code:sql}
create table test_source(a varchar) with ('connector'='datagen');

explan plan for select a || 'a' as a, a || 'b' as b FROM (select REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
{code}
The final SQL plan is as follows:
{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'), _UTF-16LE'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a, ||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
+- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])
{code}
It can be observed that after project rewrite, REGEXP_REPLACE is calculated twice. Generally speaking, regular expression matching is a time-consuming operation and we usually do not want it to be calculated multiple times. Therefore, for this scenario, we can support disabling project rewrite.

After disabling some rules, the final plan we obtained is as follows:
{code:sql}
== Abstract Syntax Tree ==
LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
+- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
   +- LogicalTableScan(table=[[default_catalog, default_database, test_source]])

== Optimized Physical Plan ==
Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
+- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
   +- TableSourceScan(table=[[default_catalog, default_database, test_source]], fields=[a])

== Optimized Execution Plan ==
Calc(select=[||(a, 'a') AS a, ||(a,
[jira] [Updated] (FLINK-33996) Support disabling project rewrite when multiple exprs in the project reference the same sub project field.
[ https://issues.apache.org/jira/browse/FLINK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated FLINK-33996:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support disabling project rewrite when multiple exprs in the project
> reference the same sub project field.
> ---------------------------------------------------------------------
>
>                 Key: FLINK-33996
>                 URL: https://issues.apache.org/jira/browse/FLINK-33996
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.18.0
>            Reporter: Feng Jin
>            Priority: Major
>              Labels: pull-request-available
>
> When multiple top projects reference the same bottom project, project rewrite
> rules may result in complex projects being calculated multiple times.
> Take the following SQL as an example:
> {code:sql}
> create table test_source(a varchar) with ('connector'='datagen');
> explain plan for select a || 'a' as a, a || 'b' as b FROM (select
> REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
> {code}
> The final SQL plan is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database,
> test_source]])
> == Optimized Physical Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'),
> _UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'),
> _UTF-16LE'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]],
> fields=[a])
> == Optimized Execution Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a,
> ||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]],
> fields=[a])
> {code}
> It can be observed that after project rewrite, REGEXP_REPLACE is calculated twice.
> Generally speaking, regular expression matching is a time-consuming operation
> and we usually do not want it to be calculated multiple times. Therefore, for
> this scenario, we can support disabling project rewrite.
> After disabling some rules, the final plan we obtained is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database,
> test_source]])
> == Optimized Physical Plan ==
> Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database,
> test_source]], fields=[a])
> == Optimized Execution Plan ==
> Calc(select=[||(a, 'a') AS a, ||(a, 'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, 'aaa', 'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database,
> test_source]], fields=[a])
> {code}
> After testing, we probably need to modify these few rules:
> org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule
> org.apache.flink.table.planner.plan.rules.logical.FlinkCalcMergeRule
> org.apache.flink.table.planner.plan.rules.logical.FlinkProjectCalcMergeRule

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
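
The duplication described in the issue can be reproduced with a toy model of project merging. The sketch below is not Flink or Calcite code: the class name `ProjectMergeSketch`, its methods, and the string-based expression format are assumptions made purely for illustration. It inlines each bottom-project expression into every top-project expression that references it through `$i`, then counts how often the expensive call appears.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of project merging. Hypothetical names; not Flink or Calcite code. */
class ProjectMergeSketch {

    /** Inlines bottom-project expressions into top-project expressions by replacing "$i". */
    public static List<String> merge(List<String> top, List<String> bottom) {
        List<String> merged = new ArrayList<>();
        for (String expr : top) {
            String inlined = expr;
            // Replace higher indexes first so that "$1" never clobbers part of "$10".
            for (int i = bottom.size() - 1; i >= 0; i--) {
                inlined = inlined.replace("$" + i, bottom.get(i));
            }
            merged.add(inlined);
        }
        return merged;
    }

    /** Counts non-overlapping occurrences of token in text. */
    public static int countOccurrences(String text, String token) {
        int count = 0;
        int at = text.indexOf(token);
        while (at >= 0) {
            count++;
            at = text.indexOf(token, at + token.length());
        }
        return count;
    }

    public static void main(String[] args) {
        // Bottom project: one field computed by an expensive call on source column a.
        List<String> bottom = List.of("REGEXP_REPLACE(a, 'aaa', 'bbb')");
        // Top project: two expressions that both reference bottom field $0.
        List<String> top = List.of("||($0, 'a')", "||($0, 'b')");

        List<String> merged = merge(top, bottom);
        System.out.println(merged);
        // Prints 2: the expensive call now appears once per referencing expression.
        System.out.println(countOccurrences(String.join(", ", merged), "REGEXP_REPLACE"));
    }
}
```

The merged expressions have the same shape as the `Calc` in the optimized plans quoted above, with one copy of `REGEXP_REPLACE` per top-level expression that referenced the bottom field.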
[jira] [Updated] (FLINK-33996) Support disabling project rewrite when multiple exprs in the project reference the same sub project field.
[ https://issues.apache.org/jira/browse/FLINK-33996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Feng Jin updated FLINK-33996:
-----------------------------
    Summary: Support disabling project rewrite when multiple exprs in the project reference the same sub project field.  (was: Support disabling project rewrite when multiple exprs in the project reference the same project.)

> Support disabling project rewrite when multiple exprs in the project
> reference the same sub project field.
> ---------------------------------------------------------------------
>
>                 Key: FLINK-33996
>                 URL: https://issues.apache.org/jira/browse/FLINK-33996
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>    Affects Versions: 1.18.0
>            Reporter: Feng Jin
>            Priority: Major
>
> When multiple top projects reference the same bottom project, project rewrite
> rules may result in complex projects being calculated multiple times.
> Take the following SQL as an example:
> {code:sql}
> create table test_source(a varchar) with ('connector'='datagen');
> explain plan for select a || 'a' as a, a || 'b' as b FROM (select
> REGEXP_REPLACE(a, 'aaa', 'bbb') as a FROM test_source);
> {code}
> The final SQL plan is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database,
> test_source]])
> == Optimized Physical Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'),
> _UTF-16LE'a') AS a, ||(REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb'),
> _UTF-16LE'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]],
> fields=[a])
> == Optimized Execution Plan ==
> Calc(select=[||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'a') AS a,
> ||(REGEXP_REPLACE(a, 'aaa', 'bbb'), 'b') AS b])
> +- TableSourceScan(table=[[default_catalog, default_database, test_source]],
> fields=[a])
> {code}
> It can be observed that after project rewrite, REGEXP_REPLACE is calculated twice.
> Generally speaking, regular expression matching is a time-consuming operation
> and we usually do not want it to be calculated multiple times. Therefore, for
> this scenario, we can support disabling project rewrite.
> After disabling some rules, the final plan we obtained is as follows:
> {code:sql}
> == Abstract Syntax Tree ==
> LogicalProject(a=[||($0, _UTF-16LE'a')], b=[||($0, _UTF-16LE'b')])
> +- LogicalProject(a=[REGEXP_REPLACE($0, _UTF-16LE'aaa', _UTF-16LE'bbb')])
>    +- LogicalTableScan(table=[[default_catalog, default_database,
> test_source]])
> == Optimized Physical Plan ==
> Calc(select=[||(a, _UTF-16LE'a') AS a, ||(a, _UTF-16LE'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, _UTF-16LE'aaa', _UTF-16LE'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database,
> test_source]], fields=[a])
> == Optimized Execution Plan ==
> Calc(select=[||(a, 'a') AS a, ||(a, 'b') AS b])
> +- Calc(select=[REGEXP_REPLACE(a, 'aaa', 'bbb') AS a])
>    +- TableSourceScan(table=[[default_catalog, default_database,
> test_source]], fields=[a])
> {code}
> After testing, we probably need to modify these few rules:
> org.apache.flink.table.planner.plan.rules.logical.FlinkProjectMergeRule
> org.apache.flink.table.planner.plan.rules.logical.FlinkCalcMergeRule
> org.apache.flink.table.planner.plan.rules.logical.FlinkProjectCalcMergeRule
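
One way to picture "disabling project rewrite" for this scenario is a guard that a merge rule consults before inlining. The sketch below is only an assumption about what such a check could look like, not the change made in Flink: the class `MergeGuardSketch`, its methods, and the "contains a function call" test for non-trivial expressions are all hypothetical. It declines to merge when a non-trivial bottom expression is referenced more than once by the top project.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical merge guard; names and policy are illustrative, not Flink's implementation. */
class MergeGuardSketch {

    private static final Pattern FIELD_REF = Pattern.compile("\\$(\\d+)");

    /**
     * Counts how often each bottom field index is referenced across all top expressions.
     * Assumes every "$i" in top satisfies i < bottomFieldCount.
     */
    public static int[] referenceCounts(List<String> top, int bottomFieldCount) {
        int[] counts = new int[bottomFieldCount];
        for (String expr : top) {
            Matcher m = FIELD_REF.matcher(expr);
            while (m.find()) {
                counts[Integer.parseInt(m.group(1))]++;
            }
        }
        return counts;
    }

    /** Treats a bottom expression as trivial when it contains no function call. */
    public static boolean isTrivial(String bottomExpr) {
        return !bottomExpr.contains("(");
    }

    /** Returns false when merging would duplicate a non-trivial bottom expression. */
    public static boolean shouldMerge(List<String> top, List<String> bottom) {
        int[] counts = referenceCounts(top, bottom.size());
        for (int i = 0; i < bottom.size(); i++) {
            if (counts[i] > 1 && !isTrivial(bottom.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> top = List.of("||($0, 'a')", "||($0, 'b')");
        // Expensive bottom expression referenced twice: keep the two projects separate.
        System.out.println(shouldMerge(top, List.of("REGEXP_REPLACE(a, 'aaa', 'bbb')"))); // false
        // Plain field reference: merging duplicates nothing expensive.
        System.out.println(shouldMerge(top, List.of("a"))); // true
    }
}
```

With the issue's example, the guard returns false, which corresponds to the second plan above where `REGEXP_REPLACE` stays in its own `Calc` and is evaluated once.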