[jira] [Updated] (FLINK-38468) Performance drops when the coalesce parameter includes functions

Jie Cheng (Jira) Wed, 01 Oct 2025 10:11:21 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-38468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jie Cheng updated FLINK-38468:
------------------------------
    Description: 
I have two flink tasks, using 1.12 and 1.18 respectively. flinksql contains the 
COALESCE function. I found that the performance of 1.18 is not as good as that 
of 1.12

 

Use the sql:

select COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, '$.level_type'), 
'0') as level_type

 


1.12 Execution Plan
== Abstract Syntax Tree ==
LogicalProject(level_type=[CASE(IS NOT NULL($1), CAST($1):VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL, IS NOT NULL(GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type')), CAST(GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type')):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT 
NULL, _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")])
+- LogicalTableScan(table=[[default_catalog, default_database, shadow_source]])
 
== Optimized Logical Plan ==
Calc(select=[CASE(IS NOT NULL(level_type), CAST(level_type), IS NOT 
NULL(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
CAST(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
_UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])
 
== Physical Execution Plan ==
Stage 1 : Data Source
content : Source: TableSourceScan(table=[[default_catalog, default_database, 
shadow_source]], fields=[order_n_tuple, level_type])
 
Stage 2 : Operator
content : Calc(select=[(level_type IS NOT NULL CASE CAST(level_type) CASE 
(order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type') IS NOT NULL CASE 
CAST((order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type')) CASE 
_UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
ship_strategy : FORWARD
 
 
 
1.18 Execution Plan
== Abstract Syntax Tree ==
LogicalProject(level_type=[COALESCE($1, GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type'), _UTF-16LE'0')])
+- LogicalTableScan(table=[[default_catalog, default_database, shadow_source]])

== Optimized Physical Plan ==
Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
_UTF-16LE'$.level_type'), _UTF-16LE'0') AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])

== Optimized Execution Plan ==
Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
'$.level_type'), '0') AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])

 

 

  was:
I have two flink tasks, using 1.12 and 1.18 respectively. flinksql contains the 
COALESCE function. I found that the performance of 1.18 is not as good as that 
of 1.12

 

Use the sql:

select COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, '$.level_type'), 
'0') as level_type
1.12 Execution Plan
== Abstract Syntax Tree ==
LogicalProject(level_type=[CASE(IS NOT NULL($1), CAST($1):VARCHAR(2147483647) 
CHARACTER SET "UTF-16LE" NOT NULL, IS NOT NULL(GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type')), CAST(GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type')):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT 
NULL, _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")])
+- LogicalTableScan(table=[[default_catalog, default_database, shadow_source]])
 
== Optimized Logical Plan ==
Calc(select=[CASE(IS NOT NULL(level_type), CAST(level_type), IS NOT 
NULL(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
CAST(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
_UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])
 
== Physical Execution Plan ==
Stage 1 : Data Source
content : Source: TableSourceScan(table=[[default_catalog, default_database, 
shadow_source]], fields=[order_n_tuple, level_type])
 
Stage 2 : Operator
content : Calc(select=[(level_type IS NOT NULL CASE CAST(level_type) CASE 
(order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type') IS NOT NULL CASE 
CAST((order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type')) CASE 
_UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
ship_strategy : FORWARD
 
 
 
1.18 Execution Plan
== Abstract Syntax Tree ==
LogicalProject(level_type=[COALESCE($1, GET_JSON_OBJECT($0, 
_UTF-16LE'$.level_type'), _UTF-16LE'0')])
+- LogicalTableScan(table=[[default_catalog, default_database, shadow_source]])

== Optimized Physical Plan ==
Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
_UTF-16LE'$.level_type'), _UTF-16LE'0') AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])

== Optimized Execution Plan ==
Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
'$.level_type'), '0') AS level_type])
+- TableSourceScan(table=[[default_catalog, default_database, shadow_source]], 
fields=[order_n_tuple, level_type])

 

 


> Performance drops when the coalesce parameter includes functions
> ----------------------------------------------------------------
>
>                 Key: FLINK-38468
>                 URL: https://issues.apache.org/jira/browse/FLINK-38468
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.18.0
>            Reporter: Jie Cheng
>            Priority: Major
>             Fix For: 2.2.0
>
>
> I have two flink tasks, using 1.12 and 1.18 respectively. flinksql contains 
> the COALESCE function. I found that the performance of 1.18 is not as good as 
> that of 1.12
>  
> Use the sql:
> select COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, '$.level_type'), 
> '0') as level_type
>  
> 1.12 Execution Plan
> == Abstract Syntax Tree ==
> LogicalProject(level_type=[CASE(IS NOT NULL($1), CAST($1):VARCHAR(2147483647) 
> CHARACTER SET "UTF-16LE" NOT NULL, IS NOT NULL(GET_JSON_OBJECT($0, 
> _UTF-16LE'$.level_type')), CAST(GET_JSON_OBJECT($0, 
> _UTF-16LE'$.level_type')):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" NOT 
> NULL, _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")])
> +- LogicalTableScan(table=[[default_catalog, default_database, 
> shadow_source]])
>  
> == Optimized Logical Plan ==
> Calc(select=[CASE(IS NOT NULL(level_type), CAST(level_type), IS NOT 
> NULL(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
> CAST(GET_JSON_OBJECT(order_n_tuple, _UTF-16LE'$.level_type')), 
> _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
> +- TableSourceScan(table=[[default_catalog, default_database, 
> shadow_source]], fields=[order_n_tuple, level_type])
>  
> == Physical Execution Plan ==
> Stage 1 : Data Source
> content : Source: TableSourceScan(table=[[default_catalog, default_database, 
> shadow_source]], fields=[order_n_tuple, level_type])
>  
> Stage 2 : Operator
> content : Calc(select=[(level_type IS NOT NULL CASE CAST(level_type) CASE 
> (order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type') IS NOT NULL CASE 
> CAST((order_n_tuple GET_JSON_OBJECT _UTF-16LE'$.level_type')) CASE 
> _UTF-16LE'0':VARCHAR(2147483647) CHARACTER SET "UTF-16LE") AS level_type])
> ship_strategy : FORWARD
>  
>  
>  
> 1.18 Execution Plan
> == Abstract Syntax Tree ==
> LogicalProject(level_type=[COALESCE($1, GET_JSON_OBJECT($0, 
> _UTF-16LE'$.level_type'), _UTF-16LE'0')])
> +- LogicalTableScan(table=[[default_catalog, default_database, 
> shadow_source]])
> == Optimized Physical Plan ==
> Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
> _UTF-16LE'$.level_type'), _UTF-16LE'0') AS level_type])
> +- TableSourceScan(table=[[default_catalog, default_database, 
> shadow_source]], fields=[order_n_tuple, level_type])
> == Optimized Execution Plan ==
> Calc(select=[COALESCE(level_type, GET_JSON_OBJECT(order_n_tuple, 
> '$.level_type'), '0') AS level_type])
> +- TableSourceScan(table=[[default_catalog, default_database, 
> shadow_source]], fields=[order_n_tuple, level_type])
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-38468) Performance drops when the coalesce parameter includes functions

Reply via email to