[jira] [Comment Edited] (FLINK-31830) Coalesce on nested fields with different nullabilities will get wrong plan

Jane Chan (Jira) Fri, 22 Dec 2023 01:31:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-31830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799743#comment-17799743
 ]


Jane Chan edited comment on FLINK-31830 at 12/22/23 9:30 AM:
-------------------------------------------------------------

Hi [~Sergey Nuyanzin], sorry for the late reply. 
{quote}{{org.apache.flink.table.api.DataTypes#ROW(org.apache.flink.table.api.DataTypes.Field...)}}
 in a way that if at least any fields is not null then set initial nullability 
of the record to {{{}false{}}}... Would it work in such a way?
{quote}
If I understand correctly, you want to implicitly set the record's level of 
nullability to not null, if at least one of the inner fields is not null.

I suppose you want to adopt a practice similar to Calcite's by quietly altering 
the nullability, except that you suggest taking an approach opposite to Calcite 
regarding whose nullability to prioritize. (Calcite determines that if the 
record level is nullable, it will change the nullability of the fields within 
to true.)

I understand that it may be helpful for maintaining the connector in the short 
term. However, in the long run, this inconsistency in priority does not 
facilitate the unified maintenance of SQL and TableAPI.


was (Author: qingyue):
Hi [~Sergey Nuyanzin], sorry for the late reply. 
{quote}{{org.apache.flink.table.api.DataTypes#ROW(org.apache.flink.table.api.DataTypes.Field...)}}
 in a way that if at least any fields is not null then set initial nullability 
of the record to {{{}false{}}}... Would it work in such a way?
{quote}
If I understand correctly, you want to implicitly ignore the record's level of 
nullability if at least one of the inner fields is not null.

I suppose you want to adopt a practice similar to Calcite's by quietly altering 
the nullability, except that you suggest taking an approach opposite to Calcite 
regarding whose nullability to prioritize. (Calcite determines that if the 
record level is nullable, it will change the nullability of the fields within 
to true.)

I understand that it may be helpful for maintaining the connector in the short 
term. However, in the long run, this inconsistency in priority does not 
facilitate the unified maintenance of SQL and TableAPI.

> Coalesce on nested fields with different nullabilities will get wrong plan
> --------------------------------------------------------------------------
>
>                 Key: FLINK-31830
>                 URL: https://issues.apache.org/jira/browse/FLINK-31830
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.14.6
>            Reporter: lincoln lee
>            Assignee: Jane Chan
>            Priority: Major
>         Attachments: image-2023-06-09-15-06-01-322.png, 
> image-2023-06-09-15-21-13-720.png
>
>
> A test case similar to FLINK-31829, only changes the nullable field `a.np` to 
> not null, will get a wrong plan in 1.14.x (reported from the community user):
> {code}
>   @Test
>   def testCoalesceOnNestedColumns(): Unit = {
>     val tEnv = util.tableEnv
>     val tableDescriptor = TableDescriptor.forConnector("datagen")
>         .schema(Schema.newBuilder
>             .column("id", DataTypes.INT.notNull)
>             .column("a", DataTypes.ROW(DataTypes.FIELD("np", 
> DataTypes.INT.notNull())).nullable)
>             .build)
>         .build
>     tEnv.createTemporaryTable("t1", tableDescriptor)
>     tEnv.createTemporaryTable("t2", tableDescriptor)
>     val res = tEnv.executeSql("EXPLAIN SELECT a.id, COALESCE(a.a.np, b.a.np) 
> c1, IFNULL(a.a.np, b.a.np) c2 FROM t1 a left JOIN t2 b ON a.id=b.id where a.a 
> is null or a.a.np is null")
>     res.print()
> }  
> == Abstract Syntax Tree ==
> LogicalProject(id=[$0], c1=[CAST($1.np):INTEGER], c2=[IFNULL($1.np, $3.np)])
> +- LogicalFilter(condition=[OR(IS NULL($1), IS NULL(CAST($1.np):INTEGER))])
>    +- LogicalJoin(condition=[=($0, $2)], joinType=[left])
>       :- LogicalTableScan(table=[[default_catalog, default_database, t1]])
>       +- LogicalTableScan(table=[[default_catalog, default_database, t2]])
> {code}
> the top project in the ast is wrong:  `LogicalProject(id=[$0], 
> c1=[CAST($1.np):INTEGER], c2=[IFNULL($1.np, $3.np)])`, the 
> `c1=[CAST($1.np):INTEGER]` relate to `COALESCE(a.a.np, b.a.np) c1` is 
> incorrect,
> but this works fine when using sql ddl to create tables
> {code}
>   @Test
>   def testCoalesceOnNestedColumns2(): Unit = {
>     val tEnv = util.tableEnv
>     tEnv.executeSql(
>       s"""
>          |create temporary table t1 (
>          |  id int not null,
>          |  a row<np int not null>
>          |) with (
>          | 'connector' = 'datagen'
>          |)
>          |""".stripMargin)
>     tEnv.executeSql(
>       s"""
>          |create temporary table t2 (
>          |  id int not null,
>          |  a row<np int not null>
>          |) with (
>          | 'connector' = 'datagen'
>          |)
>          |""".stripMargin)
>     val res = tEnv.executeSql(
>       "EXPLAIN SELECT a.id, COALESCE(a.a.np, b.a.np) c1, IFNULL(a.a.np, 
> b.a.np) c2 FROM t1 a left JOIN t2 b ON a.id=b.id where a.a is null or a.a.np 
> is null")
>     res.print()
>   }
> {code}
> from 1.15, the coalesce will be a new builtin function, and the ast looks 
> correct in version 1.15+, while before 1.15 it was rewritten as `case when`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (FLINK-31830) Coalesce on nested fields with different nullabilities will get wrong plan

Reply via email to