[ https://issues.apache.org/jira/browse/SPARK-43760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrey Gubichev updated SPARK-43760:
------------------------------------
    Description: 
The following query:

 
{code:sql}
select * from (
 select t1.id c1, (
  select t2.id c from range (1, 2) t2
  where t1.id = t2.id  ) c2
 from range (1, 3) t1 ) t
where t.c2 is not null
-- !query schema
struct<c1:bigint,c2:bigint>
-- !query output
1       1
2       NULL
 {code}
 
The query above returns both rows, but it should return only 1 row: the second
row (c2 = NULL) is supposed to be removed by the IsNotNull predicate. However,
nullability is propagated incorrectly after subquery decorrelation
(RewriteCorrelatedScalarSubquery): the subquery's output attribute is wrongly
declared non-nullable, so the predicate is constant-folded to True and the NULL
row is not filtered out.
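To make the failure mode concrete, here is a toy Python model of the plan above. It is not Spark code, and every name in it is hypothetical. It assumes the decorrelated scalar subquery behaves like a left outer join of t1 with t2 on t1.id = t2.id (so a t1 row with no match gets NULL for c2), and it mimics a constant-folding step that drops `IsNotNull(c2)` when c2 is declared non-nullable.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for a plan attribute: a name plus a nullability flag."""
    name: str
    nullable: bool


def join_output_attr(inner: Attribute, mark_nullable: bool) -> Attribute:
    """Output attribute c2 of the (modelled) left outer join.

    The right side of a left outer join can be NULL-padded, so a correct
    rewrite must mark it nullable; the buggy path keeps the inner flag."""
    return replace(inner, name="c2", nullable=inner.nullable or mark_nullable)


def left_outer_join(t1_ids: range, t2_ids: range) -> List[Row]:
    """t1 LEFT OUTER JOIN t2 ON t1.id = t2.id; unmatched rows get c2 = None."""
    matches = set(t2_ids)
    return [{"c1": i, "c2": i if i in matches else None} for i in t1_ids]


def filter_is_not_null(rows: List[Row], attr: Attribute) -> List[Row]:
    """Apply `WHERE c2 IS NOT NULL`, folding the predicate to TRUE
    (i.e. dropping the filter) when attr is declared non-nullable."""
    if not attr.nullable:
        return rows  # predicate constant-folded to TRUE: nothing is filtered
    return [r for r in rows if r[attr.name] is not None]


def run_query(mark_nullable: bool) -> List[Row]:
    # t2.id produced by range() is never NULL, so the inner attribute is non-nullable.
    inner_c = Attribute("c", nullable=False)
    c2 = join_output_attr(inner_c, mark_nullable)
    rows = left_outer_join(range(1, 3), range(1, 2))
    return filter_is_not_null(rows, c2)


print(run_query(mark_nullable=True))   # [{'c1': 1, 'c2': 1}]
print(run_query(mark_nullable=False))  # [{'c1': 1, 'c2': 1}, {'c1': 2, 'c2': None}]
```

With c2 correctly marked nullable, the filter survives and only the row for c1 = 1 is returned. With the flag left as non-nullable, the filter is folded away and the NULL row leaks through, matching the incorrect output shown in the query result.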

  was:
The following query:

 
{code:sql}
select *
from range(1, 3) t1
where (select sum(c) from (
  select t2.id * t2.id c
  from range (1, 2) t2 where t1.id = t2.id
  group by t2.id
  )
) is not null;
{code}
 
should return only 1 row, because the second row (t1.id = 2, for which the
subquery returns NULL) is supposed to be removed by the IsNotNull predicate.
However, nullability is propagated incorrectly after subquery decorrelation: the
subquery's output attribute is wrongly declared non-nullable, so the predicate
is constant-folded to True.


> Incorrect attribute nullability after RewriteCorrelatedScalarSubquery leads to incorrect query results
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43760
>                 URL: https://issues.apache.org/jira/browse/SPARK-43760
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Andrey Gubichev
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)