[ https://issues.apache.org/jira/browse/SPARK-35080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-35080:
-----------------------------------

    Assignee: Allison Wang

> Correlated subqueries with equality predicates can return wrong results
> -----------------------------------------------------------------------
>
>                 Key: SPARK-35080
>                 URL: https://issues.apache.org/jira/browse/SPARK-35080
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>
> Correlated subqueries that contain an aggregate and pass CheckAnalysis 
> (because they use only correlated equality predicates) can still return 
> wrong results. An equality predicate does not guarantee a one-to-one mapping 
> between inner and outer attributes, so pulling the inner attributes up 
> through an Aggregate changes the semantics of the plan and produces wrong 
> results. Currently, the decorrelation framework does not support these 
> types of correlated subqueries, and they should be blocked in CheckAnalysis.
> Example 1:
> {code:sql}
> create or replace view t1(c) as values ('a'), ('b');
> create or replace view t2(c) as values ('ab'), ('abc'), ('bc');
> select c, (select count(*) from t2 where t1.c = substring(t2.c, 1, 1)) from t1;
> {code}
> Correct results: [(a, 2), (b, 1)]
> Spark results:
> {code:java}
> +---+-----------------+
> |c  |scalarsubquery(c)|
> +---+-----------------+
> |a  |1                |
> |a  |1                |
> |b  |1                |
> +---+-----------------+{code}
> Example 2:
> {code:sql}
> create or replace view t1(a, b) as values (0, 6), (1, 5), (2, 4), (3, 3);
> create or replace view t2(c) as values (6);
> select c, (select count(*) from t1 where a + b = c) from t2;{code}
> Correct results: [(6, 4)]
> Spark results:
> {code:java}
> +---+-----------------+
> |c  |scalarsubquery(c)|
> +---+-----------------+
> |6  |1                |
> |6  |1                |
> |6  |1                |
> |6  |1                |
> +---+-----------------+
> {code}
>  
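
The two examples above can be reproduced outside Spark with a small
plain-Python model. This is only a sketch under stated assumptions, not
Spark's decorrelation code: it assumes the faulty plan behaves as if the
Aggregate grouped by the raw inner attributes and the correlated equality
predicate were then applied as a join condition above it. The function names
(correct_count, pulled_up_count) are hypothetical and exist only for this
illustration.

```python
from collections import Counter


def correct_count(outer_rows, inner_rows, pred):
    """Reference semantics: evaluate the scalar subquery once per outer row."""
    return [(o, sum(1 for i in inner_rows if pred(o, i))) for o in outer_rows]


def pulled_up_count(outer_rows, inner_rows, pred):
    """Assumed model of the faulty rewrite (not Spark's actual code):
    GROUP BY the raw inner attributes first, then join on the predicate."""
    groups = Counter(inner_rows)  # one count(*) per distinct inner row
    out = []
    for o in outer_rows:
        counts = [cnt for i, cnt in groups.items() if pred(o, i)]
        if counts:
            # One output row per matching group instead of a single total.
            out.extend((o, cnt) for cnt in counts)
        else:
            # Assumption: an unmatched outer row yields count 0.
            out.append((o, 0))
    return out


# Example 1: t1.c = substring(t2.c, 1, 1)
t1_ex1 = ["a", "b"]
t2_ex1 = ["ab", "abc", "bc"]
pred_ex1 = lambda outer, inner: outer == inner[:1]

# Example 2: a + b = c
t1_ex2 = [(0, 6), (1, 5), (2, 4), (3, 3)]
t2_ex2 = [6]
pred_ex2 = lambda outer, inner: inner[0] + inner[1] == outer

print(correct_count(t1_ex1, t2_ex1, pred_ex1))
print(pulled_up_count(t1_ex1, t2_ex1, pred_ex1))
print(correct_count(t2_ex2, t1_ex2, pred_ex2))
print(pulled_up_count(t2_ex2, t1_ex2, pred_ex2))
```

In this model, several distinct inner rows map to the same outer value
('ab' and 'abc' both map to 'a'; all four (a, b) pairs sum to 6), so grouping
before applying the predicate splits one count into several rows.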



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
