[ https://issues.apache.org/jira/browse/SPARK-35080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-35080:
-----------------------------------

    Assignee: Allison Wang

> Correlated subqueries with equality predicates can return wrong results
> -----------------------------------------------------------------------
>
>                 Key: SPARK-35080
>                 URL: https://issues.apache.org/jira/browse/SPARK-35080
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Allison Wang
>            Assignee: Allison Wang
>            Priority: Major
>
> Correlated subqueries with aggregate that pass CheckAnalysis (with only
> correlated equality predicates) can still return wrong results. This is
> because equality predicates do not guarantee one-to-one mappings between
> inner and outer attributes, and the semantics of the plan will be changed
> when the inner attributes are pulled up through an Aggregate, which gives us
> wrong results. Currently, the decorrelation framework does not support these
> types of correlated subqueries, and they should be blocked in CheckAnalysis.
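
To make the description above concrete, here is a minimal plain-Python sketch of why pulling an inner attribute up through an aggregate changes the result. It does not use Spark or reproduce its optimizer. The "pulled-up" variant is an assumed, simplified model of the rewrite: aggregate the inner side grouped by the inner attribute, then join on the equality predicate. The names `outer_rows`, `inner_rows`, and `first_char` are hypothetical, and the data mirrors the tables from Example 1 of this issue.

```python
from collections import Counter

# Data mirroring Example 1 of this issue (hypothetical variable names).
outer_rows = ["a", "b"]            # values of t1.c
inner_rows = ["ab", "abc", "bc"]   # values of t2.c


def first_char(s: str) -> str:
    """Stand-in for substring(t2.c, 1, 1)."""
    return s[:1]


# Intended semantics: evaluate count(*) once per outer row.
correct = [
    (o, sum(1 for i in inner_rows if first_char(i) == o))
    for o in outer_rows
]

# Assumed, simplified model of the problematic rewrite: aggregate first,
# grouped by the inner attribute t2.c, then join on the predicate.
# Several inner values ('ab', 'abc') map to the same outer value ('a'),
# so the join emits one row per inner group instead of one per outer row.
counts_by_inner = Counter(inner_rows)
pulled_up = [
    (o, n)
    for o in outer_rows
    for i, n in counts_by_inner.items()
    if first_char(i) == o
]

print(correct)    # [('a', 2), ('b', 1)]
print(pulled_up)  # [('a', 1), ('a', 1), ('b', 1)]
```

The second list has the same shape as the Spark output reported in this issue: the mapping from `t2.c` to `t1.c` is many-to-one, so grouping by `t2.c` splits what should be a single count.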
> Example 1:
> {code:sql}
> create or replace view t1(c) as values ('a'), ('b');
> create or replace view t2(c) as values ('ab'), ('abc'), ('bc');
> select c, (select count(*) from t2 where t1.c = substring(t2.c, 1, 1)) from t1;
> {code}
> Correct results: [(a, 2), (b, 1)]
> Spark results:
> {code:java}
> +---+-----------------+
> |c  |scalarsubquery(c)|
> +---+-----------------+
> |a  |1                |
> |a  |1                |
> |b  |1                |
> +---+-----------------+
> {code}
> Example 2:
> {code:sql}
> create or replace view t1(a, b) as values (0, 6), (1, 5), (2, 4), (3, 3);
> create or replace view t2(c) as values (6);
> select c, (select count(*) from t1 where a + b = c) from t2;
> {code}
> Correct results: [(6, 4)]
> Spark results:
> {code:java}
> +---+-----------------+
> |c  |scalarsubquery(c)|
> +---+-----------------+
> |6  |1                |
> |6  |1                |
> |6  |1                |
> |6  |1                |
> +---+-----------------+
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
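
The "Correct results" stated for both examples can be cross-checked against an independent SQL engine. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a reference evaluator. This is an assumption of convenience, not part of the original report. Plain tables replace the views, and SQLite's `substr` replaces `substring`. The helper `run` is hypothetical.

```python
import sqlite3


def run(setup: str, query: str) -> list:
    """Run `setup` in a fresh in-memory SQLite database, then return the rows of `query`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(setup)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Example 1: count the t2 rows whose first character equals t1.c.
example1 = run(
    """
    create table t1(c text);
    insert into t1 values ('a'), ('b');
    create table t2(c text);
    insert into t2 values ('ab'), ('abc'), ('bc');
    """,
    "select c, (select count(*) from t2 where t1.c = substr(t2.c, 1, 1)) "
    "from t1 order by c",
)

# Example 2: count the t1 rows where a + b equals t2.c.
example2 = run(
    """
    create table t1(a integer, b integer);
    insert into t1 values (0, 6), (1, 5), (2, 4), (3, 3);
    create table t2(c integer);
    insert into t2 values (6);
    """,
    "select c, (select count(*) from t1 where a + b = c) from t2",
)

print(example1)  # [('a', 2), ('b', 1)]
print(example2)  # [(6, 4)]
```

Each example runs in its own connection because both reuse the names `t1` and `t2` with different schemas.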