[ https://issues.apache.org/jira/browse/SPARK-43098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-43098:
-------------------------------
    Labels: correctness  (was: )

> Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-43098
>                 URL: https://issues.apache.org/jira/browse/SPARK-43098
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.4.1, 3.5.0
>
>
> From [~allisonwang-db]:
> There is no COUNT bug when the correlated equality predicates are also in the
> GROUP BY clause. However, the current logic for handling the COUNT bug still
> adds the default aggregate function value and returns incorrect results.
>
> {code:java}
> create view t1(c1, c2) as values (0, 1), (1, 2);
> create view t2(c1, c2) as values (0, 2), (0, 3);
> select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from t1;
> -- Correct answer: [(0, 1, 2), (1, 2, null)]
> -- Actual (incorrect) result:
> +---+---+------------------+
> |c1 |c2 |scalarsubquery(c1)|
> +---+---+------------------+
> |0  |1  |2                 |
> |1  |2  |0                 |
> +---+---+------------------+
> {code}
>
> This bug affects scalar subqueries in RewriteCorrelatedScalarSubquery, but
> lateral subqueries handle it correctly in DecorrelateInnerQuery.
>
> Related: https://issues.apache.org/jira/browse/SPARK-36113
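
For readers who want to check the expected semantics outside Spark, the following is a minimal sketch that runs the same query shape through SQLite via Python's standard `sqlite3` module. Everything about the setup is an assumption made for illustration: SQLite stands in as a reference engine, plain tables replace the views, and the inner `group by` column is qualified as `t2.c1` to avoid any name-resolution ambiguity. It does not exercise Spark or RewriteCorrelatedScalarSubquery at all; it only illustrates why `null` is the expected answer when the subquery has a GROUP BY, and why `0` is expected only when it does not.

```python
import sqlite3

# Assumption: SQLite is used purely as a reference engine; this is not Spark.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t1(c1 integer, c2 integer);
    create table t2(c1 integer, c2 integer);
    insert into t1 values (0, 1), (1, 2);
    insert into t2 values (0, 2), (0, 3);
""")

# Subquery WITH a GROUP BY: when no t2 row matches, there are no groups,
# so the subquery yields zero rows and the scalar value is NULL.
with_group_by = conn.execute("""
    select c1, c2,
           (select count(*) from t2 where t1.c1 = t2.c1 group by t2.c1)
    from t1
    order by c1
""").fetchall()

# Subquery WITHOUT a GROUP BY: a global aggregate always yields one row,
# so count(*) over an empty match is 0. This is the classic COUNT bug case.
without_group_by = conn.execute("""
    select c1, c2,
           (select count(*) from t2 where t1.c1 = t2.c1)
    from t1
    order by c1
""").fetchall()

print(with_group_by)     # [(0, 1, 2), (1, 2, None)]
print(without_group_by)  # [(0, 1, 2), (1, 2, 0)]
```

The second query is the only one where substituting the aggregate's default value for unmatched outer rows is appropriate; applying that substitution to the first query produces the incorrect `0` shown in the report.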