[ https://issues.apache.org/jira/browse/SPARK-48503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jack Chen updated SPARK-48503: ------------------------------ Parent: SPARK-35553 Issue Type: Sub-task (was: Bug) > Scalar subquery with group-by and non-equality predicate incorrectly allowed, > wrong results > ------------------------------------------------------------------------------------------- > > Key: SPARK-48503 > URL: https://issues.apache.org/jira/browse/SPARK-48503 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 4.0.0 > Reporter: Jack Chen > Priority: Major > > This query is not legal and should give an error, but instead we incorrectly > allow it and it returns wrong results. > {code:java} > create table x(x1 int, x2 int); > insert into x values (1, 1); > create table y(y1 int, y2 int); > insert into y values (2, 2), (3, 3); > select *, (select count(*) from y where y1 > x1 group by y1) from x; {code} > It returns two rows, even though there's only one row of x. > The correct result is an error: more than one row returned by a subquery used > as an expression (as seen in postgres for example) > > This is a longstanding bug. The bug is in CheckAnalysis in > {{{}checkAggregateInScalarSubquery{}}}. It allows grouping columns that are > present in correlation predicates, but doesn’t check whether those predicates > are equalities - because when that code was written, non-equality > correlation wasn’t allowed. Therefore, it looks like this bug has existed > since non-equality correlation was added (~2 years ago). > > Various other expressions that are not equi-joins between the inner and outer > fields hit this too, e.g. `where y1 + y2 = x1 group by y1`. > Another bugged case is if the correlation condition is an equality but it's > under another operator like an OUTER JOIN or UNION. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org