[ https://issues.apache.org/jira/browse/SPARK-43838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan resolved SPARK-43838. --------------------------------- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 41347 [https://github.com/apache/spark/pull/41347] > Subquery on single table with having clause can't be optimized > -------------------------------------------------------------- > > Key: SPARK-43838 > URL: https://issues.apache.org/jira/browse/SPARK-43838 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.0 > Reporter: Jia Fan > Assignee: Jia Fan > Priority: Major > Fix For: 4.0.0 > > > Eg: > {code:java} > sql("create view t(c1, c2) as values (0, 1), (0, 2), (1, 2)") > sql("select c1, c2, (select count(*) cnt from t t2 where t1.c1 = t2.c1 " + > "having cnt = 0) from t t1").show() {code} > The error will throw: > {code:java} > [PLAN_VALIDATION_FAILED_RULE_IN_BATCH] Rule > org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery in > batch Operator Optimization before Inferring Filters generated an invalid > plan: The plan becomes unresolved: 'Project [toprettystring(c1#224, > Some(America/Los_Angeles)) AS toprettystring(c1)#238, toprettystring(c2#225, > Some(America/Los_Angeles)) AS toprettystring(c2)#239, > toprettystring(cnt#246L, Some(America/Los_Angeles)) AS > toprettystring(scalarsubquery(c1))#240] > +- 'Project [c1#224, c2#225, CASE WHEN isnull(alwaysTrue#245) THEN 0 WHEN NOT > (cnt#222L = 0) THEN null ELSE cnt#222L END AS cnt#246L] > +- 'Join LeftOuter, (c1#224 = c1#224#244) > :- Project [col1#226 AS c1#224, col2#227 AS c2#225] > : +- LocalRelation [col1#226, col2#227] > +- Project [cnt#222L, c1#224#244, cnt#222L, c1#224, true AS > alwaysTrue#245] > +- Project [cnt#222L, c1#224 AS c1#224#244, cnt#222L, c1#224] > +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224] > +- Project [col1#228 AS c1#224] > +- LocalRelation [col1#228, col2#229]The previous plan: > Project [toprettystring(c1#224, Some(America/Los_Angeles)) AS > toprettystring(c1)#238, toprettystring(c2#225, Some(America/Los_Angeles)) AS > toprettystring(c2)#239, toprettystring(scalar-subquery#223 [c1#224 && (c1#224 > = c1#224#244)], Some(America/Los_Angeles)) AS > toprettystring(scalarsubquery(c1))#240] > : +- Project [cnt#222L, c1#224 AS c1#224#244] > : +- Filter (cnt#222L = 0) > : +- Aggregate [c1#224], [count(1) AS cnt#222L, c1#224] > : +- Project [col1#228 AS c1#224] > : +- LocalRelation [col1#228, col2#229] > +- Project [col1#226 AS c1#224, col2#227 AS c2#225] > +- LocalRelation [col1#226, col2#227] {code} > > The reason are when execute subquery decorrelation, the fields in the > subquery but not in having clause are wrongly pull up. This problem only > occurs when there contain having clause. > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org