[ https://issues.apache.org/jira/browse/SPARK-37932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478542#comment-17478542 ]
Feng Zhu edited comment on SPARK-37932 at 1/19/22, 10:30 AM: ------------------------------------------------------------- test {code:scala} test("SPARK-37932: view join view self with having filter") { withTable("t") { withView("v1") { Seq((2, "test2"), (3, "test3"), (1, "test1")).toDF("id", "name") .write.format("parquet").saveAsTable("t") sql("CREATE VIEW v1 (id, name) AS SELECT id, name FROM t") sql(""" |SELECT l1.id | FROM v1 l1 | INNER JOIN ( | SELECT id | FROM v1 | GROUP BY id | HAVING COUNT(DISTINCT name) > 1 | ) l2 | ON l1.id = l2.id | GROUP BY l1.name, l1.id; """.stripMargin) } } } {code} exception {code:java} org.apache.spark.sql.AnalysisException: Resolved attribute(s) name#25 missing from id#29,name#30 in operator !Aggregate [id#29], [id#29, count(distinct name#25) AS count(distinct name#25)#31L]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.; Aggregate [name#25, id#24], [id#24] +- Join Inner, (id#24 = id#29) :- SubqueryAlias l1 : +- SubqueryAlias spark_catalog.default.v1 : +- View (`default`.`v1`, [id#24,name#25]) : +- Project [cast(id#20 as int) AS id#24, cast(name#21 as string) AS name#25] : +- Project [id#20, name#21] : +- SubqueryAlias spark_catalog.default.t : +- Relation default.t[id#20,name#21] parquet +- SubqueryAlias l2 +- Project [id#29] +- Filter (count(distinct name#25)#31L > cast(1 as bigint)) +- !Aggregate [id#29], [id#29, count(distinct name#25) AS count(distinct name#25)#31L] +- SubqueryAlias spark_catalog.default.v1 +- View (`default`.`v1`, [id#29,name#30]) +- Project [cast(id#26 as int) AS id#29, cast(name#27 as string) AS name#30] +- Project [id#26, name#27] +- SubqueryAlias spark_catalog.default.t +- Relation default.t[id#26,name#27] parquet at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50) {code} was (Author: fishcus): test {code:scala} test("SPARK-37932: view join view self with having filter") { withTable("t") { withView("v1") { Seq((2, "test2"), (3, "test3"), (1, "test1")).toDF("id", "name") .write.format("parquet").saveAsTable("t") sql("CREATE VIEW v1 (id, name) AS SELECT id, name FROM t") sql(""" |SELECT l1.id | FROM v1 l1 | INNER JOIN ( | SELECT id | FROM v1 | GROUP BY id | HAVING COUNT(DISTINCT name) > 1 | ) l2 | ON l1.id = l2.id | GROUP BY l1.name, l1.id; """.stripMargin) } } } {code} exception {code} org.apache.spark.sql.AnalysisException: Resolved attribute(s) name#25 missing from id#29,name#30 in operator !Aggregate [id#29], [id#29, count(distinct name#25) AS count(distinct name#25)#31L]. Attribute(s) with the same name appear in the operation: name. Please check if the right attribute(s) are used.; Aggregate [name#25, id#24], [id#24] +- Join Inner, (id#24 = id#29) :- SubqueryAlias l1 : +- SubqueryAlias spark_catalog.default.v1 : +- View (`default`.`v1`, [id#24,name#25]) : +- Project [cast(id#20 as int) AS id#24, cast(name#21 as string) AS name#25] : +- Project [id#20, name#21] : +- SubqueryAlias spark_catalog.default.t : +- Relation default.t[id#20,name#21] parquet +- SubqueryAlias l2 +- Project [id#29] +- Filter (count(distinct name#25)#31L > cast(1 as bigint)) +- !Aggregate [id#29], [id#29, count(distinct name#25) AS count(distinct name#25)#31L] +- SubqueryAlias spark_catalog.default.v1 +- View (`default`.`v1`, [id#29,name#30]) +- Project [cast(id#26 as int) AS id#29, cast(name#27 as string) AS name#30] +- Project [id#26, name#27] +- SubqueryAlias spark_catalog.default.t +- Relation default.t[id#26,name#27] parquet at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:51) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:50) {code} > Analyzer can fail when join left side and right side are the same view > ---------------------------------------------------------------------- > > Key: SPARK-37932 > URL: https://issues.apache.org/jira/browse/SPARK-37932 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.2.0 > Reporter: Feng Zhu > Priority: Major > Attachments: sql_and_exception > > > See the attachment for details, including SQL and the exception information. > * sql1, there is a normal filter (LO_SUPPKEY > 10) in the right side > subquery, Analyzer works as expected; > * sql2, there is a HAVING filter(HAVING COUNT(DISTINCT LO_SUPPKEY) > 1) in > the right side subquery, Analyzer failed with "Resolved attribute(s) > LO_SUPPKEY#337 missing ...". > From the debug info, the problem seems to be occurred after the rule > DeduplicateRelations is applied. -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org