[ https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172586#comment-17172586 ]
kanika dhuria commented on SPARK-32551:
---------------------------------------

Thanks [~cloud_fan], this is fixed in the latest 3.0 branch. It was fixed as part of https://issues.apache.org/jira/browse/SPARK-31956.

> Ambiguous self join error in non self join with window
> ------------------------------------------------------
>
>                 Key: SPARK-32551
>                 URL: https://issues.apache.org/jira/browse/SPARK-32551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: kanika dhuria
>            Priority: Major
>
> The following code fails the ambiguous self-join analysis even though it
> does not contain a self join:
>
> import org.apache.spark.sql.expressions.Window
> import org.apache.spark.sql.functions.sum
>
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m").===(v2("d")))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
>
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's
> probably because you joined several Datasets together, and some of these
> Datasets are the same. This column points to one of the Datasets but Spark is
> unable to figure out which one. Please alias the Datasets with different
> names via `Dataset.as` before joining them, and specify the column using
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable
> this check.;