[ 
https://issues.apache.org/jira/browse/SPARK-32551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kanika dhuria updated SPARK-32551:
----------------------------------
    Description: 
The following code fails the ambiguous self-join check even though it does 
not contain a self join:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;

 

  was:
The following code hits an ambiguous join error even though it does not 
contain a self join:

val v1 = spark.range(3).toDF("m")
val v2 = spark.range(3).toDF("d")
val v3 = v1.join(v2, v1("m") === v2("d"))
val v4 = v3("d")
val w1 = Window.partitionBy(v4)
val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))

org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
probably because you joined several Datasets together, and some of these 
Datasets are the same. This column points to one of the Datasets but Spark is 
unable to figure out which one. Please alias the Datasets with different names 
via `Dataset.as` before joining them, and specify the column using qualified 
name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set 
spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.;

 


> Ambiguous self join error in non self join with window
> ------------------------------------------------------
>
>                 Key: SPARK-32551
>                 URL: https://issues.apache.org/jira/browse/SPARK-32551
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: kanika dhuria
>            Priority: Major
>
> The following code fails the ambiguous self-join check even though it does 
> not contain a self join:
>
> val v1 = spark.range(3).toDF("m")
> val v2 = spark.range(3).toDF("d")
> val v3 = v1.join(v2, v1("m") === v2("d"))
> val v4 = v3("d")
> val w1 = Window.partitionBy(v4)
> val out = v3.select(v4.as("a"), sum(v4).over(w1).as("b"))
>
> org.apache.spark.sql.AnalysisException: Column a#45L are ambiguous. It's 
> probably because you joined several Datasets together, and some of these 
> Datasets are the same. This column points to one of the Datasets but Spark is 
> unable to figure out which one. Please alias the Datasets with different 
> names via `Dataset.as` before joining them, and specify the column using 
> qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You 
> can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable 
> this check.;
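
Until the check is fixed, the report suggests two ways a user might sidestep
the error. The sketch below is not part of the original report and has not
been verified against 3.0.0. It assumes a local SparkSession (the object name
and app name are made up for illustration). Option 1 assumes that referring
to the column by name with col("d"), rather than through v3("d"), keeps the
reference from being tied to a specific Dataset, so the check should not
fire. Option 2 uses the configuration key named in the error message itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

// Hypothetical driver object, for illustration only.
object Spark32551Workaround {
  def main(args: Array[String]): Unit = {
    // Local session; master and app name are arbitrary choices.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-32551-workaround")
      .getOrCreate()

    val v1 = spark.range(3).toDF("m")
    val v2 = spark.range(3).toDF("d")
    val v3 = v1.join(v2, v1("m") === v2("d"))

    // Option 1 (assumption): use a name-based column reference instead of
    // v3("d"), so the column carries no link to a particular Dataset.
    val d = col("d")
    val byName = v3.select(d.as("a"), sum(d).over(Window.partitionBy(d)).as("b"))
    byName.show()

    // Option 2: disable the check, as the error message suggests.
    // This turns the check off for the whole session, not just this query.
    spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
    val v4 = v3("d")
    val out = v3.select(v4.as("a"), sum(v4).over(Window.partitionBy(v4)).as("b"))
    out.show()

    spark.stop()
  }
}
```

Option 1 is the narrower change, since Option 2 also silences the check for
queries that really do contain an ambiguous self join.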



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
