[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186652#comment-15186652 ]
Adrian Wang commented on SPARK-13393:
-------------------------------------

In your example, df1("name") and df2("name") are exactly the same column, so it would be easy to throw an explicit exception telling the user not to join two identical DataFrames without aliases. We can do the same for this issue too.

> Column mismatch issue in left_outer join using Spark DataFrame
> --------------------------------------------------------------
>
>                 Key: SPARK-13393
>                 URL: https://issues.apache.org/jira/browse/SPARK-13393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Varadharajan
>
> Consider the snippet below:
> {code:title=test.scala|borderStyle=solid}
> case class Person(id: Int, name: String)
> val df = sc.parallelize(List(
>   Person(1, "varadha"),
>   Person(2, "nagaraj")
> )).toDF
> val varadha = df.filter("id = 1")
> val errorDF = df.join(varadha, df("id") === varadha("id"),
>   "left_outer").select(df("id"), varadha("id") as "varadha_id")
> val nagaraj = df.filter("id = 2").select(df("id") as "n_id")
> val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"),
>   "left_outer").select(df("id"), nagaraj("n_id") as "nagaraj_id")
> {code}
> After the left join, the `errorDF` DataFrame is wrong and shows the following:
> | id|varadha_id|
> |  1|         1|
> |  2|         2| (*This should've been null*)
> whereas `correctDF` has the correct output after the left join:
> | id|nagaraj_id|
> |  1|      null|
> |  2|         2|

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
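Roughly, `filter` passes its input's attributes through unchanged, so `df("id")` and `varadha("id")` refer to the same attribute and the join condition compares a column with itself, while `select(df("id") as "n_id")` creates a new attribute. The sketch below is a toy model of the proposed check, not Spark code: `Attr`, `Plan`, `table`, `selectAs`, and `checkJoinKeys` are hypothetical names, and the integer IDs only loosely stand in for Catalyst's expression IDs.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical toy model, NOT Spark's API: attributes carry an ID, and the
// proposed check rejects a join key that compares an attribute with itself.
object SelfJoinCheck {
  // Source of fresh IDs, loosely mirroring Catalyst expression IDs.
  private val nextId = new AtomicLong(0L)

  final case class Attr(name: String, exprId: Long)

  final case class Plan(output: Seq[Attr]) {
    // Like DataFrame.filter: the condition is ignored here because only the
    // output attributes matter, and they are passed through unchanged.
    def filter(condition: String): Plan = this

    // Like select(col as alias): the source column must come from this plan,
    // and the alias gets a fresh ID of its own.
    def selectAs(col: Attr, alias: String): Plan = {
      require(output.contains(col), s"column ${col.name} not found")
      Plan(Seq(Attr(alias, nextId.getAndIncrement())))
    }

    // Like df("name"): look up a column by name.
    def apply(name: String): Attr =
      output.find(_.name == name).getOrElse(throw new NoSuchElementException(name))
  }

  // Build a base "table" whose columns all get fresh IDs.
  def table(names: String*): Plan =
    Plan(names.map(n => Attr(n, nextId.getAndIncrement())))

  // Proposed check: two references to the same attribute make the join
  // condition trivially true, so fail loudly and suggest aliasing.
  def checkJoinKeys(left: Attr, right: Attr): Unit =
    if (left.exprId == right.exprId)
      throw new IllegalArgumentException(
        s"Join condition compares '${left.name}' with itself; " +
          "alias one side of the self-join before referencing its columns.")

  def main(args: Array[String]): Unit = {
    val df = table("id", "name")
    val varadha = df.filter("id = 1")
    val nagaraj = df.filter("id = 2").selectAs(df("id"), "n_id")

    // Mirrors correctDF: the aliased column has its own ID, so it passes.
    checkJoinKeys(df("id"), nagaraj("n_id"))
    println("df(id) === nagaraj(n_id): accepted")

    // Mirrors errorDF: both sides resolve to the same attribute, so it fails.
    try checkJoinKeys(df("id"), varadha("id"))
    catch { case e: IllegalArgumentException => println(s"rejected: ${e.getMessage}") }
  }
}
```

Running `main` accepts the `nagaraj` key pair and rejects the `varadha` pair. The check compares IDs rather than column names because both sides of the failing join expose a column named `id`; only attribute identity tells a genuine self-comparison apart from two distinct columns that happen to share a name.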