[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yin Huai updated SPARK-13393:
-----------------------------
    Priority: Critical  (was: Major)

> Column mismatch issue in left_outer join using Spark DataFrame
> --------------------------------------------------------------
>
>                 Key: SPARK-13393
>                 URL: https://issues.apache.org/jira/browse/SPARK-13393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Varadharajan
>            Priority: Critical
>
> Consider the snippet below:
> {code:title=test.scala|borderStyle=solid}
> case class Person(id: Int, name: String)
> val df = sc.parallelize(List(
>   Person(1, "varadha"),
>   Person(2, "nagaraj")
> )).toDF
> val varadha = df.filter("id = 1")
> val errorDF = df.join(varadha, df("id") === varadha("id"), "left_outer")
>   .select(df("id"), varadha("id") as "varadha_id")
> val nagaraj = df.filter("id = 2").select(df("id") as "n_id")
> val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"), "left_outer")
>   .select(df("id"), nagaraj("n_id") as "nagaraj_id")
> {code}
> The `errorDF` DataFrame is incorrect after the left join and shows the following:
> | id|varadha_id|
> |  1|         1|
> |  2|         2 (*This should've been null*)|
> whereas `correctDF` has the correct output after the left join:
> | id|nagaraj_id|
> |  1|      null|
> |  2|         2|