Varadharajan created SPARK-13393:
------------------------------------

             Summary: Column mismatch issue in left_outer join using Spark DataFrame
                 Key: SPARK-13393
                 URL: https://issues.apache.org/jira/browse/SPARK-13393
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Varadharajan
Consider the snippet below:

{code:title=test.scala|borderStyle=solid}
import sqlContext.implicits._ // needed for .toDF (already imported in spark-shell)

// A case class lets Spark infer the schema and allows Person(...) without `new`
case class Person(id: Int, name: String)

val df = sc.parallelize(List(
  Person(1, "varadha"),
  Person(2, "nagaraj")
)).toDF

val varadha = df.filter("id = 1")
val errorDF = df.join(varadha, df("id") === varadha("id"), "left_outer").select(df("id"), varadha("id") as "varadha_id")

val nagaraj = df.filter("id = 2").select(df("id") as "n_id")
val correctDF = df.join(nagaraj, df("id") === nagaraj("n_id"), "left_outer").select(df("id"), nagaraj("n_id") as "nagaraj_id")
{code}

After the left outer join, `errorDF` contains the wrong value in `varadha_id`:

||id||varadha_id||
|1|1|
|2|2 (*This should've been null*)|

whereas `correctDF` produces the correct output after the left outer join:

||id||nagaraj_id||
|1|null|
|2|2|

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
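To make the expected `errorDF` result concrete, here is a minimal sketch of left outer join semantics applied to the same two rows using plain Scala collections (no Spark involved). `Option[Int]` is an assumed stand-in for a nullable column (`None` playing the role of `null`), and `leftOuterJoinIds` / `LeftOuterJoinSketch` are hypothetical names for illustration only; this does not reflect how Spark evaluates joins internally.

```scala
// Plain-Scala illustration of left outer join semantics (not Spark code).
// Assumption: Option[Int] models a nullable column, with None standing in for null.
case class Person(id: Int, name: String)

object LeftOuterJoinSketch {
  // Hypothetical helper: for each left row, emit (leftId, Some(rightId)) for every
  // right row with the same id, or (leftId, None) when no right row matches.
  def leftOuterJoinIds(left: Seq[Person], right: Seq[Person]): Seq[(Int, Option[Int])] =
    left.flatMap { l =>
      val matches = right.filter(_.id == l.id)
      val rightIds: Seq[Option[Int]] =
        if (matches.isEmpty) Seq(None) else matches.map(r => Some(r.id))
      rightIds.map(rid => (l.id, rid))
    }

  def main(args: Array[String]): Unit = {
    val people = List(Person(1, "varadha"), Person(2, "nagaraj"))
    val varadha = people.filter(_.id == 1) // same filter as df.filter("id = 1")
    leftOuterJoinIds(people, varadha).foreach(println)
  }
}
```

Running `main` prints `(1,Some(1))` followed by `(2,None)`. Row 2 pairing with `None` corresponds to the `null` that the report expects in the `varadha_id` column of `errorDF`.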