[jira] [Assigned] (SPARK-33536) Incorrect join results when joining twice with the same DF
[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-33536:
-----------------------------------

    Assignee: wuyi

> Incorrect join results when joining twice with the same DF
> ----------------------------------------------------------
>
>                 Key: SPARK-33536
>                 URL: https://issues.apache.org/jira/browse/SPARK-33536
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>            Reporter: wuyi
>            Assignee: wuyi
>            Priority: Major
>
> {code:java}
> val emp1 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop"),
>   TestData(4, "IT")).toDS()
> val emp2 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop")).toDS()
> val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
> emp1.join(emp3, emp1.col("key") === emp3.col("key"),
>   "left_outer").select(emp1.col("*"), emp3.col("key").as("e2")).show()
> // wrong result
> +---+---------+---+
> |key|    value| e2|
> +---+---------+---+
> |  1|    sales|  1|
> |  2|personnel|  2|
> |  3|  develop|  3|
> |  4|       IT|  4|
> +---+---------+---+
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
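For reference, the result this reproduction would be expected to produce can be worked out without Spark. The sketch below is an illustration, not part of the original report: it models the Datasets as plain Scala collections, and the names `ExpectedJoinResult` and `result`, as well as the use of `Option` to stand in for a SQL NULL, are assumptions made for the example. The `TestData(key, value)` shape is inferred from the `key` and `value` columns in the reported output.

```scala
// Plain Scala collections stand in for the Datasets; no Spark required.
// TestData's fields are inferred from the column names in the report.
final case class TestData(key: Int, value: String)

object ExpectedJoinResult {
  val emp1: Seq[TestData] = Seq(
    TestData(1, "sales"),
    TestData(2, "personnel"),
    TestData(3, "develop"),
    TestData(4, "IT"))

  val emp2: Seq[TestData] = Seq(
    TestData(1, "sales"),
    TestData(2, "personnel"),
    TestData(3, "develop"))

  // Inner join on key, keeping only the emp1 side (mirrors emp3 in the report).
  val emp3: Seq[TestData] =
    for (l <- emp1; r <- emp2 if l.key == r.key) yield l

  // Left outer join of emp1 with emp3 on key.
  // None models the NULL that an unmatched left row gets in the e2 column.
  val result: Seq[(Int, String, Option[Int])] = emp1.flatMap { l =>
    val matches = emp3.filter(_.key == l.key)
    if (matches.isEmpty) Seq((l.key, l.value, Option.empty[Int]))
    else matches.map(r => (l.key, l.value, Option(r.key)))
  }

  def main(args: Array[String]): Unit =
    result.foreach { case (key, value, e2) =>
      println(s"$key\t$value\t${e2.map(_.toString).getOrElse("null")}")
    }
}
```

Under these assumptions key 4 has no match in `emp3`, so its `e2` entry is `None`, whereas the output in the report shows 4 for that row.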
[jira] [Assigned] (SPARK-33536) Incorrect join results when joining twice with the same DF
[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33536:
------------------------------------

    Assignee:     (was: Apache Spark)
[jira] [Assigned] (SPARK-33536) Incorrect join results when joining twice with the same DF
[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33536:
------------------------------------

    Assignee: Apache Spark