[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238173#comment-17238173 ]
Apache Spark commented on SPARK-33536:
--------------------------------------

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/30488

> Incorrect join results when joining twice with the same DF
> ----------------------------------------------------------
>
>                 Key: SPARK-33536
>                 URL: https://issues.apache.org/jira/browse/SPARK-33536
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1, 3.1.0
>            Reporter: wuyi
>            Priority: Major
>
> {code:java}
> // Assumed definition, matching TestData(key: Int, value: String) from Spark's SQL test data;
> // outside the test harness, .toDS() also needs the session implicits.
> case class TestData(key: Int, value: String)
> import spark.implicits._
>
> val emp1 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop"),
>   TestData(4, "IT")).toDS()
> val emp2 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop")).toDS()
> val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
> emp1.join(emp3, emp1.col("key") === emp3.col("key"),
>   "left_outer").select(emp1.col("*"), emp3.col("key").as("e2")).show()
> // wrong result: emp3 has no row with key 4, so e2 should be null in the last row
> +---+---------+---+
> |key|    value| e2|
> +---+---------+---+
> |  1|    sales|  1|
> |  2|personnel|  2|
> |  3|  develop|  3|
> |  4|       IT|  4|
> +---+---------+---+
> {code}
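The following sketch makes the expected output of the repro concrete. It models both joins with plain Scala collections and uses no Spark. The object name `ExpectedLeftOuter` is hypothetical, and `TestData` is assumed to be a simple `(key: Int, value: String)` case class. Because emp3 keeps only the emp1 rows whose key also appears in emp2, the left outer join should produce `None` (shown as null by `show()`) for key 4.

```scala
// Hypothetical plain-Scala model of the SPARK-33536 repro; no Spark involved.
case class TestData(key: Int, value: String)

object ExpectedLeftOuter {
  val emp1: Seq[TestData] =
    Seq(TestData(1, "sales"), TestData(2, "personnel"), TestData(3, "develop"), TestData(4, "IT"))
  val emp2: Seq[TestData] =
    Seq(TestData(1, "sales"), TestData(2, "personnel"), TestData(3, "develop"))

  // emp3: inner join emp1 with emp2 on key, keeping only emp1's columns (keys 1, 2, 3).
  val emp3: Seq[TestData] = for {
    a <- emp1
    b <- emp2
    if a.key == b.key
  } yield a

  // Left outer join emp1 with emp3 on key; e2 is None when emp3 has no matching key.
  val result: Seq[(Int, String, Option[Int])] = emp1.flatMap { a =>
    val matches = emp3.filter(_.key == a.key)
    if (matches.isEmpty) Seq((a.key, a.value, Option.empty[Int]))
    else matches.map(m => (a.key, a.value, Option(m.key)))
  }

  def main(args: Array[String]): Unit = result.foreach(println)
}
```

Running `main` prints four rows. Only the last one, for key 4 ("IT"), has `None` in the e2 position, and this is the row where the Spark output above incorrectly shows 4.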