[jira] [Commented] (SPARK-33536) Incorrect join results when joining twice with the same DF
[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246360#comment-17246360 ]

Apache Spark commented on SPARK-33536:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/30682

> Incorrect join results when joining twice with the same DF
> --
>
> Key: SPARK-33536
> URL: https://issues.apache.org/jira/browse/SPARK-33536
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1, 3.1.0
> Reporter: wuyi
> Assignee: wuyi
> Priority: Major
> Fix For: 3.1.0
>
>
> {code:java}
> val emp1 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop"),
>   TestData(4, "IT")).toDS()
> val emp2 = Seq[TestData](
>   TestData(1, "sales"),
>   TestData(2, "personnel"),
>   TestData(3, "develop")).toDS()
> val emp3 = emp1.join(emp2, emp1("key") === emp2("key")).select(emp1("*"))
> emp1.join(emp3, emp1.col("key") === emp3.col("key"),
>   "left_outer").select(emp1.col("*"), emp3.col("key").as("e2")).show()
> // wrong result
> +---+---------+---+
> |key|    value| e2|
> +---+---------+---+
> |  1|    sales|  1|
> |  2|personnel|  2|
> |  3|  develop|  3|
> |  4|       IT|  4|
> +---+---------+---+
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
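The report lists the wrong output but not the expected one. For reference, here is a minimal plain-Scala sketch (standard collections only, not Spark) of the left outer join semantics the query is meant to follow. `TestData` is redefined locally to mirror the `key`/`value` rows above, and `ExpectedJoin` and `leftOuterJoin` are hypothetical names. Because `emp3` only contains keys 1 to 3, the row with key 4 has no match, so `e2` should be null (`None` here) rather than 4.

```scala
// Assumption: local stand-in for the (key, value) rows used in the report.
final case class TestData(key: Int, value: String)

object ExpectedJoin {
  // Hypothetical helper: left outer join on `key`. Every left row is kept and
  // paired with the key of each matching right row, or None when nothing matches.
  def leftOuterJoin(left: Seq[TestData], right: Seq[TestData]): Seq[(TestData, Option[Int])] =
    left.flatMap { l =>
      val matches = right.filter(_.key == l.key)
      if (matches.isEmpty) Seq(l -> Option.empty[Int])
      else matches.map(r => l -> Option(r.key))
    }

  val emp1: Seq[TestData] = Seq(
    TestData(1, "sales"), TestData(2, "personnel"),
    TestData(3, "develop"), TestData(4, "IT"))
  val emp2: Seq[TestData] = Seq(
    TestData(1, "sales"), TestData(2, "personnel"), TestData(3, "develop"))

  // emp3: inner join of emp1 and emp2 on key, keeping only emp1's columns.
  val emp3: Seq[TestData] =
    for { l <- emp1; r <- emp2 if l.key == r.key } yield l

  // emp1 LEFT OUTER JOIN emp3, selecting emp1's columns plus emp3.key as e2.
  val result: Seq[(TestData, Option[Int])] = leftOuterJoin(emp1, emp3)

  def main(args: Array[String]): Unit =
    result.foreach { case (row, e2) =>
      println(s"${row.key}|${row.value}|${e2.map(_.toString).getOrElse("null")}")
    }
}
```

Running `ExpectedJoin.main` prints one `key|value|e2` line per row; the last line is `4|IT|null`, which is where the Spark output above goes wrong.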
[jira] [Commented] (SPARK-33536) Incorrect join results when joining twice with the same DF
[ https://issues.apache.org/jira/browse/SPARK-33536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238174#comment-17238174 ]

Apache Spark commented on SPARK-33536:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/30488