[ https://issues.apache.org/jira/browse/SPARK-23855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437327#comment-16437327 ]
Erik Selin commented on SPARK-23855:
------------------------------------

+1, from our investigations it looks like we've also hit this issue on 2.2.

> Performing a Join after a CrossJoin can lead to data corruption
> ---------------------------------------------------------------
>
>                 Key: SPARK-23855
>                 URL: https://issues.apache.org/jira/browse/SPARK-23855
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.2.1
>            Reporter: Martin Junghanns
>            Priority: Major
>
> The following test produces the wrong result for the join operation. The
> error only occurs when joining on the first column of the crossed dataframe.
> However, a subsequent select fixes the data (which is of course not a
> solution).
> It works on 2.3.0 though. It would be nice to get this fixed on the 2.2.x
> releases, too. Maybe someone can point me to the issue that has been fixed?
> Would be nice to see the solution in code.
> {code}
>   it("should correctly perform a join after a cross") {
>     val df1 = sparkSession.createDataFrame(Seq(Tuple1(0L)))
>       .toDF("a")
>     val df2 = sparkSession.createDataFrame(Seq(Tuple1(1L)))
>       .toDF("b")
>     val df3 = sparkSession.createDataFrame(Seq(Tuple1(0L)))
>       .toDF("c")
>     val cross = df1.crossJoin(df2)
>     cross.show()
>     val joined = cross
>       .join(df3, cross.col("a") === df3.col("c"))
>     joined.show()
>     val selected = joined.select("*")
>     selected.show()
>   }
> {code}
> prints:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  0|  1|
> +---+---+
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  0|  0|  1|
> +---+---+---+
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  0|  1|  0|
> +---+---+---+
> {code}