[ https://issues.apache.org/jira/browse/SPARK-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hovell resolved SPARK-16991.
---------------------------------------
    Resolution: Fixed
      Assignee: Xiao Li
 Fix Version/s: 2.1.0
                2.0.1

> Full outer join followed by inner join produces wrong results
> --------------------------------------------------------------
>
>                 Key: SPARK-16991
>                 URL: https://issues.apache.org/jira/browse/SPARK-16991
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Jonas Jarutis
>            Assignee: Xiao Li
>            Priority: Critical
>             Fix For: 2.0.1, 2.1.0
>
> I found strange behaviour when using a full outer join in combination with an inner join. It seems that the inner join can't match values correctly after a full outer join. Here is a reproducible example in Spark 2.0.
> {code}
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>       /_/
>
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> val a = Seq((1,2),(2,3)).toDF("a","b")
> a: org.apache.spark.sql.DataFrame = [a: int, b: int]
>
> scala> val b = Seq((2,5),(3,4)).toDF("a","c")
> b: org.apache.spark.sql.DataFrame = [a: int, c: int]
>
> scala> val c = Seq((3,1)).toDF("a","d")
> c: org.apache.spark.sql.DataFrame = [a: int, d: int]
>
> scala> val ab = a.join(b, Seq("a"), "fullouter")
> ab: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
>
> scala> ab.show
> +---+----+----+
> |  a|   b|   c|
> +---+----+----+
> |  1|   2|null|
> |  3|null|   4|
> |  2|   3|   5|
> +---+----+----+
>
> scala> ab.join(c, "a").show
> +---+---+---+---+
> |  a|  b|  c|  d|
> +---+---+---+---+
> +---+---+---+---+
> {code}
> Meanwhile, without the full outer join, the inner join works fine.
> {code}
> scala> b.join(c, "a").show
> +---+---+---+
> |  a|  c|  d|
> +---+---+---+
> |  3|  4|  1|
> +---+---+---+
> {code}
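> For reference, given the data above the inner join should match only on a = 3 (the sole key present in both ab and c), so on a version carrying the fix (2.0.1/2.1.0) the expected output of ab.join(c, "a") would be:
> {code}
> scala> ab.join(c, "a").show
> +---+----+---+---+
> |  a|   b|  c|  d|
> +---+----+---+---+
> |  3|null|  4|  1|
> +---+----+---+---+
> {code}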