[ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Chammas updated SPARK-15441:
-------------------------------------
    Component/s:     (was: sq;)
                 SQL

> dataset outer join seems to return incorrect result
> ---------------------------------------------------
>
>                 Key: SPARK-15441
>                 URL: https://issues.apache.org/jira/browse/SPARK-15441
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Wenchen Fan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> See notebook
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2836020637783173/5382278320999420/latest.html
> {code}
> import org.apache.spark.sql.functions
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
> // The last row _1 should be null, rather than (null, -1)
> left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(right.toDF("k", "u").as[(String, String)].alias("right"),
>     functions.col("left.k") === functions.col("right.k"), "right_outer")
>   .show()
> {code}
> The returned result currently is
> {code}
> +---------+-----+
> |       _1|   _2|
> +---------+-----+
> |    (a,2)|(a,x)|
> |    (a,1)|(a,x)|
> |    (b,3)|(b,y)|
> |(null,-1)|(d,z)|
> +---------+-----+
> {code}
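For reference, the semantics the report expects from a right outer join can be sketched with plain Scala collections, without Spark. This is only an illustration, not Spark's implementation: `RightOuterJoinSketch` and `rightOuterJoin` are hypothetical names, and `Option` stands in for the nullable `_1` side. When a right row has no match, the whole left tuple is absent (`None`) rather than a tuple populated with default field values such as `(null, -1)`.

```scala
// Plain Scala collections only; no Spark dependency.
// Option models the nullable left side of the join result.
object RightOuterJoinSketch {
  val left: List[(String, Int)] = List(("a", 1), ("a", 2), ("b", 3), ("c", 4))
  val right: List[(String, String)] = List(("a", "x"), ("b", "y"), ("d", "z"))

  // For each right row, pair it with every left row that has the same key.
  // If no left row matches, emit None for the entire left tuple.
  def rightOuterJoin[K, V, U](
      l: List[(K, V)],
      r: List[(K, U)]): List[(Option[(K, V)], (K, U))] =
    r.flatMap { rr =>
      val matches = l.filter(_._1 == rr._1)
      if (matches.isEmpty) List((Option.empty[(K, V)], rr))
      else matches.map(m => (Option(m), rr))
    }

  def main(args: Array[String]): Unit =
    rightOuterJoin(left, right).foreach(println)
}
```

In this sketch the row for key `d` comes out with `None` on the left, which corresponds to the `null` that the report says `_1` should hold in the last row.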