[ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294807#comment-15294807 ]
Davies Liu commented on SPARK-15441:
------------------------------------

How do we represent a null in a Dataset? If it is a row in which all columns are null, then we could transform such a row into null, right? In this case, the columns on the left side of the fourth row are all null, so that side could be null.

> dataset outer join seems to return incorrect result
> ---------------------------------------------------
>
>                 Key: SPARK-15441
>                 URL: https://issues.apache.org/jira/browse/SPARK-15441
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> See notebook
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2836020637783173/5382278320999420/latest.html
> {code}
> import org.apache.spark.sql.functions
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
> // The last row _1 should be null, rather than (null, -1)
> left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(right.toDF("k", "u").as[(String, String)].alias("right"),
>     functions.col("left.k") === functions.col("right.k"), "right_outer")
>   .show()
> {code}
> The returned result currently is
> {code}
> +---------+-----+
> |       _1|   _2|
> +---------+-----+
> |    (a,2)|(a,x)|
> |    (a,1)|(a,x)|
> |    (b,3)|(b,y)|
> |(null,-1)|(d,z)|
> +---------+-----+
> {code}
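
The rule suggested in the comment (treat a joined side whose columns are all null as a null object) can be sketched in plain Scala, independent of Spark. This is only an illustration under stated assumptions: `NullRowSketch`, `Row` and `collapseAllNull` are hypothetical names, a row is modeled as a `Seq[Any]`, and `Option` stands in for a nullable object. It is not how Spark implements `joinWith`.

```scala
// Hypothetical sketch, not Spark code: illustrates the proposed rule only.
object NullRowSketch {
  // Assumption: one side of a joined pair is modeled as nullable column values.
  type Row = Seq[Any]

  // A non-empty row whose columns are all null becomes None (the "null object");
  // any other row is kept unchanged.
  def collapseAllNull(row: Row): Option[Row] =
    if (row.nonEmpty && row.forall(_ == null)) None else Some(row)

  def main(args: Array[String]): Unit = {
    // Right outer join result modeled by hand; the last left side is unmatched.
    val joined: Seq[(Row, Row)] = Seq(
      (Seq("a", 2), Seq("a", "x")),
      (Seq("b", 3), Seq("b", "y")),
      (Seq(null, null), Seq("d", "z"))
    )
    joined.foreach { case (l, r) =>
      println(s"${collapseAllNull(l)} | $r")
    }
  }
}
```

One consequence of this rule, at least in the sketch, is that a real record whose fields all happen to be null can no longer be told apart from a missing one. The check also has to run on the raw columns: once the left side has been decoded to `(null, -1)`, it is no longer all null.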