[ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15297881#comment-15297881 ]

Zhan Zhang commented on SPARK-15441:
------------------------------------

Currently, new GenericInternalRow(right.output.length) is used as the nullRow, but
it cannot distinguish a row that is itself null from a row whose columns are all
null. We could add a special nullRow to represent that the InternalRow itself is
null, so that the Encoder can tell whether the object itself is null.
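To make the ambiguity concrete, here is a minimal plain-Scala sketch that does not use Spark at all. `SimpleRow`, `NullRow`, `allNullRow`, `decodeNaive` and `decodeWithSentinel` are hypothetical names standing in for InternalRow, the proposed special null row, and the Encoder's deserializer; the real Spark classes differ in detail.

```scala
// Hypothetical stand-in for Spark's InternalRow: just an array of column values.
final case class SimpleRow(values: Array[Any])

object NullRowSketch {
  // Hypothetical sentinel meaning "the row itself is null", as proposed above.
  val NullRow: SimpleRow = SimpleRow(Array.empty[Any])

  // Mirrors the current approach: a row of the right width with every column null.
  def allNullRow(width: Int): SimpleRow = SimpleRow(Array.fill[Any](width)(null))

  // Naive decoder for a (String, Integer) pair: it cannot tell
  // "the row is null" apart from "all columns are null", so it always builds a tuple.
  def decodeNaive(row: SimpleRow): (String, Integer) =
    (row.values(0).asInstanceOf[String], row.values(1).asInstanceOf[Integer])

  // Sentinel-aware decoder: a reference check against NullRow lets it return
  // null for the whole object, and decode everything else as before.
  def decodeWithSentinel(row: SimpleRow): (String, Integer) =
    if (row eq NullRow) null
    else decodeNaive(row)
}
```

The naive decoder turns `allNullRow(2)` into the tuple `(null,null)` rather than `null`, which is the same shape of mistake as the `(null,-1)` in the report. Checking identity with `eq` keeps the sentinel distinct from a legitimately all-null row.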

> dataset outer join seems to return incorrect result
> ---------------------------------------------------
>
>                 Key: SPARK-15441
>                 URL: https://issues.apache.org/jira/browse/SPARK-15441
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Wenchen Fan
>            Priority: Critical
>
> See notebook
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2836020637783173/5382278320999420/latest.html
> {code}
> import org.apache.spark.sql.functions
> import spark.implicits._  // needed outside a notebook; assumes a SparkSession named `spark`
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
> // In the last row, _1 should be null rather than (null,-1)
> left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(right.toDF("k", "u").as[(String, String)].alias("right"),
>     functions.col("left.k") === functions.col("right.k"), "right_outer")
>   .show()
> {code}
> The result currently returned is:
> {code}
> +---------+-----+
> |       _1|   _2|
> +---------+-----+
> |    (a,2)|(a,x)|
> |    (a,1)|(a,x)|
> |    (b,3)|(b,y)|
> |(null,-1)|(d,z)|
> +---------+-----+
> {code}
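For reference, the expected semantics can be modeled on plain Scala collections: a right outer join keeps every right-hand element, and when no left-hand element matches, the left side of the pair is missing as a whole rather than being a tuple of null or default fields. This sketch uses `Option` to make that absence explicit; `rightOuterJoin` is a hypothetical helper, not a Spark API, and its output order need not match Spark's.

```scala
object RightOuterJoinSketch {
  // Pairs each right element with every left element that has the same key.
  // An unmatched right element is paired with None: the whole left side is absent.
  def rightOuterJoin[K, L, R](left: Seq[(K, L)], right: Seq[(K, R)]): Seq[(Option[(K, L)], (K, R))] =
    right.flatMap { r =>
      val matches = left.filter(_._1 == r._1)
      if (matches.isEmpty) Seq((Option.empty[(K, L)], r))
      else matches.map(l => (Option(l), r))
    }

  def main(args: Array[String]): Unit = {
    val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4))
    val right = List(("a", "x"), ("b", "y"), ("d", "z"))
    rightOuterJoin(left, right).foreach(println)
  }
}
```

Running `main` prints one pair per line, ending with `(None,(d,z))`. That corresponds to the null `_1` the issue expects, where the current Dataset output shows `(null,-1)`.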



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
