[jira] [Commented] (SPARK-15441) dataset outer join seems to return incorrect result

Davies Liu (JIRA) Sat, 21 May 2016 00:07:25 -0700

    [ https://issues.apache.org/jira/browse/SPARK-15441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294807#comment-15294807 ]


Davies Liu commented on SPARK-15441:
------------------------------------

How do we represent a null in a Dataset? If it is a row whose columns are 
all null, then we could transform such a row into null, right? In this 
case, the left-side columns of the fourth row are all null, so that side 
could be null.
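
To make the idea concrete, here is a minimal sketch in plain Scala (no 
Spark dependency). It is not Spark's implementation: the Row alias, the 
collapse helper and the NullRowSketch object are hypothetical names made 
up for illustration. It assumes the left side of each joined row is 
inspected before it is decoded into (String, Int).

```scala
// Hypothetical sketch, not Spark's implementation: model one side of a
// joined row as a sequence of nullable column values, and treat a row
// whose columns are all null as a missing (null) object.
object NullRowSketch {
  type Row = Seq[Any]

  // Returns None when every column is null, otherwise the row itself.
  def collapse(row: Row): Option[Row] =
    if (row.nonEmpty && row.forall(_ == null)) None else Some(row)

  def main(args: Array[String]): Unit = {
    // Left-side columns of the four result rows in the issue, assuming
    // the unmatched fourth row carries nulls in both columns.
    val leftSides: Seq[Row] =
      Seq(Seq("a", 2), Seq("a", 1), Seq("b", 3), Seq(null, null))

    // Only the fourth row collapses to None.
    leftSides.map(collapse).foreach(println)
  }
}
```

One consequence of this rule is that an object that is really present 
but whose fields all happen to be null cannot be told apart from a 
missing object.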

> dataset outer join seems to return incorrect result
> ---------------------------------------------------
>
>                 Key: SPARK-15441
>                 URL: https://issues.apache.org/jira/browse/SPARK-15441
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> See notebook
> https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/6122906529858466/2836020637783173/5382278320999420/latest.html
> {code}
> import org.apache.spark.sql.functions
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
> // The last row _1 should be null, rather than (null, -1)
> left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(
>     right.toDF("k", "u").as[(String, String)].alias("right"),
>     functions.col("left.k") === functions.col("right.k"),
>     "right_outer")
>   .show()
> {code}
> The returned result currently is
> {code}
> +---------+-----+
> |       _1|   _2|
> +---------+-----+
> |    (a,2)|(a,x)|
> |    (a,1)|(a,x)|
> |    (b,3)|(b,y)|
> |(null,-1)|(d,z)|
> +---------+-----+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
