[ https://issues.apache.org/jira/browse/SPARK-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019199#comment-15019199 ]
Xiao Li commented on SPARK-11894:
---------------------------------

The plan of the Dataset:

== Parsed Logical Plan ==
Project [struct(_1#2,_2#3) AS _1#10,struct(_1#7,_2#8) AS _2#11]
 Join Inner, Some(true)
  LocalRelation [_1#2,_2#3], [[1,0,1800000001,31],[0,16,1800000001,32]]
  LocalRelation [_1#7,_2#8], [[1,0,1800000001,31],[0,16,1800000001,32]]

== Analyzed Logical Plan ==
_1: struct<_1:int,_2:string>, _2: struct<_1:int,_2:string>
Project [struct(_1#2,_2#3) AS _1#10,struct(_1#7,_2#8) AS _2#11]
 Join Inner, Some(true)
  LocalRelation [_1#2,_2#3], [[1,0,1800000001,31],[0,16,1800000001,32]]
  LocalRelation [_1#7,_2#8], [[1,0,1800000001,31],[0,16,1800000001,32]]

== Optimized Logical Plan ==
Project [struct(_1#2,_2#3) AS _1#10,struct(_1#7,_2#8) AS _2#11]
 Join Inner, None
  LocalRelation [_1#2,_2#3], [[1,0,1800000001,31],[0,16,1800000001,32]]
  LocalRelation [_1#7,_2#8], [[1,0,1800000001,31],[0,16,1800000001,32]]

== Physical Plan ==
Project [struct(_1#2,_2#3) AS _1#10,struct(_1#7,_2#8) AS _2#11]
 BroadcastNestedLoopJoin BuildLeft, Inner, None
  LocalTableScan [_1#2,_2#3], [[1,0,1800000001,31],[0,16,1800000001,32]]
  LocalTableScan [_1#7,_2#8], [[1,0,1800000001,31],[0,16,1800000001,32]]

> Incorrect results are returned when using null
> ----------------------------------------------
>
>                 Key: SPARK-11894
>                 URL: https://issues.apache.org/jira/browse/SPARK-11894
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.6.0
>            Reporter: Xiao Li
>
> In the Dataset API, the following two Datasets are treated as the same:
>   Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
>   Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> Note: java.lang.Integer is nullable.
> This can produce incorrect results. For example,
>   val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
>   val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()//toDF("key", "value").as('df2)
>   val res1 = ds1.joinWith(ds2, lit(true)).collect()
> The expected result is
>   ((null,1),(null,1))
>   ((22,2),(null,1))
>   ((null,1),(22,2))
>   ((22,2),(22,2))
> The actual result is
>   ((0,1),(0,1))
>   ((22,2),(0,1))
>   ((0,1),(22,2))
>   ((22,2),(22,2))
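
For reference, the expected result in the report is the full Cartesian product of the two inputs, with the null key preserved rather than replaced by 0. The following is a minimal plain-Scala sketch of that expectation. It does not use Spark and says nothing about where the bug lies; the object name `NullPreservingCrossJoin` and the `crossJoin` helper are hypothetical names introduced only for illustration.

```scala
// Plain Scala, no Spark: models the result that
// ds1.joinWith(ds2, lit(true)) is expected to return in the report.
object NullPreservingCrossJoin {
  type Row = (java.lang.Integer, String)

  // Same rows as ds1 and ds2 in the report; the first key is a genuine null.
  val rows: Seq[Row] = Seq(
    (null.asInstanceOf[java.lang.Integer], "1"),
    (java.lang.Integer.valueOf(22), "2")
  )

  // A join whose condition is always true pairs every left row with
  // every right row. Iterating the right side in the outer loop
  // reproduces the row order listed in the report.
  def crossJoin(left: Seq[Row], right: Seq[Row]): Seq[(Row, Row)] =
    for (r <- right; l <- left) yield (l, r)

  def main(args: Array[String]): Unit =
    crossJoin(rows, rows).foreach(println)
}
```

Running `main` prints the four pairs from the "expected result" list, starting with `((null,1),(null,1))`; comparing that with the output of `res1` makes the substitution of 0 for null easy to spot.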