[ https://issues.apache.org/jira/browse/SPARK-20008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15936516#comment-15936516 ]
Xiao Li commented on SPARK-20008: --------------------------------- Sure, will do. > hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns > 1 > ------------------------------------------------------------------------------- > > Key: SPARK-20008 > URL: https://issues.apache.org/jira/browse/SPARK-20008 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.2, 2.2.0 > Reporter: Ravindra Bajpai > > hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() yields > 1 against expected 0. > This was not the case with spark 1.5.2. This is an api change from usage > point of view and hence I consider this as a bug. May be a boundary case, not > sure. > Work around - For now I check the counts != 0 before this operation. Not good > for performance. Hence creating a jira to track it. > As Young Zhang explained in reply to my mail - > Starting from Spark 2, these kind of operation are implemented in left anti > join, instead of using RDD operation directly. > Same issue also on sqlContext. > scala> spark.version > res25: String = 2.0.2 > spark.sqlContext.emptyDataFrame.except(spark.sqlContext.emptyDataFrame).explain(true) > == Physical Plan == > *HashAggregate(keys=[], functions=[], output=[]) > +- Exchange SinglePartition > +- *HashAggregate(keys=[], functions=[], output=[]) > +- BroadcastNestedLoopJoin BuildRight, LeftAnti, false > :- Scan ExistingRDD[] > +- BroadcastExchange IdentityBroadcastMode > +- Scan ExistingRDD[] > This arguably means a bug. But my guess is liking the logic of comparing NULL > = NULL, should it return true or false, causing this kind of confusion. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org