[ https://issues.apache.org/jira/browse/SPARK-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357553#comment-15357553 ]
Xiao Li commented on SPARK-16316:
---------------------------------

How about 1.6 and 2.0?

> dataframe except API returning wrong result in spark 1.5.0
> ----------------------------------------------------------
>
>                 Key: SPARK-16316
>                 URL: https://issues.apache.org/jira/browse/SPARK-16316
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jacky Li
>
> Version: Spark 1.5.0
> Use case: use the except API to subtract one DataFrame from another
>
> scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
> dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]
>
> scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
> dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]
>
> scala> dfa.except(dfb).count
> res13: Long = 0
>
> It should return 90 instead of 0.
>
> Meanwhile, the following statement works fine:
>
> scala> dfa.except(dfb).rdd.count
> res13: Long = 90
>
> I guess the bug may be somewhere in DataFrame.count.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
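For reference, `except` is defined as a distinct set difference: rows present in the left DataFrame but not in the right. The expected result in the report can be sketched with plain Scala collections (no Spark required; this models the semantics, not Spark's implementation):

```scala
// Model the two DataFrames from the report as collections of (i, j) rows.
val dfa = (1 to 100).map(x => (x, x))
val dfb = (1 to 10).map(x => (x, x))

// except = distinct rows in dfa that do not appear in dfb.
val diff = dfa.toSet -- dfb.toSet

// Rows (11,11) through (100,100) remain.
println(diff.size) // 90
```

This is the count `dfa.except(dfb).count` should produce, matching what `dfa.except(dfb).rdd.count` returns in the report.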