[ https://issues.apache.org/jira/browse/SPARK-16316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365810#comment-15365810 ]
Hyukjin Kwon commented on SPARK-16316:
--------------------------------------

I confirm this works fine in current master:

{code}
scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]

scala> dfa.except(dfb).count
res0: Long = 90
{code}

> dataframe except API returning wrong result in spark 1.5.0
> ----------------------------------------------------------
>
>                 Key: SPARK-16316
>                 URL: https://issues.apache.org/jira/browse/SPARK-16316
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jacky Li
>
> Version: Spark 1.5.0
> Use case: use the except API to subtract one DataFrame from another
>
> scala> val dfa = sc.parallelize(1 to 100).map(x => (x, x)).toDF("i", "j")
> dfa: org.apache.spark.sql.DataFrame = [i: int, j: int]
>
> scala> val dfb = sc.parallelize(1 to 10).map(x => (x, x)).toDF("i", "j")
> dfb: org.apache.spark.sql.DataFrame = [i: int, j: int]
>
> scala> dfa.except(dfb).count
> res13: Long = 0
>
> It should return 90 instead of 0.
>
> The following statement works fine, though:
>
> scala> dfa.except(dfb).rdd.count
> res13: Long = 90
>
> I guess the bug may be somewhere in DataFrame.count.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
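For reference, DataFrame.except performs a distinct set difference: it returns the distinct rows of the left DataFrame that do not appear in the right one. A minimal sketch of those semantics in plain Scala collections (no Spark required; the object and method names here are illustrative, not Spark internals):

```scala
// Illustrative sketch of DataFrame.except's distinct set-difference semantics,
// expressed over plain Scala collections. Not Spark code.
object ExceptSketch {
  // Returns the distinct elements of `left` that are absent from `right`.
  def except[A](left: Seq[A], right: Seq[A]): Seq[A] = {
    val rightSet = right.toSet
    left.distinct.filterNot(rightSet.contains)
  }

  def main(args: Array[String]): Unit = {
    // Mirror the rows built in the bug report: (i, i) pairs.
    val dfa = (1 to 100).map(x => (x, x))
    val dfb = (1 to 10).map(x => (x, x))
    // 100 distinct rows minus the 10 overlapping ones.
    println(except(dfa, dfb).size) // prints 90
  }
}
```

Under these semantics the expected count in the report is 90, matching both the `rdd.count` workaround in Spark 1.5.0 and the fixed behavior confirmed on master.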