[ https://issues.apache.org/jira/browse/SPARK-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984195#comment-14984195 ]
Ram Kandasamy commented on SPARK-11427:
---------------------------------------

It looks like this issue has been resolved in Spark 1.5.1. I will mark it as a duplicate, since it was fixed in https://issues.apache.org/jira/browse/SPARK-10539.

> DataFrame's intersect method does not work, returns 1
> -----------------------------------------------------
>
>                 Key: SPARK-11427
>                 URL: https://issues.apache.org/jira/browse/SPARK-11427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Ram Kandasamy
>
> Hello,
> I was working with DataFrames and found that the intersect() method always seems to return a DataFrame whose count is 1. The RDD's intersection() method works correctly.
> Consider this example:
> scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
> firstFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> firstFile.count
> res4: Long = 1072046
> scala> firstFile.intersect(firstFile).count
> res5: Long = 1
> scala> firstFile.rdd.intersection(firstFile.rdd).count
> res6: Long = 1072046
> I have tried various other cases, and for some reason the DataFrame's intersect method always returns 1.
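The reporter's REPL check can be packaged as a small standalone regression test. The sketch below assumes Spark 1.5.x (`spark-core` and `spark-sql`) on the classpath and a local master; the object name `IntersectRegressionCheck`, the `run` helper and the sample data are hypothetical, chosen only for illustration. Because `DataFrame.intersect` is documented as equivalent to SQL `INTERSECT`, intersecting a distinct DataFrame with itself should keep every row. On 1.5.0 the report suggests the DataFrame count would come back as 1; on a release containing the SPARK-10539 fix, all three counts should agree.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical regression check mirroring the REPL session in the report.
// Assumes Spark 1.5.x (spark-core + spark-sql) is on the classpath.
object IntersectRegressionCheck {

  /** Returns (distinct count, DataFrame.intersect count, RDD.intersection count). */
  def run(): (Long, Long, Long) = {
    val conf = new SparkConf().setAppName("intersect-check").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // Small stand-in for the reporter's parquet "id" column; "a" appears twice
      // so that distinct() has a duplicate to remove.
      val ids = sc.parallelize(Seq("a", "b", "c", "a")).toDF("id").distinct()

      val distinctCount = ids.count()
      // intersect is equivalent to SQL INTERSECT, so a distinct DataFrame
      // intersected with itself should keep all of its rows.
      val viaDataFrame = ids.intersect(ids).count()
      // The same check through the RDD API, which the report says behaved correctly.
      val viaRdd = ids.rdd.intersection(ids.rdd).count()
      (distinctCount, viaDataFrame, viaRdd)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (n, viaDataFrame, viaRdd) = run()
    println(s"count=$n, DataFrame.intersect=$viaDataFrame, RDD.intersection=$viaRdd")
    // On an affected build (1.5.0 per the report) this assertion would be expected to fail.
    assert(viaDataFrame == n, s"intersect returned $viaDataFrame rows, expected $n")
  }
}
```

Running `main` against 1.5.0 and then against 1.5.1 would be one way to confirm the duplicate resolution locally, without the reporter's parquet data.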