[ 
https://issues.apache.org/jira/browse/SPARK-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984195#comment-14984195
 ] 

Ram Kandasamy commented on SPARK-11427:
---------------------------------------

It looks like this issue has been resolved in Spark 1.5.1. I will mark it as a 
duplicate, since it was fixed by 
https://issues.apache.org/jira/browse/SPARK-10539.
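
For anyone who wants to confirm the behavior on their own build, here is a 
minimal sketch of a self-intersection check. It assumes a spark-shell session 
on Spark 1.5.1 or later with the predefined `sqlContext`; the 0 to 999 id 
range is made-up stand-in data, not the parquet files from the report, and it 
is not guaranteed to trigger the original bug on 1.5.0.

```scala
// Sketch only: run inside spark-shell, which predefines `sqlContext`.
// The id range below is hypothetical stand-in data.
val ids = sqlContext.range(0L, 1000L).select("id").distinct

val total        = ids.count                           // number of distinct ids
val viaDataFrame = ids.intersect(ids).count            // DataFrame path from the report
val viaRdd       = ids.rdd.intersection(ids.rdd).count // RDD path, reported as working

// Intersecting a set of distinct rows with itself should keep every row.
assert(viaDataFrame == total, s"intersect returned $viaDataFrame, expected $total")
assert(viaRdd == total, s"intersection returned $viaRdd, expected $total")
```

If the first assertion fails while the second passes, the build most likely 
still has the problem described in the report below.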

> DataFrame's intersect method does not work, returns 1
> -----------------------------------------------------
>
>                 Key: SPARK-11427
>                 URL: https://issues.apache.org/jira/browse/SPARK-11427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Ram Kandasamy
>
> Hello,
>     I was working with DataFrames and found that the result of intersect() 
> always seems to have a count of 1. The RDD intersection() method does work properly.
> Consider this example:
> scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
> firstFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> firstFile.count
> res4: Long = 1072046
> scala> firstFile.intersect(firstFile).count
> res5: Long = 1
> scala> firstFile.rdd.intersection(firstFile.rdd).count
> res6: Long = 1072046
> I have tried several different cases, and in each one the count of the 
> DataFrame's intersect() result is 1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

