[jira] [Comment Edited] (SPARK-2620) case class cannot be used as key for reduce

Aaron (JIRA) Tue, 22 Jul 2014 11:07:09 -0700

    [ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070604#comment-14070604 ]


Aaron edited comment on SPARK-2620 at 7/22/14 6:05 PM:
-------------------------------------------------------

If you look at the diff of distinct from branch-0.9 to master, you see:
   -  def distinct(numPartitions: Int): RDD[T] =
   +  def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] =

Is it possible that case classes don't have an implicit ordering and that is 
why this fails?
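
To make the question concrete, here is a minimal plain-Scala sketch (no Spark involved) of how an implicit parameter with a null default behaves. The object name ImplicitOrderingSketch and the helper hasOrdering are hypothetical and exist only for illustration; they only mirror the shape of the new distinct signature. Under these assumptions, a missing Ordering[P] does not cause a compile error; the parameter simply falls back to null, so any difference in behavior would have to come from how the null ordering is handled downstream.

```scala
// Hypothetical illustration, not Spark code: it mirrors only the shape of
// the signature `(implicit ord: Ordering[T] = null)`.
object ImplicitOrderingSketch {
  case class P(name: String)

  // Reports whether the compiler found an implicit Ordering[T].
  // When no implicit is in scope, the default value (null) is used instead.
  def hasOrdering[T](xs: Seq[T])(implicit ord: Ordering[T] = null): Boolean =
    ord != null

  def main(args: Array[String]): Unit = {
    // String has an implicit Ordering in the standard library.
    println(hasOrdering(Seq("abe", "bob")))
    // A plain case class does not get an implicit Ordering automatically.
    println(hasOrdering(Seq(P("abe"), P("bob"))))
  }
}
```

Supplying an ordering explicitly, for example `implicit val pOrd: Ordering[P] = Ordering.by(_.name)`, would make the second call find one as well.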


was (Author: aaronjosephs):
If you look at the diff of distinct from branch-0.9 to master you see  
-  def distinct(numPartitions: Int): RDD[T] =
+  def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = 
 

Is it possible that case classes don't have an implicit ordering and that is 
why this fails?

> case class cannot be used as key for reduce
> -------------------------------------------
>
>                 Key: SPARK-2620
>                 URL: https://issues.apache.org/jira/browse/SPARK-2620
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>         Environment: reproduced on spark-shell local[4]
>            Reporter: Gerard Maas
>            Priority: Critical
>              Labels: case-class, core
>
> Using a case class as a key doesn't seem to work properly on Spark 1.0.0.
> A minimal example:
> case class P(name:String)
> val ps = Array(P("abe"), P("bob"), P("charly"), P("bob"))
> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
> [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), 
> (P(bob),1), (P(abe),1), (P(charly),1))
> This contrasts with the expected behavior, which should be equivalent to:
> sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect
> Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
> groupByKey and distinct also exhibit the same behavior.
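
For comparison with the report above, the following is a small sketch of the expected key semantics using ordinary Scala collections rather than Spark. It assumes the code is compiled normally, outside the spark-shell; the object name CaseClassKeySketch and the helper countByKey are hypothetical. It only illustrates that a case class provides structural equals and hashCode, which is what a per-key reduction relies on, and it does not reproduce the bug itself.

```scala
// Hypothetical illustration using plain collections; no Spark involved.
object CaseClassKeySketch {
  case class P(name: String)

  // Counts occurrences per key, relying on the key's equals and hashCode.
  def countByKey[K](keys: Seq[K]): Map[K, Int] =
    keys.groupBy(identity).map { case (k, vs) => (k, vs.size) }

  def main(args: Array[String]): Unit = {
    val ps = Seq(P("abe"), P("bob"), P("charly"), P("bob"))
    // Two separately constructed instances with equal fields compare equal.
    println(P("bob") == P("bob"))
    // Equal instances also share a hash code.
    println(P("bob").hashCode == P("bob").hashCode)
    // Both P("bob") entries should be counted under a single key.
    println(countByKey(ps))
  }
}
```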



--
This message was sent by Atlassian JIRA
(v6.2#6252)
