[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341895#comment-14341895 ]
Marko Bonaci commented on SPARK-2620:
-------------------------------------

*Spark 1.2 shell local:*
{code:java}
scala> case class P(name:String)
defined class P

scala> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
ps: Array[P] = Array(P(alice), P(bob), P(charly), P(bob))

scala> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
res8: Array[(P, Int)] = Array((P(alice),1), (P(charly),1), (P(bob),2))
{code}

> case class cannot be used as key for reduce
> -------------------------------------------
>
>                 Key: SPARK-2620
>                 URL: https://issues.apache.org/jira/browse/SPARK-2620
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 1.0.0, 1.1.0
>        Environment: reproduced on spark-shell local[4]
>           Reporter: Gerard Maas
>           Assignee: Tobias Schlatter
>           Priority: Critical
>             Labels: case-class, core
>
> Using a case class as a key doesn't seem to work properly on Spark 1.0.0
> A minimal example:
> case class P(name:String)
> val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))
> sc.parallelize(ps).map(x=> (x,1)).reduceByKey((x,y) => x+y).collect
> [Spark shell local mode] res : Array[(P, Int)] = Array((P(bob),1), (P(bob),1), (P(abe),1), (P(charly),1))
> In contrast to the expected behavior, that should be equivalent to:
> sc.parallelize(ps).map(x=> (x.name,1)).reduceByKey((x,y) => x+y).collect
> Array[(String, Int)] = Array((charly,1), (abe,1), (bob,2))
> groupByKey and distinct also present the same behavior.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
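The corrected Spark 1.2 output above depends on case classes getting compiler-generated structural `equals`/`hashCode`, which is what makes them usable as shuffle keys (the 1.0/1.1 bug was that REPL-defined case classes broke this across closures). A minimal plain-Scala sketch of the same aggregation, with no Spark dependency, reusing the `P` and `ps` names from the report and replacing `reduceByKey` with a local `groupBy`-and-sum:

```scala
// Case classes have structural equality, so distinct P("bob") instances
// compare equal and hash identically -- the property reduceByKey relies on.
case class P(name: String)

val ps = Array(P("alice"), P("bob"), P("charly"), P("bob"))

// Local analogue of sc.parallelize(ps).map(x => (x, 1)).reduceByKey(_ + _):
// group the (key, 1) pairs by key, then sum the counts per key.
val counts: Map[P, Int] =
  ps.map(p => (p, 1))
    .groupBy(_._1)
    .map { case (k, pairs) => (k, pairs.map(_._2).sum) }

println(counts(P("bob")))  // structural equality: a fresh P("bob") finds the key
```

With working equality, `counts` maps `P(bob)` to 2 and the other names to 1, matching the Spark 1.2 result in the comment.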