[ https://issues.apache.org/jira/browse/SPARK-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-4459:
------------------------------
    Assignee: Alok Saldanha

> JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-4459
>                 URL: https://issues.apache.org/jira/browse/SPARK-4459
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 1.0.0, 1.2.0, 1.1.2
>            Reporter: Alok Saldanha
>            Assignee: Alok Saldanha
>             Fix For: 1.1.1, 1.1.2
>
>
> I believe this issue is essentially the same as SPARK-668.
> Original error:
> {code}
> [ERROR] /Users/saldaal1/workspace/JavaSparkSimpleApp/src/main/java/SimpleApp.java:[29,105] no suitable method found for groupBy(org.apache.spark.api.java.function.Function<scala.Tuple2<java.lang.String,java.lang.Long>,java.lang.Long>)
> [ERROR] method org.apache.spark.api.java.JavaPairRDD.<K>groupBy(org.apache.spark.api.java.function.Function<scala.Tuple2<K,java.lang.Long>,K>) is not applicable
> [ERROR] (inferred type does not conform to equality constraint(s)
> {code}
> from core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala
> {code}
>   /**
>    * Return an RDD of grouped elements. Each group consists of a key and a sequence of elements
>    * mapping to that key.
>    */
>   def groupBy[K](f: JFunction[T, K]): JavaPairRDD[K, JIterable[T]] = {
>     implicit val ctagK: ClassTag[K] = fakeClassTag
>     implicit val ctagV: ClassTag[JList[T]] = fakeClassTag
>     JavaPairRDD.fromRDD(groupByResultToJava(rdd.groupBy(f)(fakeClassTag)))
>   }
> {code}
> Then in core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala:
> {code}
> class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
>                        (implicit val kClassTag: ClassTag[K], implicit val vClassTag: ClassTag[V])
>   extends JavaRDDLike[(K, V), JavaPairRDD[K, V]] {
> {code}
> The problem is that the type parameter T in JavaRDDLike is Tuple2[K,V], which
> means the combined signature for groupBy in the JavaPairRDD is
> {code}
> groupBy[K](f: JFunction[Tuple2[K,V], K])
> {code}
> which imposes an unfortunate correlation between the Tuple2 and the return
> type of the grouping function, namely that the return type of the grouping
> function must be the same as the first type of the JavaPairRDD.
> If we compare the method signature to flatMap:
> {code}
>   /**
>    * Return a new RDD by first applying a function to all elements of this
>    * RDD, and then flattening the results.
>    */
>   def flatMap[U](f: FlatMapFunction[T, U]): JavaRDD[U] = {
>     import scala.collection.JavaConverters._
>     def fn = (x: T) => f.call(x).asScala
>     JavaRDD.fromRDD(rdd.flatMap(fn)(fakeClassTag[U]))(fakeClassTag[U])
>   }
> {code}
> we see there should be an easy fix by changing the type parameter of the
> groupBy function from K to U.
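
The name clash described above can be reproduced without Spark. The following is a minimal sketch only: PairBox and GroupBySketch are hypothetical stand-ins invented for illustration (they are not Spark classes), and java.util.Map.Entry plays the role of scala.Tuple2. It assumes the proposed rename, so the method-level type parameter is U and no longer collides with the class-level K.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class GroupBySketch {

    // Hypothetical stand-in for JavaPairRDD<K, V>; illustration only.
    static final class PairBox<K, V> {
        private final List<Map.Entry<K, V>> items;

        PairBox(List<Map.Entry<K, V>> items) {
            this.items = items;
        }

        // The shape javac reports in the error above corresponds to
        //   <K> ... groupBy(Function<Map.Entry<K, V>, K> f)
        // where the method-level K hides the class-level K, so the key type of
        // the pair and the result type of f are forced to be the same type.
        // Naming the method's type parameter U removes that constraint.
        <U> Map<U, List<Map.Entry<K, V>>> groupBy(Function<Map.Entry<K, V>, U> f) {
            Map<U, List<Map.Entry<K, V>>> groups = new LinkedHashMap<>();
            for (Map.Entry<K, V> item : items) {
                groups.computeIfAbsent(f.apply(item), key -> new ArrayList<>()).add(item);
            }
            return groups;
        }
    }

    // Groups (String, Long) pairs by their Long value: the same kind of call
    // that the report shows being rejected.
    static Map<Long, List<Map.Entry<String, Long>>> demo() {
        List<Map.Entry<String, Long>> data = new ArrayList<>();
        data.add(new AbstractMap.SimpleEntry<>("a", 1L));
        data.add(new AbstractMap.SimpleEntry<>("b", 2L));
        data.add(new AbstractMap.SimpleEntry<>("c", 1L));
        PairBox<String, Long> pairs = new PairBox<>(data);
        return pairs.groupBy(entry -> entry.getValue());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

If the method's type parameter were named K instead of U, the call in demo() would need the grouping function to return the same type as the pair's key, which is the equality constraint javac complains about in the original error.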