When my tuple type includes a generic type parameter, the pair RDD
functions aren't available. Take for example the following (a join on two
RDDs, taking the sum of the values):
def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]) :
    RDD[(String, Int)] = {
  rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
}
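import org.apache.spark.SparkContext._ // brings the implicit conversion to PairRDDFunctions into scope
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// The `K: ClassTag` context bound supplies the ClassTag[K] that the implicit
// conversion to PairRDDFunctions (which provides join) requires.
def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]) :
    RDD[(K, Int)] = {
  rddA.join(rddB).map { case (k, (a, b)) => (k, a + b) }
}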
On Tue, Apr 1, 2014 at 4:55 PM, Daniel
Koert's answer is very likely correct. This implicit definition, which
converts an RDD[(K, V)] into PairRDDFunctions, requires that a ClassTag be
available for K:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1124
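To make the mechanism concrete without a Spark dependency, here is a minimal sketch that mirrors the shape of that implicit conversion. MiniRDD, MiniPairOps, MiniContext, toPairOps and Demo are hypothetical stand-ins, not Spark APIs; the only point is that a conversion declared with `[K: ClassTag, V: ClassTag]` is applied only where a ClassTag for the key type can be found, which is exactly what the context bound on `joinTest` provides.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

// Hypothetical stand-in for RDD: a thin wrapper around a Seq.
final case class MiniRDD[T](data: Seq[T]) {
  def map[U](f: T => U): MiniRDD[U] = MiniRDD(data.map(f))
}

// Hypothetical stand-in for PairRDDFunctions: operations that only make
// sense for collections of key/value pairs. The implicit ClassTag parameters
// mimic what the real class asks for; this sketch does not use them.
final class MiniPairOps[K, V](self: MiniRDD[(K, V)])(implicit kt: ClassTag[K], vt: ClassTag[V]) {
  // Naive nested-loop inner join on key, for illustration only.
  def join[W](other: MiniRDD[(K, W)]): MiniRDD[(K, (V, W))] =
    MiniRDD(for {
      (k, v)  <- self.data
      (k2, w) <- other.data
      if k == k2
    } yield (k, (v, w)))
}

// Hypothetical stand-in for `import SparkContext._`.
object MiniContext {
  // This view only applies when ClassTags for K and V can be resolved
  // at the place where it would be used.
  implicit def toPairOps[K: ClassTag, V: ClassTag](rdd: MiniRDD[(K, V)]): MiniPairOps[K, V] =
    new MiniPairOps(rdd)
}

object Demo {
  import MiniContext._

  // The `K: ClassTag` bound puts a ClassTag[K] in scope, so `a.join(b)`
  // can resolve through toPairOps.
  def joinTest[K: ClassTag](a: MiniRDD[(K, Int)], b: MiniRDD[(K, Int)]): MiniRDD[(K, Int)] =
    a.join(b).map { case (k, (x, y)) => (k, x + y) }
}
```

Deleting the `K: ClassTag` bound from `Demo.joinTest` should make `a.join(b)` fail to compile with an error along the lines of "value join is not a member of MiniRDD[(K, Int)]", which is the same kind of symptom as the pair RDD functions going missing.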
To fully understand what's