I'd like to override the logic of comparing keys for equality in groupByKey. Kinda like how combineByKey allows you to pass in the combining logic for "values", I'd like to do the same for keys.
My code looks like this: val res = rdd.groupBy(myPartitioner) Here, rdd is of type RDD[(MyKey, MyValue)], so res turns out to be of type RDD[(MyKey, Seq[MyValue])] MyKey is defined as case class MyKey(field1: Int, field2: Int) and myPartitioner's getPartition(key: Any), here key is of type MyKey and the partitioning logic is an expression on both field1 and field2. I'm guessing the groupBy uses "equals" to compare like instances of MyKey. Currently, the "equals" method of MyKey uses both field1 and field2, as would be natural to its implementation. However, I'd like to have the groupBy only use field1. Any pointers on how I can go about doing it? One way is the following, but I'd like to avoid creating all those MyNewKey objects: val partitionedRdd = rdd.partitionBy(myPartitioner) val mappedRdd = partitionedRdd.mapPartitions(partition => partition.map(case (myKey, myValue) => (new MyNewKey(myKey.field1), myValue)), preservesPartitioning=true) val groupedRdd = mappedRdd.groupByKey() Thanks, Ameet