>
> On Thu, Jun 21, 2018 at 4:51 PM, Chawla,Sumit <[email protected]>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I have been trying to do this simple operation. I want to land all
>>>> values with one key in the same partition, and not have any different key in
>>>> the same partition. Is this possible? I keep getting b and c
>>>> mixed up in the same partition.
>>>>
>>>>
>>>>
I think you could do something approximately like:
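import org.apache.spark.Partitioner
val keys = rdd.map(_.getKey).distinct().zipWithIndex()  // (key, unique Long index)
val numKeys = keys.count().toInt
rdd.map(r => (r.getKey, r)).join(keys)
  .map { case (_, (r, idx)) => (idx, r) }  // re-key each record by its key's index
  .partitionBy(new Partitioner {
    def numPartitions: Int = numKeys
    def getPartition(key: Any): Int = key.asInstanceOf[Long].toInt
  })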
i.e., key by a unique number, count the keys, and repartition by that number
into exactly that many partitions. This presumes, of course, that the number
of keys is < Int.MaxValue.
Also, I haven't tested this code, so don't take it as anything more than an
approximate idea, please :-)
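
To make the idea concrete without a cluster, here is a minimal sketch of the same index-then-partition logic on plain Scala collections. It is not Spark code: `Record`, `OneKeyPerPartition`, `keyIndex` and `partition` are hypothetical names made up for this illustration, and a `Vector` of buckets stands in for the partitions.

```scala
// Hypothetical record type standing in for the rows of the RDD.
final case class Record(key: String, value: Int)

object OneKeyPerPartition {
  // Give every distinct key a dense index 0..n-1, in order of first appearance.
  // This plays the role of distinct + zipWithIndex in the snippet above.
  def keyIndex(records: Seq[Record]): Map[String, Int] =
    records.map(_.key).distinct.zipWithIndex.toMap

  // Put each record into the bucket numbered by its key's index, so that
  // bucket i holds all records of exactly one key and nothing else.
  def partition(records: Seq[Record]): Vector[Seq[Record]] = {
    val index = keyIndex(records)
    val grouped = records.groupBy(r => index(r.key))
    Vector.tabulate(index.size)(i => grouped(i))
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Record("a", 1), Record("b", 2), Record("c", 3),
      Record("b", 4), Record("a", 5)
    )
    // Print each bucket together with the keys it contains.
    partition(data).zipWithIndex.foreach { case (bucket, i) =>
      println(s"partition $i: keys=${bucket.map(_.key).distinct.mkString(",")} size=${bucket.size}")
    }
  }
}
```

The number of buckets equals the number of distinct keys, which is the property the custom Partitioner is meant to give you in Spark: there, the index would be the Long from zipWithIndex and the bucket number would be what getPartition returns.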
-Nathan Kronenfeld