Hi,

I am using a custom partitioner to partition my JavaPairRDD, where the key is
a String.

I use the hashCode of a substring of the key to derive the partition index,
but I have noticed that my partitions contain keys for which the partitioner
returns a different partition index.

Another issue I am facing is that when I sort the RDD after partitioning,
each partition contains only keys which are equal.

My Partitioner is as below:

import org.apache.spark.Partitioner;

public class BlockPartitioner extends Partitioner {

    private int numPartitions = 8;

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    public int getPartition(Object key) {
        // The key is a String; partition on its first 7 characters.
        String dept = ((String) key).substring(0, 7);
        int partitionId = dept.hashCode();
        return partitionId % numPartitions;
    }
}
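
One thing that may be worth checking in the partitioner above: String.hashCode() can be negative, and Java's % operator keeps the sign of the dividend, so partitionId % numPartitions can fall outside the range 0 to numPartitions - 1. Below is a minimal sketch of a non-negative index using Math.floorMod. It is plain Java with no Spark dependency; the class name PartitionIndexSketch, the method partitionFor, and the sample keys are hypothetical, and the handling of keys shorter than 7 characters is an assumption.

```java
public class PartitionIndexSketch {

    // Hypothetical helper: maps a key to an index in [0, numPartitions).
    public static int partitionFor(String key, int numPartitions) {
        // Assumption: keys shorter than 7 characters are partitioned on the
        // whole key instead of throwing StringIndexOutOfBoundsException.
        String dept = key.substring(0, Math.min(7, key.length()));
        // Math.floorMod is non-negative for a positive modulus, unlike %,
        // which keeps the sign of the hash code.
        return Math.floorMod(dept.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(-3 % 8);                // prints -3
        System.out.println(Math.floorMod(-3, 8));  // prints 5
        // Keys sharing the same 7-character prefix map to the same index.
        System.out.println(
            partitionFor("DEPT001-alice", 8) == partitionFor("DEPT001-bob", 8)); // prints true
    }
}
```

The same expression could replace the return statement in getPartition. Whether this accounts for the mismatch I am seeing is not confirmed.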

I am using "foreachPartition" on the JavaPairRDD to verify my partitions.

Thanks
Ankur
