[ https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Bhat updated PIG-282:
---------------------------

    Release Note:
This feature allows you to specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, JOIN (except 'skewed' join). The Partitioner controls the partitioning of the keys of the intermediate map outputs. See http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/Partitioner.html for more details.

To use this feature, add a PARTITION BY clause to the appropriate operator:

    A = load 'input_data';
    B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
    .....

Here is the code for SimpleCustomPartitioner:

    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.pig.impl.io.PigNullableWritable;

    public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
        @Override
        public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
            // Integer keys are partitioned by value, all other keys by hash code.
            // The sign bit is cleared so the result is never negative.
            if (key.getValueAsPigType() instanceof Integer) {
                int intKey = ((Integer) key.getValueAsPigType()).intValue();
                return (intKey & Integer.MAX_VALUE) % numPartitions;
            } else {
                return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }
    }

  was:
This feature allows you to specify a Hadoop Partitioner for the following operations: GROUP/COGROUP, CROSS, DISTINCT, JOIN (except 'skewed' join). The Partitioner controls the partitioning of the keys of the intermediate map outputs. See http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Partitioner.html for more details.

To use this feature, add a PARTITION BY clause to the appropriate operator:

    A = load 'input_data';
    B = group A by $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner parallel 2;
    .....

Here is the code for SimpleCustomPartitioner:

    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.pig.impl.io.PigNullableWritable;

    public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
        @Override
        public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
            // Integer keys are partitioned by value, all other keys by hash code.
            // The sign bit is cleared so the result is never negative.
            if (key.getValueAsPigType() instanceof Integer) {
                int intKey = ((Integer) key.getValueAsPigType()).intValue();
                return (intKey & Integer.MAX_VALUE) % numPartitions;
            } else {
                return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
            }
        }
    }

> Custom Partitioner
> ------------------
>
>                 Key: PIG-282
>                 URL: https://issues.apache.org/jira/browse/PIG-282
>             Project: Pig
>          Issue Type: New Feature
>   Affects Versions: 0.7.0
>            Reporter: Amir Youssefi
>            Assignee: Aniket Mokashi
>            Priority: Minor
>             Fix For: 0.8.0
>
>         Attachments: CustomPartitioner.patch, CustomPartitionerFinale.patch,
>                      CustomPartitionerTest.patch
>
>
> By adding a custom partitioner, we can give users control over which output
> partition a key (/value) goes to. We can add keywords to the language, e.g.
> PARTITION BY UDF(...)
> or a similar syntax. The UDF returns a number between 0 and n-1, where n is
> the number of output partitions.
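
The requirement that a partitioner return a number between 0 and n-1 can be illustrated without any Pig or Hadoop dependency. The following is only a minimal sketch under stated assumptions: PartitionSketch and partitionFor are hypothetical names that are not part of Pig, and masking with Integer.MAX_VALUE is an added safeguard against Java's % operator returning a negative remainder, not something the issue itself specifies.

```java
// Hypothetical, dependency-free sketch: no Pig or Hadoop classes are used.
final class PartitionSketch {

    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Integer keys are partitioned by value, everything else by hashCode().
        int raw = (key instanceof Integer) ? (Integer) key : key.hashCode();
        // Clearing the sign bit keeps the remainder non-negative (an added
        // safeguard; plain % can return a negative result in Java).
        return (raw & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 4;
        Object[] keys = {7, -7, "abc", 42L};
        for (Object key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key, n));
        }
    }
}
```

In this sketch, equal keys always land in the same partition, which is the property GROUP and JOIN rely on when a custom partitioner is plugged in.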