[ 
https://issues.apache.org/jira/browse/PIG-282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872363#action_12872363
 ] 

Aniket Mokashi commented on PIG-282:
------------------------------------

1. PARTITION BY should take a mapreduce.Partitioner class rather than a UDF. 
The clause is followed by PARALLEL n.
2. Applicable to:
   GROUP
   COGROUP
   CROSS
   DISTINCT
   JOIN (except 'skewed' joins, which use SkewedPartitioner)
3. PARTITION BY on ORDER is not supported.
4. The custom partitioner's type parameters 
(<PigNullableWritable, Writable>) are not validated.
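
The contract behind point 1 can be sketched as follows. Hadoop's org.apache.hadoop.mapreduce.Partitioner declares getPartition(key, value, numPartitions), which must return an index in [0, numPartitions). To keep the sketch runnable without Hadoop or Pig on the classpath, PartitionerSketch below is a local stand-in for that class, FirstCharPartitioner is a hypothetical example, and plain Java types replace PigNullableWritable and Writable. The Pig Latin line in the comment is an assumed form of the proposed syntax, not taken from the patch.

```java
// Local stand-in for org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>
// (an assumption made so the sketch compiles without Hadoop on the classpath).
abstract class PartitionerSketch<K, V> {
    // Must return a partition index in the range [0, numPartitions).
    abstract int getPartition(K key, V value, int numPartitions);
}

// Hypothetical partitioner: routes records by the first character of the key.
// Assumed Pig Latin usage, following point 1 above:
//   B = GROUP A BY name PARTITION BY org.example.FirstCharPartitioner PARALLEL 4;
class FirstCharPartitioner extends PartitionerSketch<String, Object> {
    @Override
    int getPartition(String key, Object value, int numPartitions) {
        if (key == null || key.isEmpty()) {
            return 0; // send null or empty keys to the first partition
        }
        // A char is never negative, so the modulo result is in [0, numPartitions).
        return key.charAt(0) % numPartitions;
    }
}

class FirstCharPartitionerDemo {
    public static void main(String[] args) {
        FirstCharPartitioner p = new FirstCharPartitioner();
        for (String key : new String[] {"apple", "banana", "cherry", ""}) {
            System.out.println(key + " -> " + p.getPartition(key, null, 4));
        }
    }
}
```

Here n in PARALLEL n corresponds to numPartitions, so the partitioner has to cope with whatever reducer count the script requests.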

Approach:
1. Added support for ClassType parsing and validation. Parsing for "partition 
by" is added separately to each of the clauses listed above.
2. The custom partitioner is stored as a String in the LO, PO and MR plans. 
LogicalOperator holds the partitioner in the LO plan. We add the partitioner 
to POGlobalRearrangement because it decides the map-reduce boundary. We read 
and set the partitioner when we visit the POGlobalRearrangement.
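
Step 2, carrying the partitioner through the plans as a class-name String and resolving it only where the job is configured, might look roughly like the sketch below. All names here (BasePartitionerSketch, ModuloPartitioner, PartitionerResolver) are hypothetical and not from the patch; a local base class stands in for org.apache.hadoop.mapreduce.Partitioner so the code runs without Hadoop.

```java
// Local stand-in for Hadoop's Partitioner base class (assumption).
abstract class BasePartitionerSketch {
    abstract int getPartition(Object key, Object value, int numPartitions);
}

// Hypothetical user-supplied partitioner.
class ModuloPartitioner extends BasePartitionerSketch {
    @Override
    int getPartition(Object key, Object value, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

class PartitionerResolver {
    // Resolves a class name held as a String and checks that it names a partitioner.
    static Class<? extends BasePartitionerSketch> resolve(String className) {
        try {
            return Class.forName(className).asSubclass(BasePartitionerSketch.class);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown partitioner class: " + className, e);
        } catch (ClassCastException e) {
            throw new IllegalArgumentException(className + " is not a partitioner", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // In Pig the name would come from the script text; here it is taken from the class.
        String stored = ModuloPartitioner.class.getName();
        BasePartitionerSketch p = resolve(stored).getDeclaredConstructor().newInstance();
        System.out.println(stored + " -> partition " + p.getPartition("key", null, 3));
    }
}
```

Keeping only the name in the plan defers class loading until the name is resolved, and the asSubclass check rejects a class that is not a partitioner at that point.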

Attaching a patch with initial changes...

> Custom Partitioner
> ------------------
>
>                 Key: PIG-282
>                 URL: https://issues.apache.org/jira/browse/PIG-282
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Amir Youssefi
>            Assignee: Aniket Mokashi
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> By adding a custom partitioner we can give users control over which output 
> partition a key (/value) goes to. We can add keywords to the language, e.g. 
> PARTITION BY UDF(...)
> or a similar syntax. The UDF returns a number between 0 and n-1, where n is 
> the number of output partitions.

