[ 
https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855204#action_12855204
 ] 

Doug Cutting commented on AVRO-513:
-----------------------------------

To implement this, we can specify a Hadoop grouping comparator that always 
reports its arguments as equal (returns 0), so Hadoop's reduce method is called 
once with a sorted iterator over all items in the partition.  Then we can wrap 
this in an iterator that keeps a copy of the previous item and whose #hasNext() 
implementation returns false when the previous and current items differ 
according to Avro's comparator, and pass that down to Avro's reduce method.  To 
keep a copy of the previous item we will need to implement a 
#copy(Object, Schema) method for GenericData and SpecificData.
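
A minimal sketch of the wrapping iterator described above, using only 
java.util so it stands alone.  The names GroupingIterator and nextGroup() are 
hypothetical, and the UnaryOperator copier is a stand-in for the proposed 
#copy(Object, Schema) method, which does not exist yet.  This is one possible 
shape under those assumptions, not the actual patch.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

/** Hypothetical wrapper that exposes one run of equal items at a time. */
class GroupingIterator<T> implements Iterator<T> {
  private final Iterator<T> source;       // sorted iterator over the partition
  private final Comparator<T> comparator; // stand-in for Avro's comparator
  private final UnaryOperator<T> copier;  // stand-in for #copy(Object, Schema)

  private T previous;           // copy of the last item returned in this group
  private T lookahead;          // item read from source but not yet returned
  private boolean hasLookahead; // true when lookahead holds an unreturned item
  private boolean groupStarted; // true once next() has returned an item

  GroupingIterator(Iterator<T> source, Comparator<T> comparator,
                   UnaryOperator<T> copier) {
    this.source = source;
    this.comparator = comparator;
    this.copier = copier;
  }

  /** Returns false at the end of the input or when the next item differs. */
  @Override
  public boolean hasNext() {
    if (!fill()) {
      return false;
    }
    return !groupStarted || comparator.compare(previous, lookahead) == 0;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    T item = lookahead;
    // Copy before reading further, since the source may reuse the instance.
    previous = copier.apply(item);
    groupStarted = true;
    hasLookahead = false;
    lookahead = null;
    return item;
  }

  /**
   * Skips whatever is left of the current group and positions the iterator at
   * the start of the next one.  Returns false when the source is exhausted.
   */
  boolean nextGroup() {
    if (groupStarted) {
      while (hasNext()) {
        next();
      }
    }
    groupStarted = false;
    return fill();
  }

  /** Reads one item ahead if none is buffered; false if the source is empty. */
  private boolean fill() {
    if (!hasLookahead && source.hasNext()) {
      lookahead = source.next();
      hasLookahead = true;
    }
    return hasLookahead;
  }
}
```

A caller would loop on nextGroup() and hand the iterator to the user's reduce 
method once per group.  Because reduce may return before draining the 
iterator, nextGroup() discards any unread items of the current group first.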


> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The Java mapreduce API added in AVRO-493 requires reducer implementations to 
> explicitly detect sequences of matching data.
> Rather, the reduce method might better look something like:
>    void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.
