[ https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855204#action_12855204 ]
Doug Cutting commented on AVRO-513:
-----------------------------------

To implement this, we can specify a Hadoop grouping comparator that always returns 0 (that is, treats all keys as equal), so that Hadoop's reduce method is called with a sorted iterator over all items in the partition. We can then wrap this in an iterator that keeps a copy of the previous item and whose #hasNext() implementation returns false when the previous and current items differ according to Avro's comparator, and pass that down to Avro's reduce method. To keep a copy of the previous item we will need to implement a #copy(Object, Schema) method for GenericData and SpecificData.

> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>
> The Java mapreduce API added in AVRO-493 requires reducer implementations to
> explicitly detect sequences of matching data.
> Rather, the reduce method might better look something like:
>   void reduce(Iterator<IN>, Collector<OUT>);
> where all equal values are passed in a single call.
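
The wrapping iterator described in the comment might look roughly like the sketch below. It is plain Java with no Hadoop or Avro dependency, and it is not the eventual Avro implementation. The names GroupingIterator, hasMoreGroups(), startGroup() and the copier parameter are hypothetical: the copier stands in for the proposed #copy(Object, Schema) method, and the Comparator stands in for Avro's comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

/** Sketch: presents one sorted iterator as successive runs of equal items. */
class GroupingIterator<T> implements Iterator<T> {
  private final Iterator<T> sorted;        // every item in the partition, in sort order
  private final Comparator<T> comparator;  // assumption: stands in for Avro's comparator
  private final UnaryOperator<T> copier;   // assumption: stands in for copy(Object, Schema)
  private T previous;          // copy of the last item handed out in the current group
  private T current;           // item read ahead from the underlying iterator
  private boolean hasCurrent;  // whether 'current' holds an unreturned item
  private boolean inGroup;     // false until the first item of a group is returned

  GroupingIterator(Iterator<T> sorted, Comparator<T> comparator, UnaryOperator<T> copier) {
    this.sorted = sorted;
    this.comparator = comparator;
    this.copier = copier;
    advance();
  }

  // Read one item ahead so hasNext() can compare it with the previous one.
  private void advance() {
    hasCurrent = sorted.hasNext();
    current = hasCurrent ? sorted.next() : null;
  }

  /** True while the underlying iterator still holds unreturned items. */
  boolean hasMoreGroups() {
    return hasCurrent;
  }

  /** Skips whatever is left of the current group, then arms the next one. */
  void startGroup() {
    while (inGroup && hasNext()) {
      next();
    }
    inGroup = false;
  }

  /** Returns false at the end of input or when the next item starts a new group. */
  @Override
  public boolean hasNext() {
    if (!hasCurrent) {
      return false;
    }
    return !inGroup || comparator.compare(previous, current) == 0;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Copy before advancing: the underlying iterator may reuse the same object.
    previous = copier.apply(current);
    inGroup = true;
    advance();
    return previous;
  }

  public static void main(String[] args) {
    List<String> data = Arrays.asList("a", "a", "b", "c", "c", "c");
    // Strings are immutable, so an identity copier is sufficient here.
    GroupingIterator<String> it = new GroupingIterator<String>(
        data.iterator(), Comparator.<String>naturalOrder(), s -> s);
    while (it.hasMoreGroups()) {
      it.startGroup();
      StringBuilder group = new StringBuilder();
      while (it.hasNext()) {
        group.append(it.next()).append(' ');
      }
      System.out.println(group.toString().trim());
    }
  }
}
```

In this sketch the caller loops on hasMoreGroups(), calls startGroup(), and hands the iterator to one reduce call per group. The copy is taken before reading ahead because the underlying iterator may reuse a single object instance for every item, which is why the comment asks for a copy method.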