[ 
https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878688#action_12878688
 ] 

Tom White commented on AVRO-513:
--------------------------------

Generally looks good.

* Does the change to AvroKeySerialization mean that a copy is made per reduce 
value? This could be improved to be a copy per reduce group, although it's more 
work. 
* If a subclass wants to override close() then they have to call super.close() 
or the reducer may not complete. This could be the source of very subtle bugs, 
so it might be better to mark close() as final, and have another close method 
for subclasses, closeReducer() perhaps? However, this gets in the way of users 
who may want to use the non-grouping reduce() method. I don't see a good answer 
here.
* If I call next() without calling hasNext(), I don't get a 
NoSuchElementException, which doesn't conform to the Iterator contract. The 
next() method should check to see if there is a next and throw 
NoSuchElementException if not.
* Rather than polling the queue, you could use the blocking take() method and 
interrupt the thread from close() to signal that there are no more values.
* Starting a thread from within a subclass constructor is unsafe (because the 
object may not be fully constructed, see 
http://www.ibm.com/developerworks/java/library/j-jtp0618.html#4). Better to 
call runner.start() just after constructing the runner.

> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
>                 Key: AVRO-513
>                 URL: https://issues.apache.org/jira/browse/AVRO-513
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.4.0
>
>         Attachments: AVRO-513.patch
>
>
> The Java mapreduce API added in AVRO-493 requires reducers implementations to 
> explicitly detect sequences of matching data.
> Rather the reduce method might better look something like:
>    void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to