[
https://issues.apache.org/jira/browse/AVRO-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878767#action_12878767
]
Tom White commented on AVRO-513:
--------------------------------
> I suppose once a value's been consumed from the queue it could be returned to
> a pool used by the deserializer. We could limit the size of the pool to be
> the same size as the queue. Is that what you had in mind?
I was thinking that you only need to copy at the beginning of the group, since
you can compare subsequent values to the copy, until they differ, at which
point you make a new copy.
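A minimal sketch of that idea, assuming the sorted values arrive through an
iterator that may reuse a single instance, and that equals() stands in for
whatever comparison the real code would use. GroupCopySketch, groupSizes and
the copier argument are hypothetical names, not part of the patch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

final class GroupCopySketch {
  /**
   * Walks sorted values and returns the size of each run of equal values.
   * A copy is made only for the first value of each group; copiesMade[0]
   * records how many copies that took.
   */
  static <T> List<Integer> groupSizes(
      Iterator<T> sorted, UnaryOperator<T> copier, int[] copiesMade) {
    List<Integer> sizes = new ArrayList<>();
    T groupHead = null; // private copy of the first value in the current group
    int size = 0;
    while (sorted.hasNext()) {
      T value = sorted.next(); // may be a reused instance, so do not retain it
      if (size == 0 || !groupHead.equals(value)) {
        if (size > 0) {
          sizes.add(size); // the previous group has ended
        }
        groupHead = copier.apply(value); // copy once, at the start of the group
        copiesMade[0]++;
        size = 0;
      }
      size++;
    }
    if (size > 0) {
      sizes.add(size);
    }
    return sizes;
  }
}
```

With this shape the number of copies equals the number of groups rather than
the number of values.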
> I worry a bit that something else could interrupt the thread or intercept the
> InterruptedException, e.g., in the user's reducer. Is that a well-founded
> worry?
The interrupt could instead serve as a signal to check a shared variable
indicating that the reduce is done, rather than the runner treating any
interrupt as a done signal. That would take care of the problem of something
interrupting the thread accidentally.
Not sure about an interrupt signal being intercepted, or not being delivered.
But I think it's possible that the interrupt occurs between the check on "done"
and the call to take(), so the call to take() would go ahead and cause a
deadlock.
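One way to sidestep that window, sketched here as an assumption rather than as
what the patch does, is to replace the unbounded take() with a timed poll() so
the "done" flag is rechecked on every pass, and to treat the interrupt purely
as a wake-up. DoneFlagSketch and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class DoneFlagSketch {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private volatile boolean done = false; // shared "reduce is done" flag

  void offer(String value) {
    queue.add(value);
  }

  /** Sets the flag first, then interrupts the consumer only to wake it early. */
  void finish(Thread consumer) {
    done = true;
    consumer.interrupt();
  }

  /** Collects values until the done flag is set and the queue is empty. */
  List<String> drainUntilDone() {
    List<String> seen = new ArrayList<>();
    while (true) {
      String value = null;
      try {
        // A timed poll never blocks forever, so a missed interrupt only
        // delays the done check by at most one timeout.
        value = queue.poll(50, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        // An interrupt alone means nothing here; fall through to the flag.
      }
      if (value != null) {
        seen.add(value);
      } else if (done) {
        queue.drainTo(seen); // pick up anything added just before finish()
        return seen;
      }
    }
  }
}
```

An interrupt that arrives while "done" is still false is simply ignored in this
sketch, which is the behaviour wanted for accidental interrupts.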
> A better approach might be to put in a sentinel value. Unfortunately this
> has to be of type T, and we don't know how to construct a T.
Could we use AvroWrapper here? Perhaps a null wrapper means end of reduce.
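A rough sketch of how that could look. Since a BlockingQueue rejects null
elements, this reads "null wrapper" as a wrapper whose datum is null, which is
one interpretation only. Wrapper below is a hypothetical stand-in for
AvroWrapper, and SentinelSketch and consume are made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

final class SentinelSketch {
  /** Hypothetical stand-in for AvroWrapper<T>: a holder whose datum may be null. */
  static final class Wrapper<T> {
    private final T datum;

    Wrapper(T datum) {
      this.datum = datum;
    }

    T datum() {
      return datum;
    }
  }

  /** Takes wrappers until one with a null datum arrives, then stops. */
  static <T> List<T> consume(BlockingQueue<Wrapper<T>> queue) {
    List<T> values = new ArrayList<>();
    try {
      while (true) {
        Wrapper<T> wrapper = queue.take();
        if (wrapper.datum() == null) {
          return values; // sentinel: end of reduce
        }
        values.add(wrapper.datum());
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      return values;
    }
  }
}
```

The sentinel needs no instance of T, so nothing has to know how to construct
one, and end-of-input no longer depends on interrupts at all.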
> java mapreduce api should pass iterator of matching objects to reduce
> ---------------------------------------------------------------------
>
> Key: AVRO-513
> URL: https://issues.apache.org/jira/browse/AVRO-513
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.4.0
>
> Attachments: AVRO-513.patch, AVRO-513.patch
>
>
> The Java mapreduce API added in AVRO-493 requires reducer implementations to
> explicitly detect sequences of matching data.
> Rather, the reduce method might better look something like:
> void reduce(Iterator<IN>, Collector<OUT>);
> Where all equal values are passed in a single call.