[
https://issues.apache.org/jira/browse/HADOOP-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sameer Paranjpye updated HADOOP-485:
------------------------------------
Comment: was deleted
> allow a different comparator for grouping keys in calls to reduce
> -----------------------------------------------------------------
>
> Key: HADOOP-485
> URL: https://issues.apache.org/jira/browse/HADOOP-485
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.5.0
> Reporter: Owen O'Malley
> Assigned To: Owen O'Malley
>
> Some algorithms require that the values passed to reduce be sorted in a
> particular order, but extending the key with the additional sort fields
> causes the values to be handled by different calls to reduce. (The user then
> has to collect the values until a "real" key change is detected and only
> then process them.)
> It would be much easier if the framework let you define a second comparator
> that groups the values for each call to reduce. So if your reduce inputs
> look like:
> A1, V1
> A2, V2
> A3, V3
> B1, V4
> B2, V5
> instead of getting calls to reduce that look like:
> reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4});
> reduce(B2, {V5});
> you could define the grouping comparator to just compare the letters and end
> up with:
> reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
> which is the desired outcome. Note that this assumes that the "extra" part of
> the key is used only for sorting, because the reduce will see only the key of
> the first representative of each equivalence class.
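
The grouping behaviour described above can be illustrated with a small
standalone sketch. It does not use any Hadoop classes; GroupingSketch,
simulateReduceCalls, and sampleRecords are hypothetical names invented for
this illustration, and the keys are plain strings such as "A1" whose first
character is assumed to be the "real" key. Records are assumed to arrive
already sorted by the full key; a new simulated reduce call starts whenever
the grouping comparator reports a change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Standalone illustration only; none of these names come from Hadoop.
class GroupingSketch {

    // Walks records that are already sorted by the full key and starts a new
    // simulated reduce call whenever the grouping comparator sees a change.
    // Each call is labelled with the first key of its group.
    static List<String> simulateReduceCalls(List<String[]> sortedRecords,
                                            Comparator<String> grouping) {
        List<String> calls = new ArrayList<>();
        String groupKey = null;
        List<String> values = new ArrayList<>();
        for (String[] record : sortedRecords) {
            String key = record[0];
            if (groupKey != null && grouping.compare(groupKey, key) != 0) {
                calls.add(format(groupKey, values));
                values = new ArrayList<>();
                groupKey = null;
            }
            if (groupKey == null) {
                groupKey = key; // first representative of the group
            }
            values.add(record[1]);
        }
        if (groupKey != null) {
            calls.add(format(groupKey, values));
        }
        return calls;
    }

    static String format(String key, List<String> values) {
        return "reduce(" + key + ", {" + String.join(",", values) + "})";
    }

    // The five key/value pairs from the issue description, in sorted order.
    static List<String[]> sampleRecords() {
        return Arrays.asList(
            new String[] {"A1", "V1"},
            new String[] {"A2", "V2"},
            new String[] {"A3", "V3"},
            new String[] {"B1", "V4"},
            new String[] {"B2", "V5"});
    }

    public static void main(String[] args) {
        // Grouping on the full key: one call per record.
        Comparator<String> fullKey = Comparator.naturalOrder();
        // Grouping on the letter only: one call per letter.
        Comparator<String> letterOnly = Comparator.comparing((String k) -> k.charAt(0));

        System.out.println(simulateReduceCalls(sampleRecords(), fullKey));
        System.out.println(simulateReduceCalls(sampleRecords(), letterOnly));
    }
}
```

With the letter-only comparator the sketch produces the two grouped calls
from the description, each labelled with the first key of its group, which
is why the extra part of the key is only useful for ordering.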
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.