allow a different comparator for grouping keys in calls to reduce
-----------------------------------------------------------------
Key: HADOOP-485
URL: http://issues.apache.org/jira/browse/HADOOP-485
Project: Hadoop
Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Owen O'Malley
Assigned To: Owen O'Malley
Some algorithms require that the values passed to reduce be sorted in a particular
order, but extending the key with the additional fields causes them to be
handled by different calls to reduce. (The user then has to collect the values
until they detect a "real" key change, and only then process them.)
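To make the current workaround concrete, here is a minimal Java sketch of a reducer that rebuilds the groups itself. It is a simplified stand-in, not Hadoop's actual Reducer interface. The class name `BufferingReducer`, the `String` keys and values, and the assumption that the "real" key is the first character of the composite key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a reducer under the current behavior: the framework
// calls reduce() once per composite key (A1, A2, ...), so the reducer itself
// must buffer values until the "real" key changes.
class BufferingReducer {
    private String currentGroup = null; // real key of the buffered group
    private String firstKey = null;     // first composite key of that group
    private final List<String> buffer = new ArrayList<>();
    final List<String> processed = new ArrayList<>(); // results, for illustration

    // Assumption: the "real" key is the first character of the composite key.
    private static String realKey(String compositeKey) {
        return compositeKey.substring(0, 1);
    }

    void reduce(String key, Iterator<String> values) {
        String group = realKey(key);
        if (currentGroup != null && !currentGroup.equals(group)) {
            flush(); // real key changed: process the previous group
        }
        if (currentGroup == null) {
            currentGroup = group;
            firstKey = key;
        }
        while (values.hasNext()) {
            buffer.add(values.next());
        }
    }

    // The last group never sees a key change, so it must be flushed at the end.
    void close() {
        if (currentGroup != null) {
            flush();
        }
    }

    private void flush() {
        processed.add(firstKey + "=" + buffer); // placeholder for real processing
        buffer.clear();
        currentGroup = null;
        firstKey = null;
    }
}
```

The state kept across reduce calls, plus the extra flush in `close()`, is the bookkeeping that a framework-level grouping comparator would make unnecessary.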
It would be much easier if the framework let you define a second comparator
that groups the values for reduce. For example, if your reduce inputs look like:
A1, V1
A2, V2
A3, V3
B1, V4
B2, V5
instead of getting calls to reduce that look like:
reduce(A1, {V1}); reduce(A2, {V2}); reduce(A3, {V3}); reduce(B1, {V4});
reduce(B2, {V5});
you could define the grouping comparator to compare only the letters and end
up with:
reduce(A1, {V1,V2,V3}); reduce(B1, {V4,V5});
which is the desired outcome. Note that this assumes the "extra" part of the
key is used only for sorting, because reduce will see only the first
representative of each equivalence class (A1 and B1 above).
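The proposed behavior can be simulated outside Hadoop with a short Java sketch: sort all records by the full key, then walk the sorted list and start a new reduce call only when the grouping comparator reports a difference from the first key of the current group. The names `GroupingSketch`, `Pair`, `runReduce`, and `demo` are hypothetical, and the sketch ignores partitioning, serialization, and raw byte comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiConsumer;

class GroupingSketch {
    // Hypothetical (key, value) record emitted by the map phase.
    record Pair(String key, String value) {}

    // Sorts records by the full key, then groups adjacent records that the
    // grouping comparator considers equal, calling reducer once per group with
    // the group's first key and all of its values.
    static void runReduce(List<Pair> input,
                          Comparator<String> sortCmp,
                          Comparator<String> groupCmp,
                          BiConsumer<String, List<String>> reducer) {
        List<Pair> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing(Pair::key, sortCmp)); // stable sort
        int i = 0;
        while (i < sorted.size()) {
            String firstKey = sorted.get(i).key();
            List<String> values = new ArrayList<>();
            int j = i;
            // Extend the group while the grouping comparator sees the same key.
            while (j < sorted.size()
                    && groupCmp.compare(firstKey, sorted.get(j).key()) == 0) {
                values.add(sorted.get(j).value());
                j++;
            }
            reducer.accept(firstKey, values); // only the first key is visible
            i = j;
        }
    }

    // Runs the example from this issue with map output in arbitrary order.
    static List<String> demo() {
        List<Pair> input = List.of(
                new Pair("B2", "V5"), new Pair("A1", "V1"), new Pair("A3", "V3"),
                new Pair("B1", "V4"), new Pair("A2", "V2"));
        Comparator<String> fullKey = Comparator.naturalOrder();
        // Grouping comparator: compare only the letter part of the key.
        Comparator<String> letterOnly = Comparator.comparing((String k) -> k.substring(0, 1));
        List<String> calls = new ArrayList<>();
        runReduce(input, fullKey, letterOnly,
                (key, values) -> calls.add("reduce(" + key + ", " + values + ")"));
        return calls;
    }
}
```

This only works if the grouping comparator is coarser than the sort comparator, so that keys it treats as equal end up adjacent after sorting. Because the sort runs on the full key, values arrive in composite-key order within each group, while the reducer receives only the first key of each group.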
--
This message is automatically generated by JIRA.