Hi Vivek,
Can you include a unit test for this fix?
On Jun 28, 2007, at 2:40 AM, Vivek Ratan (JIRA) wrote:
[ https://issues.apache.org/jira/browse/HADOOP-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Ratan updated HADOOP-1535:
--------------------------------
Attachment: 1535_01.patch
We use the comparator returned by JobConf.getOutputKeyComparator()
for the sort/merge phases of Map and Reduce. We use the comparator
returned by JobConf.getOutputValueGroupingComparator() for the
iterator across values for a given key. See 1535_01.patch.
Wrong comparator used to merge files in Reduce phase
----------------------------------------------------
Key: HADOOP-1535
URL: https://issues.apache.org/jira/browse/HADOOP-1535
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.3, 0.13.0
Reporter: Vivek Ratan
Assignee: Vivek Ratan
Fix For: 0.14.0
Attachments: 1535_01.patch
As per the fix for HADOOP-485, we allow users to optionally provide
a different comparator to group values when calling the user's
Reduce function. Devaraj and I were looking at the code yesterday
and found that ReduceTask.java uses this user-supplied grouping
comparator to merge the output files from the Map tasks (it is
passed in when creating a new SequenceFile.Sorter object). This is
incorrect: the comparator used to merge Map output files should be
the same one that was used to create those files in the Map phase.
The user-supplied comparator for grouping values should be used
only in the iterator passed to the user's Reduce function (which
the code already does correctly).
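
To illustrate why the two comparators cannot be swapped, here is a minimal standalone sketch in plain Java. It does not use any Hadoop classes and is not the code from 1535_01.patch; the class name ComparatorSketch, the "primary#secondary" key format, and the helper methods are all hypothetical, chosen only for illustration. It merges two sorted runs once with the sort comparator and once with a coarser grouping comparator, then groups the correctly merged keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hadoop.
final class ComparatorSketch {
    // Assumed key layout: "primary#secondary".
    static String primary(String key) {
        return key.substring(0, key.indexOf('#'));
    }

    // Sort comparator: orders by the full key. The runs below were sorted with it.
    static final Comparator<String> SORT = Comparator.naturalOrder();

    // Grouping comparator: looks at the primary part only, so it is coarser.
    static final Comparator<String> GROUP = Comparator.comparing(ComparatorSketch::primary);

    // Standard two-way merge of runs that are assumed to be sorted under cmp.
    static List<String> merge(List<String> a, List<String> b, Comparator<String> cmp) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(cmp.compare(a.get(i), b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Splits an already sorted list into runs of keys that compare equal under grouping.
    static List<List<String>> group(List<String> sorted, Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        for (String key : sorted) {
            if (groups.isEmpty()
                    || grouping.compare(groups.get(groups.size() - 1).get(0), key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> run1 = Arrays.asList("a#2", "b#1"); // sorted by SORT
        List<String> run2 = Arrays.asList("a#1", "a#3"); // sorted by SORT

        List<String> good = merge(run1, run2, SORT);
        List<String> bad = merge(run1, run2, GROUP);

        System.out.println("merged with sort comparator:     " + good);
        System.out.println("merged with grouping comparator: " + bad);
        System.out.println("groups passed to reduce:         " + group(good, GROUP));
    }
}
```

In this sketch, merging with SORT yields [a#1, a#2, a#3, b#1], while merging with GROUP yields [a#2, a#1, a#3, b#1], which is no longer ordered by the full key. The grouping comparator is only safe in the last step, where it decides which consecutive keys belong to the same Reduce call.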