[ 
https://issues.apache.org/jira/browse/HADOOP-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Bowen updated HADOOP-1146:
--------------------------------

    Attachment: 1146.patch


This patch:

   1. Renames the counter Reduce Input Records to Reduce Input Groups, since 
that is what it counts.

   2. Adds a new counter called Reduce Input Records that does count the 
records.

   3. Adds two more counters, Combine Input Records and Combine Output 
Records.  When testing on WordCount, I noticed that Map Output Records and 
Reduce Input Records were not the same because of the use of a Combiner, and 
these two counters show that difference.

I'm not sure if we really need these Combine Input/Output record counters.  At 
the end of the job, they should be the same as Map Output Records and Reduce 
Input Records respectively, but they are possibly interesting to watch as the 
job proceeds.
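
To illustrate how the counters relate, here is a minimal sketch in plain Java 
that simulates a word count with a combiner and tallies each counter.  It does 
not use the Hadoop API or the code in the patch; the class name CounterSketch, 
the method run, and the model of one combiner pass per map task are all 
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; not the Hadoop implementation.
public class CounterSketch {

    /** Each inner list holds the input lines of one simulated map task. */
    public static Map<String, Long> run(List<List<String>> splits) {
        long mapOutputRecords = 0;
        long combineInputRecords = 0;
        long combineOutputRecords = 0;

        // (word, partial count) records that reach the reduce side.
        List<Map.Entry<String, Long>> shuffled = new ArrayList<>();

        for (List<String> split : splits) {
            // Map phase: emit one (word, 1) record per token.
            List<String> mapOutput = new ArrayList<>();
            for (String line : split) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        mapOutput.add(word);
                        mapOutputRecords++;
                    }
                }
            }

            // Combine phase (assumed to run once per map task):
            // sum the counts of identical words locally.
            Map<String, Long> combined = new TreeMap<>();
            for (String word : mapOutput) {
                combineInputRecords++;
                combined.merge(word, 1L, Long::sum);
            }
            for (Map.Entry<String, Long> e : combined.entrySet()) {
                shuffled.add(Map.entry(e.getKey(), e.getValue()));
                combineOutputRecords++;
            }
        }

        // Reduce side: group records by key. Each record counts toward
        // Reduce Input Records, each distinct key toward Reduce Input Groups.
        long reduceInputRecords = 0;
        Map<String, List<Long>> groups = new TreeMap<>();
        for (Map.Entry<String, Long> record : shuffled) {
            groups.computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                  .add(record.getValue());
            reduceInputRecords++;
        }
        long reduceInputGroups = groups.size();

        Map<String, Long> counters = new LinkedHashMap<>();
        counters.put("Map Output Records", mapOutputRecords);
        counters.put("Combine Input Records", combineInputRecords);
        counters.put("Combine Output Records", combineOutputRecords);
        counters.put("Reduce Input Groups", reduceInputGroups);
        counters.put("Reduce Input Records", reduceInputRecords);
        return counters;
    }

    public static void main(String[] args) {
        // Two simulated map tasks with one input line each.
        List<List<String>> splits = List.of(List.of("a b a"), List.of("b c b"));
        run(splits).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

For the two splits "a b a" and "b c b", this gives 6 map output records, 6 
combine input records, 4 combine output records, 4 reduce input records, and 
3 reduce input groups (a, b, c).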

Comments welcome.


> "Reduce input records" counter name is misleading
> -------------------------------------------------
>
>                 Key: HADOOP-1146
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1146
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: David Bowen
>         Assigned To: David Bowen
>         Attachments: 1146.patch
>
>
> It has been pointed out that the counter name "reduce input records" is 
> misleading; this number should be called "reduce input keys" or "reduce input 
> groups".  It could also be useful to have the actual number of reduce input 
> records, which should be the same as the number of map output records.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
