[ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649625#action_12649625
 ] 

Chris Douglas commented on HADOOP-2774:
---------------------------------------

1) If there's only one spill in the map, there will be no merge. On the map 
side, the counter needs to be given to IFile.Writer and never to the merger.
2) The merger just needs to count the number of records it emits. That's all it 
can do; it doesn't know if it's feeding an IFile.Writer, the reduce, or the 
combiner. The critical piece is figuring out when it is appropriate to give the 
counter to an IFile.Writer or the merger.
3) Records read by a merge and records written by the merge are the same 
quantity. What is the distinction? That the preceding comment passes null for 
one of the two parameters in each case suggests that their mutual exclusivity 
is clear. That they are the same quantity is required by the semantics of the 
merge. A use case would be helpful.
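A minimal Java sketch of the placement rule in points 1-3. Writer, merge, mapSide and reduceSide are hypothetical stand-ins, not the real IFile.Writer or Merger APIs, and an AtomicLong stands in for a Hadoop counter. Each component counts only when handed a non-null counter, so exactly one of them counts a given pass. On the map side the writer always gets the counter, so a single spill with no merge is still counted. The reduce-side case assumes the final merge feeds the reduce directly with no writer behind it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicLong;

class CounterPlacementSketch {
    // Hypothetical stand-in for IFile.Writer: counts each appended record,
    // but only when it was handed a counter (null means another component counts).
    static class Writer {
        final List<Integer> out = new ArrayList<>();
        private final AtomicLong counter;

        Writer(AtomicLong counter) { this.counter = counter; }

        void append(int record) {
            out.add(record);
            if (counter != null) counter.incrementAndGet();
        }
    }

    // Hypothetical k-way merge: counts every record it emits and does not know
    // whether the consumer is a Writer, the reduce, or the combiner.
    static List<Integer> merge(List<List<Integer>> segments, AtomicLong counter) {
        // Heap entries are {value, segment index, position within segment}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < segments.size(); i++) {
            if (!segments.get(i).isEmpty()) heap.add(new int[] {segments.get(i).get(0), i, 0});
        }
        List<Integer> emitted = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            emitted.add(top[0]);
            if (counter != null) counter.incrementAndGet();
            List<Integer> seg = segments.get(top[1]);
            int next = top[2] + 1;
            if (next < seg.size()) heap.add(new int[] {seg.get(next), top[1], next});
        }
        return emitted;  // as many records as were read: reads == writes
    }

    // Map side: the counter always goes to the Writer, never the merger, so a
    // single spill (which triggers no merge) is still counted.
    static long mapSide(List<List<Integer>> sortedSpills) {
        AtomicLong counter = new AtomicLong();
        List<List<Integer>> spillFiles = new ArrayList<>();
        for (List<Integer> spill : sortedSpills) {
            Writer w = new Writer(counter);
            for (int r : spill) w.append(r);
            spillFiles.add(w.out);
        }
        if (spillFiles.size() > 1) {
            Writer finalOutput = new Writer(counter);  // the Writer counts the merged output,
            for (int r : merge(spillFiles, null)) {    // so the merger is given null
                finalOutput.append(r);
            }
        }
        return counter.get();
    }

    // Reduce side, assuming the final merge feeds the reduce directly: there is
    // no Writer behind it, so the merger gets the counter.
    static long reduceSide(List<List<Integer>> segments) {
        AtomicLong counter = new AtomicLong();
        merge(segments, counter);
        return counter.get();
    }

    public static void main(String[] args) {
        List<List<Integer>> twoSpills = Arrays.asList(Arrays.asList(1, 4), Arrays.asList(2, 3));
        System.out.println(mapSide(Arrays.asList(Arrays.asList(1, 3, 5)))); // 3: one spill, no merge
        System.out.println(mapSide(twoSpills));    // 8: 4 spilled + 4 merged
        System.out.println(reduceSide(twoSpills)); // 4
    }
}
```

Because the merger never knows its consumer, the only decision is at the call site: hand the counter to the writer or to the merger, never both.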

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.
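The arithmetic in the requested counter can be checked with a small Java model. The class and method names are hypothetical, and the merge schedule is a simplifying assumption (always merge the `factor` smallest segments, with `factor` playing the role of io.sort.factor), not Hadoop's actual Merger logic. Each spilled record is counted once when sorted and once more for every merge pass it takes part in.

```java
import java.util.PriorityQueue;

class SortMergeCounter {
    /**
     * Total records counted across all sort and merge passes, under a simplified
     * model (an assumption, not Hadoop's Merger): every spill is sorted once, then
     * the `factor` smallest segments are repeatedly merged until one remains.
     */
    static long countRecords(int[] spillSizes, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("merge factor must be at least 2");
        }
        long count = 0;
        PriorityQueue<Long> segments = new PriorityQueue<>();
        for (int size : spillSizes) {
            count += size;                  // sort pass: each spilled record counted once
            segments.add((long) size);
        }
        while (segments.size() > 1) {
            long merged = 0;
            for (int i = 0; i < factor && !segments.isEmpty(); i++) {
                merged += segments.poll();  // take the smallest remaining segment
            }
            count += merged;                // merge pass: each record it writes counted once
            segments.add(merged);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRecords(new int[] {100}, 10));               // 100
        System.out.println(countRecords(new int[] {50, 50}, 10));            // 200
        System.out.println(countRecords(new int[] {20, 20, 20, 20, 20}, 2)); // 340
    }
}
```

With five spills of 20 records and a factor of 2, the model gives 340, which is not a multiple of the 100 map output records, consistent with the note about multi-level merges.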

-- 
This message is automatically generated by JIRA.
