[ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648903#action_12648903
 ] 

Chris Douglas commented on HADOOP-2774:
---------------------------------------

bq. It is difficult to have a testcase for this issue since the number of 
spilled records depends on a number of factors like the value of io.sort.mb, 
io.sort.factor, etc. Also, it is not guaranteed that the number will not change 
when we make changes in the merge/sort algorithm in the future.

This can, and should, have a unit test. While several configurable 
parameters determine the number of spills, the expected count can be calculated 
and verified. If later changes to the framework invalidate the unit test, 
it can be updated or removed at that point.
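
As a rough illustration of how a test might derive the expected value, here is 
a minimal sketch. It is a simplified model, not the framework's merge code: the 
class SpillCountSketch and the method expectedCount are hypothetical names, and 
it assumes that every record written by a spill or by a merge pass is counted 
once and that each intermediate merge combines the smallest segments first.

```java
import java.util.PriorityQueue;

public class SpillCountSketch {

    // Hypothetical helper: expected counter value for the given spill sizes
    // (records per spill) and merge factor (io.sort.factor in this model).
    static long expectedCount(long[] spillSizes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        long count = 0;
        PriorityQueue<Long> segments = new PriorityQueue<>();
        for (long size : spillSizes) {
            count += size;          // each record is counted once when spilled
            segments.add(size);
        }
        if (segments.size() <= 1) {
            return count;           // a single spill needs no merge pass
        }
        // Assumption: intermediate passes merge the smallest segments first.
        while (segments.size() > mergeFactor) {
            long merged = 0;
            for (int i = 0; i < mergeFactor; i++) {
                merged += segments.poll();
            }
            count += merged;        // only the merged records are counted again
            segments.add(merged);
        }
        long finalPass = 0;
        for (long size : segments) {
            finalPass += size;      // the final merge touches every record
        }
        return count + finalPass;
    }

    public static void main(String[] args) {
        System.out.println(expectedCount(new long[] {100}, 10));
        System.out.println(expectedCount(new long[] {50, 50}, 10));
        System.out.println(expectedCount(new long[] {40, 30, 30}, 2));
    }
}
```

Under these assumptions, one spill of 100 records gives 100 and two spills of 
50 records gives 200, matching the examples in the issue description. Three 
spills of 40, 30 and 30 records with a merge factor of 2 give 260, which shows 
the multi-level case where the count is not a multiple of the map output.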

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
