[jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces

Chris Douglas (JIRA) Mon, 24 Nov 2008 18:37:09 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650448#action_12650448
 ]


Chris Douglas commented on HADOOP-2774:
---------------------------------------

bq. Wanted to test the path of combiner getting called in Map & Reduce phases — 
so have combiner. Words are repeated(fixed number of times) in the input files.
Ah; I'd missed that. OK.

bq. OK, Setting mapred.child.java.opts explicitly now. Made 
mapred.job.shuffle.buffer.percent=0.
Sorry, I meant: if .001 was chosen to get around a bug, i.e. that 
mapred.job.shuffle.\* couldn't be zero, then the ratio is insufficient to get 
the behavior the test required. If the property can be zero, then it's not 
necessary (no need to set mapred.child.java.opts). Wait... this test uses 
LocalJobRunner. The mapred.job.shuffle\* parameter should do absolutely 
nothing; the segments will always be read from disk. Unless this test uses the 
Mini\*Clusters, there's no reason to configure the memory management at the 
reduce.

On the map side, for a fixed number of spills, io.sort.record.percent and 
io.sort.spill.percent should probably be controlled so the number of spills is 
deterministic. Since you've already sized the test for the defaults, setting 
them in the test is probably sufficient.

Other points:
* The test case should clean up when it fails, as well. Can't it just delete 
{{TEST_ROOT_DIR}}?
* IFile.{Reader,Writer} doesn't need another constructor. IFile is a 
package-private class, so its interface can be changed in a 
backwards-incompatible way.
* Quick nit:
{noformat}
+      if(readRecordsCounter != null)
+        readRecordsCounter.increment(numRecordsRead);
{noformat}
This doesn't conform to the [coding 
standards|http://java.sun.com/docs/codeconv/html/CodeConventions.doc6.html#449].
* Sorry to revive this, but I really don't see the value in tracking the number 
of first level spills. Since the changes supporting it are not related to the 
rest of the patch, I suggest it be pushed to a different JIRA.

Other than these minor points, the patch looks good.

> Add counters to show number of key/values that have been sorted and merged in 
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2774
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2774
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.0
>
>         Attachments: HADOOP-2774.patch, HADOOP-2774.patch, HADOOP-2774.patch, 
> HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of 
> records. So for example, if the map output 100 records and they were sorted 
> once, the counter would be 100. If it spilled twice and was merged together, 
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of 
> the number of map output records. This would let the users easily see if they 
> have values like io.sort.mb or io.sort.factor set too low.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2774) Add counters to show number of key/values that have been sorted and merged in the maps and reduces

Reply via email to