[ https://issues.apache.org/jira/browse/HADOOP-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648925#action_12648925 ]
Devaraj Das commented on HADOOP-2774:
-------------------------------------
(Sorry that this bug escaped my notice during the commit.)
Here are my thoughts:
1) In the map task, use a static volatile counter in IFile.Writer that
maintains the total number of records spilled to disk so far.
2) In the reduce task, use a static volatile counter in IFile.Reader that
maintains the total number of records read from disk so far.
With this approach, no information is exchanged between the map and reduce
tasks, and the IFile format is left unchanged.
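A minimal sketch of the per-task static counters described above, assuming hypothetical `CountingWriter`/`CountingReader` classes that stand in for IFile.Writer and IFile.Reader (these are not the real IFile classes, and no serialization is shown). It uses an `AtomicLong` rather than a bare `static volatile long`, since `volatile` alone does not make `++` atomic if more than one thread in the task writes or reads at the same time.

```java
import java.util.concurrent.atomic.AtomicLong;

class StaticCounterSketch {

    // Hypothetical stand-in for IFile.Writer; only the counting logic is shown.
    static class CountingWriter {
        // One total per JVM (i.e. per task): records written to disk by any writer so far.
        static final AtomicLong SPILLED_RECORDS = new AtomicLong();

        void append(String key, String value) {
            // ... serialize key/value to the local spill file here (omitted) ...
            SPILLED_RECORDS.incrementAndGet();
        }
    }

    // Hypothetical stand-in for IFile.Reader; it just replays a fixed number of records.
    static class CountingReader {
        // One total per JVM: records read back from disk by any reader so far.
        static final AtomicLong READ_RECORDS = new AtomicLong();

        private final int recordsInSegment;
        private int consumed;

        CountingReader(int recordsInSegment) {
            this.recordsInSegment = recordsInSegment;
        }

        // Returns false once the segment is exhausted, mirroring a next()-style read loop.
        boolean next() {
            if (consumed >= recordsInSegment) {
                return false;
            }
            consumed++;
            READ_RECORDS.incrementAndGet();
            return true;
        }
    }

    public static void main(String[] args) {
        // Map side: two writers (e.g. two spill files), both incrementing the shared counter.
        CountingWriter spill0 = new CountingWriter();
        CountingWriter spill1 = new CountingWriter();
        for (int i = 0; i < 3; i++) spill0.append("k" + i, "v");
        for (int i = 0; i < 2; i++) spill1.append("k" + i, "v");

        // Reduce side: drain two on-disk segments; nothing is received from the map task.
        for (int size : new int[] {4, 6}) {
            CountingReader reader = new CountingReader(size);
            while (reader.next()) { /* hand the record to the merge */ }
        }

        // Prints "spilled=5 read=10".
        System.out.println("spilled=" + CountingWriter.SPILLED_RECORDS.get()
            + " read=" + CountingReader.READ_RECORDS.get());
    }
}
```

Because the counters are static, every writer or reader created within the same task JVM contributes to one running total, which the task could then report as a counter without any change to the on-disk format.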
I agree that storing the count in the IFile itself keeps the information
self-contained, but there is a tradeoff in implementation complexity and
usefulness between that approach and the one I propose here.
Thoughts?
> Add counters to show number of key/values that have been sorted and merged in
> the maps and reduces
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-2774
> URL: https://issues.apache.org/jira/browse/HADOOP-2774
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Owen O'Malley
> Assignee: Ravi Gummadi
> Fix For: 0.20.0
>
> Attachments: HADOOP-2774.patch, HADOOP-2774.patch
>
>
> For each *pass* of the sort and merge, I would like a count of the number of
> records. So for example, if the map output 100 records and they were sorted
> once, the counter would be 100. If it spilled twice and was merged together,
> it would be 200. Clearly in a multi-level merge, it may not be a multiple of
> the number of map output records. This would let the users easily see if they
> have values like io.sort.mb or io.sort.factor set too low.
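The counting rule in the description can be checked with a small simulation. This is a simplified model, not Hadoop's actual merge policy: it assumes each spill writes its records once, and that segments are then merged `mergeFactor` at a time in FIFO order until one remains, with every merge rewriting all the records it consumes. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MergePassCounter {

    /**
     * Total records written across all sort/merge passes under the simplified
     * FIFO merge model described above (an assumption, not Hadoop's Merger).
     */
    static long recordsWritten(int[] spillSizes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        Deque<Long> segments = new ArrayDeque<>();
        long total = 0;
        for (int size : spillSizes) {
            segments.addLast((long) size);
            total += size; // the sort-and-spill pass writes each record once
        }
        while (segments.size() > 1) {
            long merged = 0;
            for (int i = 0; i < mergeFactor && !segments.isEmpty(); i++) {
                merged += segments.removeFirst();
            }
            total += merged;          // this merge pass rewrites every record it consumed
            segments.addLast(merged); // the merged output becomes a new segment
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(recordsWritten(new int[] {100}, 10));                // 100: sorted once
        System.out.println(recordsWritten(new int[] {50, 50}, 10));             // 200: two spills, one merge
        System.out.println(recordsWritten(new int[] {20, 20, 20, 20, 20}, 2));  // 340: multi-level merge
    }
}
```

The last case shows the multi-level effect: with five spills and a merge factor of 2, some records are rewritten more often than others, so the total (340) is not a multiple of the 100 map output records.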