[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429577#comment-13429577 ]
Ahmed Radwan commented on MAPREDUCE-4469:
-----------------------------------------

I meant when the task is done. Currently these counters are updated from two locations: continuously from the communication thread's run(), and once at the end from the done() method. So we can optionally disable the continuous calls from the communication thread, which constitute the main overhead. getPhysicalMemorySize(), getVirtualMemorySize() and getCumulativeCpuTime() in ResourceCalculatorPlugin obtain the cumulative values for the current process tree, so when they are called from done(), they should still account for any child processes. Is that correct?

> Resource calculation in child tasks is CPU-heavy
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance, task
>    Affects Versions: 1.0.3
>            Reporter: Todd Lipcon
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4469.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that
> each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed
> that it's spending a lot of time looping through all the files in /proc to
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job
> runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
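
The proposal in the comment (skip the continuous counter updates from the communication thread and keep only the single update in done()) can be illustrated with a small, self-contained sketch. This is not Hadoop code and not the attached patch: the class names, the `continuousUpdates` flag, and the `CountingProbe` stand-in for a ResourceCalculatorPlugin-style /proc scan are all hypothetical, chosen only to show how the number of expensive probe calls changes.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch; none of these names come from Hadoop. */
final class ResourceCounterSketch {

    /** Stand-in for an expensive probe that walks /proc for the process tree. */
    static final class CountingProbe implements LongSupplier {
        private int calls = 0;

        @Override
        public long getAsLong() {
            calls++;               // each call represents one expensive /proc scan
            return 1000L * calls;  // pretend cumulative CPU time in milliseconds
        }

        int calls() {
            return calls;
        }
    }

    /** Minimal model of a task with a periodic heartbeat and a final done() step. */
    static final class Task {
        private final boolean continuousUpdates; // assumed flag, not a real Hadoop option
        private final LongSupplier probe;
        private long cpuMillisCounter = 0;

        Task(boolean continuousUpdates, LongSupplier probe) {
            this.continuousUpdates = continuousUpdates;
            this.probe = probe;
        }

        /** Models one iteration of the communication thread's loop. */
        void heartbeat() {
            if (continuousUpdates) {
                updateResourceCounters();
            }
        }

        /** Models task completion: the counters are always refreshed once here. */
        void done() {
            updateResourceCounters();
        }

        private void updateResourceCounters() {
            // The probe returns a cumulative value, so one final read is enough
            // to record the total for the whole run.
            cpuMillisCounter = probe.getAsLong();
        }

        long cpuMillisCounter() {
            return cpuMillisCounter;
        }
    }

    /** Runs a task for the given number of heartbeats and reports probe calls. */
    static int probeCallsFor(boolean continuousUpdates, int heartbeats) {
        CountingProbe probe = new CountingProbe();
        Task task = new Task(continuousUpdates, probe);
        for (int i = 0; i < heartbeats; i++) {
            task.heartbeat();
        }
        task.done();
        return probe.calls();
    }

    public static void main(String[] args) {
        System.out.println("continuous: " + probeCallsFor(true, 100) + " probe calls");
        System.out.println("done-only:  " + probeCallsFor(false, 100) + " probe calls");
    }
}
```

With 100 heartbeats, the continuous mode performs 101 probe calls and the done-only mode performs 1. The trade-off in this model is that the counter is not refreshed while the task runs, so intermediate progress reports would not carry resource figures.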