[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-4469:
------------------------------------

    Attachment: MAPREDUCE-4469_rev5.patch

Here is the updated patch. Thanks Todd and Phil for your comments! I have
updated the patch to remove the excludedPids caching, which, as Todd pointed
out, can produce miscalculations when pids are recycled. The patch also uses
StringUtils.split(). In my opinion we shouldn't use getrusage: it only
accounts for children that have terminated, so the updates would miss the
resource usage of any children that are still running. I have also added a
property, in milliseconds rather than a number of skips, that determines how
often the resource usage counters are updated. Please take a look and let me
know if you have any comments.
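A time-based update interval of this kind can be sketched as a small throttle that the task consults before each potential update. This is a minimal sketch under stated assumptions, not code from the patch: the class name UsageUpdateThrottle and its injectable clock are hypothetical, and the configuration property that would supply the interval is not shown. The resource usage counters would only be recomputed when shouldUpdate() returns true.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the MAPREDUCE-4469 patch: decides whether the
// resource usage counters are due for an update, based on elapsed milliseconds
// rather than on a count of skipped calls.
class UsageUpdateThrottle {
  private final long intervalMs;
  private final LongSupplier clockMs; // returns the current time in milliseconds
  private long lastUpdateMs;
  private boolean updatedOnce;

  UsageUpdateThrottle(long intervalMs, LongSupplier clockMs) {
    if (intervalMs < 0) {
      throw new IllegalArgumentException("interval must be non-negative: " + intervalMs);
    }
    this.intervalMs = intervalMs;
    this.clockMs = clockMs;
  }

  /** Returns true, and records the current time, if an update is due now. */
  synchronized boolean shouldUpdate() {
    long now = clockMs.getAsLong();
    // The first call always updates; later calls wait until the interval has elapsed.
    if (!updatedOnce || now - lastUpdateMs >= intervalMs) {
      updatedOnce = true;
      lastUpdateMs = now;
      return true;
    }
    return false;
  }
}
```

In a real task one would pass a monotonic clock such as `() -> System.nanoTime() / 1_000_000`, so that wall-clock adjustments cannot stall updates or trigger a burst of them; the injectable clock mainly keeps the logic easy to test.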
                
> Resource calculation in child tasks is CPU-heavy
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance, task
>    Affects Versions: 1.0.3
>            Reporter: Todd Lipcon
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
> MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, 
> MAPREDUCE-4469_rev5.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
> each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
> that it's spending a lot of time looping through all the files in /proc to 
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
> runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%).
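Each pass of the /proc loop described above reads one /proc/<pid>/stat file per process. A cache of bare pids, such as the excludedPids cache removed in rev5, can go stale when the kernel reuses a pid. One common way to make such a key safe is to pair the pid with the process start time (field 22 of /proc/<pid>/stat). The following is a minimal sketch of that idea; the ProcessKey class is hypothetical and is not part of Hadoop or the patch.

```java
import java.util.Objects;

// Hypothetical sketch, not from the patch: identifies a process by its pid plus its
// start time (field 22 of /proc/<pid>/stat, in clock ticks since boot). If the kernel
// recycles a pid, the new process has a different start time, so the keys differ.
final class ProcessKey {
  final int pid;
  final long startTimeTicks;

  ProcessKey(int pid, long startTimeTicks) {
    this.pid = pid;
    this.startTimeTicks = startTimeTicks;
  }

  /** Parses the contents of a /proc/<pid>/stat file. */
  static ProcessKey fromStatLine(String stat) {
    // Field 2 (comm) is wrapped in parentheses and may itself contain spaces or ')',
    // so only the text after the LAST ')' is split into fields.
    int open = stat.indexOf('(');
    int close = stat.lastIndexOf(')');
    if (open < 0 || close < open) {
      throw new IllegalArgumentException("malformed stat line: " + stat);
    }
    int pid = Integer.parseInt(stat.substring(0, open).trim());
    String[] rest = stat.substring(close + 1).trim().split("\\s+");
    // rest[0] is field 3 (state), so field N is rest[N - 3]; starttime is field 22.
    final int startTimeIndex = 22 - 3;
    if (rest.length <= startTimeIndex) {
      throw new IllegalArgumentException("too few fields in stat line: " + stat);
    }
    return new ProcessKey(pid, Long.parseLong(rest[startTimeIndex]));
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ProcessKey)) return false;
    ProcessKey other = (ProcessKey) o;
    return pid == other.pid && startTimeTicks == other.startTimeTicks;
  }

  @Override
  public int hashCode() {
    return Objects.hash(pid, startTimeTicks);
  }

  @Override
  public String toString() {
    return pid + "@" + startTimeTicks;
  }
}
```

The regex split is used here for brevity. Note that such a key only guards against pid reuse; each stat file still has to be read to obtain the start time, so it does not by itself remove the cost of walking /proc.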

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
