[ https://issues.apache.org/jira/browse/MAPREDUCE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444145#comment-13444145 ]
Thomas Graves commented on MAPREDUCE-4499:
------------------------------------------

+1, looks good. The Findbugs warnings already exist on branch-1 without this change.

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     -1 findbugs.  The patch appears to introduce 8 new Findbugs (version 1.3.9) warnings.
     [exec]

> Looking for speculative tasks is very expensive in 1.x
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4499
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4499
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance
>    Affects Versions: 1.0.3
>            Reporter: Nathan Roberts
>            Assignee: Koji Noguchi
>         Attachments: mapreduce-4499-v1.0.2-1.patch
>
>
> When there are lots of jobs and tasks active in a cluster, the process of
> figuring out whether or not to launch a speculative task becomes very
> expensive.
> I could be missing something, but it certainly looks like on every heartbeat
> we could be scanning tens of thousands of tasks looking for something which
> might need to be speculatively executed. In most cases, nothing gets chosen,
> so we completely trashed our data cache and didn't even find a task to
> schedule, just to do it all over again on the next heartbeat.
> On busy jobtrackers, the following backtrace is very common:
> "IPC Server handler 32 on 50300" daemon prio=10 tid=0x00002ab36c74f800 nid=0xb50 runnable [0x0000000045adb000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.TreeMap.valEquals(TreeMap.java:1182)
>         at java.util.TreeMap.containsValue(TreeMap.java:227)
>         at java.util.TreeMap$Values.contains(TreeMap.java:940)
>         at org.apache.hadoop.mapred.TaskInProgress.hasRunOnMachine(TaskInProgress.java:1072)
>         at org.apache.hadoop.mapred.JobInProgress.findSpeculativeTask(JobInProgress.java:2193)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2417)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.JobInProgress.obtainNewNonLocalMapTask(JobInProgress.java:1432)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:525)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
>         - locked <0x00002aab6e27a4c8> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
>         at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
>         - locked <0x00002aab6e191278> (a org.apache.hadoop.mapred.JobTracker)
>         ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
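
For readers following the backtrace: the top frames sit in TreeMap.containsValue, which walks every entry of the map instead of doing a keyed lookup. The sketch below only illustrates that one call pattern next to a set-based membership check. It is not the attached patch and not the real TaskInProgress code; the class AttemptTracker, its fields, and its method names are all hypothetical, and the larger cost described in the issue (scanning many tasks on every heartbeat) is not addressed by it.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for per-task attempt bookkeeping.
 * NOT org.apache.hadoop.mapred.TaskInProgress and NOT the attached patch.
 */
class AttemptTracker {
    // attempt id -> host the attempt ran on (assumed layout, keyed by attempt)
    private final Map<String, String> attemptToHost = new TreeMap<String, String>();
    // Assumed side index: every host that has run an attempt of this task
    private final Set<String> hosts = new HashSet<String>();

    void addAttempt(String attemptId, String host) {
        attemptToHost.put(attemptId, host);
        hosts.add(host);
    }

    /** Linear scan over all values, the pattern seen at the top of the backtrace. */
    boolean hasRunOnMachineScan(String host) {
        return attemptToHost.containsValue(host);
    }

    /** Hash lookup against the side index; same answer without walking the map. */
    boolean hasRunOnMachineIndexed(String host) {
        return hosts.contains(host);
    }
}
```

Both methods return the same answer as long as attempts are only ever added; if attempts could be removed, the side index would need matching maintenance, which this sketch leaves out.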