[jira] [Commented] (MAPREDUCE-4499) Looking for speculative tasks is very expensive in 1.x

Nathan Roberts (JIRA) Mon, 20 Aug 2012 15:02:40 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438259#comment-13438259
 ]


Nathan Roberts commented on MAPREDUCE-4499:
-------------------------------------------

Ideally we'd be much less aggressive about calling findSpeculativeTask() (2 
million calls over a few seconds is a bit much).  However, this patch looks 
like a safe way to mitigate the problem by avoiding the seemingly expensive 
call to hasRunOnMachine(). I double-checked the truth table against both the 
pseudo-code and the patch, and they look good to me. 
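
To illustrate the general idea of avoiding an expensive call while keeping the same truth table, here is a minimal sketch. It is not the attached patch: the class, the method names, and the two boolean flags (alreadySpeculating, lagging) are hypothetical stand-ins, and the real conditions in findSpeculativeTask() may differ. It only shows that moving cheap checks ahead of an expensive predicate leaves the result unchanged for every input combination while calling the predicate far less often.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration; none of these names come from Hadoop.
class SpeculationCheckSketch {
    // Counts invocations of the expensive predicate (illustration only).
    static int expensiveCalls = 0;

    // Naive ordering: the expensive predicate is evaluated first, every time.
    static boolean naive(BooleanSupplier hasRunOnMachine,
                         boolean alreadySpeculating, boolean lagging) {
        expensiveCalls++;
        boolean ran = hasRunOnMachine.getAsBoolean();
        return !ran && !alreadySpeculating && lagging;
    }

    // Reordered: cheap flags short-circuit before the expensive predicate.
    static boolean reordered(BooleanSupplier hasRunOnMachine,
                             boolean alreadySpeculating, boolean lagging) {
        if (alreadySpeculating || !lagging) {
            return false;
        }
        expensiveCalls++;
        return !hasRunOnMachine.getAsBoolean();
    }

    // Walks all 8 input combinations; true if both orderings always agree.
    static boolean truthTablesMatch() {
        boolean[] values = {false, true};
        for (boolean ran : values) {
            for (boolean spec : values) {
                for (boolean lag : values) {
                    boolean a = naive(() -> ran, spec, lag);
                    boolean b = reordered(() -> ran, spec, lag);
                    if (a != b) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("truth tables match: " + truthTablesMatch());
    }
}
```

In this sketch the reordered version reaches the expensive predicate only when both cheap flags allow speculation, which is the property a truth-table comparison is meant to confirm.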
                
> Looking for speculative tasks is very expensive in 1.x
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4499
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4499
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance
>    Affects Versions: 1.0.3
>            Reporter: Nathan Roberts
>         Attachments: mapreduce-4499-v1.0.2-1.patch
>
>
> When there are lots of jobs and tasks active in a cluster, the process of 
> figuring out whether or not to launch a speculative task becomes very 
> expensive. 
> I could be missing something, but it certainly looks like on every heartbeat 
> we could be scanning tens of thousands of tasks looking for something which 
> might need to be speculatively executed. In most cases nothing gets chosen, 
> so we completely trash our data cache without even finding a task to 
> schedule, only to do it all over again on the next heartbeat.
> On busy jobtrackers, the following backtrace is very common:
> "IPC Server handler 32 on 50300" daemon prio=10 tid=0x00002ab36c74f800 nid=0xb50 runnable [0x0000000045adb000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.TreeMap.valEquals(TreeMap.java:1182)
>         at java.util.TreeMap.containsValue(TreeMap.java:227)
>         at java.util.TreeMap$Values.contains(TreeMap.java:940)
>         at org.apache.hadoop.mapred.TaskInProgress.hasRunOnMachine(TaskInProgress.java:1072)
>         at org.apache.hadoop.mapred.JobInProgress.findSpeculativeTask(JobInProgress.java:2193)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2417)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.JobInProgress.obtainNewNonLocalMapTask(JobInProgress.java:1432)
>         - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:525)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
>         at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
>         - locked <0x00002aab6e27a4c8> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
>         at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
>         - locked <0x00002aab6e191278> (a org.apache.hadoop.mapred.JobTracker)
> ...
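
The top frames of the backtrace show hasRunOnMachine() ending in TreeMap.containsValue(), which walks the map's entries one by one rather than using the tree ordering. As a rough sketch of why that matters and of one alternative (not what the attached patch does, which avoids the call instead), the hypothetical class below keeps a per-tracker count next to the map so the membership test becomes a hash lookup. The class name, method names, and the use of plain strings for attempt IDs and tracker names are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not a Hadoop class.
class ActiveAttemptIndex {
    // Attempt ID -> tracker name, similar in shape to a TreeMap of active attempts.
    private final TreeMap<String, String> attemptToTracker = new TreeMap<>();
    // Tracker name -> number of attempts currently mapped to it.
    private final Map<String, Integer> trackerCounts = new HashMap<>();

    void put(String attemptId, String tracker) {
        String previous = attemptToTracker.put(attemptId, tracker);
        if (previous != null) {
            decrement(previous);
        }
        trackerCounts.merge(tracker, 1, Integer::sum);
    }

    void remove(String attemptId) {
        String previous = attemptToTracker.remove(attemptId);
        if (previous != null) {
            decrement(previous);
        }
    }

    private void decrement(String tracker) {
        Integer count = trackerCounts.get(tracker);
        if (count == null) {
            return;
        }
        if (count == 1) {
            trackerCounts.remove(tracker);
        } else {
            trackerCounts.put(tracker, count - 1);
        }
    }

    // Linear scan over all values, the pattern visible in the backtrace.
    boolean hasTrackerLinear(String tracker) {
        return attemptToTracker.values().contains(tracker);
    }

    // Expected constant-time lookup in the side index.
    boolean hasTrackerIndexed(String tracker) {
        return trackerCounts.containsKey(tracker);
    }
}
```

The trade-off in this sketch is extra bookkeeping on every put and remove in exchange for a cheaper lookup, which only pays off when lookups dominate, as they appear to in the trace.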
