[ https://issues.apache.org/jira/browse/MAPREDUCE-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated MAPREDUCE-4499:
------------------------------------

    Description: 
When there are lots of jobs and tasks active in a cluster, the process of
figuring out whether or not to launch a speculative task becomes very
expensive.

I could be missing something, but it certainly looks like on every heartbeat we
could be scanning tens of thousands of tasks looking for something that might
need to be speculatively executed. In most cases nothing gets chosen, so we
completely trashed the data cache and didn't even find a task to schedule, just
to do it all over again on the next heartbeat.
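
To make the cost concrete, here is a rough sketch of the shape of that scan
(illustrative only, not the actual JobTracker code; the class and method names
below are invented): on every heartbeat the scheduler asks each active job for
a speculative candidate, and each job walks all of its running tasks, so the
work per heartbeat grows roughly as jobs x running tasks even when nothing is
selected.

    // Illustrative sketch only -- names are hypothetical, not Hadoop classes.
    // Shows why the per-heartbeat cost is roughly O(jobs x running tasks).
    import java.util.List;

    class SpeculativeScanSketch {
        interface TipView {
            boolean hasRunOnMachine(String trackerName); // in 1.x this is a TreeMap.containsValue scan
            boolean isRunningSlow(long now);             // progress-rate check
        }
        interface JobView {
            List<TipView> runningMapTips();
        }

        // Called from every TaskTracker heartbeat; on most heartbeats it
        // walks everything and still returns null.
        static TipView findSpeculativeCandidate(List<JobView> jobs, String trackerName, long now) {
            for (JobView job : jobs) {                      // every active job
                for (TipView tip : job.runningMapTips()) {  // every running task in that job
                    if (!tip.hasRunOnMachine(trackerName) && tip.isRunningSlow(now)) {
                        return tip;                         // rarely reached
                    }
                }
            }
            return null;                                    // full scan, nothing to schedule
        }
    }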

On busy jobtrackers, the following backtrace is very common:

"IPC Server handler 32 on 50300" daemon prio=10 tid=0x00002ab36c74f800
nid=0xb50 runnable [0x0000000045adb000]
  java.lang.Thread.State: RUNNABLE
       at java.util.TreeMap.valEquals(TreeMap.java:1182)
       at java.util.TreeMap.containsValue(TreeMap.java:227)
       at java.util.TreeMap$Values.contains(TreeMap.java:940)
       at
org.apache.hadoop.mapred.TaskInProgress.hasRunOnMachine(TaskInProgress.java:1072)
       at
org.apache.hadoop.mapred.JobInProgress.findSpeculativeTask(JobInProgress.java:2193)
       - locked <0x00002aaefde82338> (a
org.apache.hadoop.mapred.JobInProgress)
       at
org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2417)
       - locked <0x00002aaefde82338> (a
org.apache.hadoop.mapred.JobInProgress)
       at
org.apache.hadoop.mapred.JobInProgress.obtainNewNonLocalMapTask(JobInProgress.java:1432)
       - locked <0x00002aaefde82338> (a
org.apache.hadoop.mapred.JobInProgress)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:525)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
       at
org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
       - locked <0x00002aab6e27a4c8> (a
org.apache.hadoop.mapred.CapacityTaskScheduler)
       at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
       - locked <0x00002aab6e191278> (a org.apache.hadoop.mapred.JobTracker)
...)
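
The hottest frames above are TreeMap.valEquals/containsValue reached from
TaskInProgress.hasRunOnMachine: containsValue walks every entry of the map, so
each per-machine check is itself a linear scan on top of the outer walk over
all tasks. One possible direction, sketched below purely for illustration (it
is not necessarily what the attached patch does), is to keep a hashed set of
tracker names next to the attempt map so the membership check becomes O(1):

    // Sketch only -- not the attached patch and not Hadoop code.
    // Idea: maintain a HashSet of tracker names alongside the attempt map so
    // hasRunOnMachine() avoids TreeMap.containsValue(), which scans all entries.
    import java.util.HashSet;
    import java.util.Set;
    import java.util.TreeMap;

    class MachineMembershipSketch {
        private final TreeMap<String, String> activeAttempts = new TreeMap<String, String>(); // attemptId -> tracker
        private final Set<String> machines = new HashSet<String>();                           // trackers used so far

        void recordAttempt(String attemptId, String trackerName) {
            activeAttempts.put(attemptId, trackerName);
            machines.add(trackerName);
        }

        boolean hasRunOnMachine(String trackerName) {
            return machines.contains(trackerName); // O(1) instead of scanning the TreeMap values
        }
    }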

  was:  

    
> Looking for speculative tasks is very expensive in 1.x
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-4499
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4499
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1, performance
>    Affects Versions: 1.0.3
>            Reporter: Nathan Roberts
>            Assignee: Koji Noguchi
>             Fix For: 1.2.0
>
>         Attachments: mapreduce-4499-v1.0.2-1.patch
>
>
