[ https://issues.apache.org/jira/browse/MAPREDUCE-4728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582571#comment-13582571 ]
Thomas Graves commented on MAPREDUCE-4728:
------------------------------------------

this might be a dup of MAPREDUCE-4478

> Interaction between oob heartbeats and damper can cause TT to heartbeat with zero delay
> ---------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4728
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4728
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 1.0.3
>            Reporter: Nathan Roberts
>         Attachments: MAPREDUCE-4728.patch
>
>
> When mapreduce.tasktracker.outofband.heartbeat is true and
> mapreduce.tasktracker.outofband.heartbeat.damper is something largish (like
> the default of 1000000), the TT doesn't wait for tasks to finish before
> heartbeating back to the JT. This causes excessive load on the JT which
> in-turn reduces overall cluster performance.
> I believe the problem is that in the following block of code, when
> getHeartbeatInterval() returns 0, we heartbeat back immediately BUT
> finishedCount does not get reset. It looks like nothing ever gets us out of
> this situation so we basically heartbeat without ever sleeping.
>
> {code}
>         // accelerate to account for multiple finished tasks up-front
>         long remaining =
>           (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
>         while (remaining > 0) {
>           // sleeps for the wait time or
>           // until there are *enough* empty slots to schedule tasks
>           synchronized (finishedCount) {
>             finishedCount.wait(remaining);
>             // Recompute
>             now = System.currentTimeMillis();
>             remaining =
>               (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
>             if (remaining <= 0) {
>               // Reset count
>               finishedCount.set(0);
>               break;
>             }
>           }
>         }
> {code}
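
To make the reported failure mode concrete, here is a minimal standalone sketch. It is not the TaskTracker code and not the attached patch. The class name, the 3000 ms base interval, and the formula inside getHeartbeatInterval() are assumptions made for illustration; only the damper value of 1000000 and the shape of the wait computation come from the report. The sketch compares the up-front computation as reported (no reset when the loop is skipped) with one possible variant that also resets finishedCount when the computed wait is already zero.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the heartbeat wait logic; not Hadoop code.
class HeartbeatWaitSketch {
    // Assumed base heartbeat interval in milliseconds (illustrative only).
    static final long HEARTBEAT_INTERVAL_MS = 3000L;
    // Damper value quoted in the report as the default.
    static final int OOB_DAMPER = 1_000_000;

    final AtomicInteger finishedCount = new AtomicInteger(0);

    // Assumed formula: the interval shrinks as finished tasks accumulate.
    // With a large damper, a single finished task already yields 0.
    static long getHeartbeatInterval(int numFinishedTasks) {
        return HEARTBEAT_INTERVAL_MS / ((long) numFinishedTasks * OOB_DAMPER + 1);
    }

    // Mirrors the up-front computation in the reported block. When the
    // result is <= 0 the while loop is never entered, so the reset inside
    // the loop never runs and finishedCount keeps its value.
    long remainingAsReported(long lastHeartbeat, long now) {
        return (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
    }

    // One possible variant (an assumption, not the attached patch): also
    // reset the count when the computed wait is already zero or negative.
    long remainingWithReset(long lastHeartbeat, long now) {
        long remaining =
            (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
        if (remaining <= 0) {
            finishedCount.set(0);
        }
        return remaining;
    }

    public static void main(String[] args) {
        long t = 1000L; // pretend a heartbeat was just sent at time t

        HeartbeatWaitSketch reported = new HeartbeatWaitSketch();
        reported.finishedCount.set(1); // one task finished
        for (int i = 0; i < 3; i++) {
            System.out.println("as reported, wait = "
                + reported.remainingAsReported(t, t));
        }

        HeartbeatWaitSketch variant = new HeartbeatWaitSketch();
        variant.finishedCount.set(1);
        for (int i = 0; i < 3; i++) {
            System.out.println("with reset,  wait = "
                + variant.remainingWithReset(t, t));
        }
    }
}
```

Under these assumptions the "as reported" path keeps computing a zero wait on every pass because finishedCount stays at 1, which matches the zero-delay heartbeating described above. The variant computes a zero wait once and then falls back to the full interval. Whether resetting at that point is safe in the real TaskTracker depends on code not shown in this message.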