[
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212452#comment-16212452
]
Peter Bacsko commented on MAPREDUCE-5124:
-----------------------------------------
Thanks Jason for the insights.
Could you point me to the code that handles this situation in the
Resource Manager? I'd like to take a look at it.
You mentioned that "coalescing" the events should do the job. Are you
suggesting that we update {{TaskAttemptImpl.reportedStatus}} directly? That
does seem like a reasonable thing to do.
But are you sure this would eliminate the problem completely? We'd still put
the same number of {{TaskAttemptStatusUpdateEvent}}s on the dispatcher's queue,
so with a lot of tasks, it could still put considerable stress on the AM.
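
To make the question concrete, here is a minimal sketch of one way coalescing
could also bound the number of queued events. It is not the Hadoop code: the
class {{CoalescingStatusSlot}}, its methods, and the use of a plain String as
the status are all hypothetical stand-ins. The assumption is that only the
newest status of an attempt matters, so a newer report may overwrite an
unprocessed one, and an event is queued only when nothing is pending yet.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-attempt status holder; not Hadoop's TaskAttemptImpl. */
class CoalescingStatusSlot {
    // Latest status not yet seen by the dispatcher, or null if none is pending.
    private final AtomicReference<String> pending = new AtomicReference<>();
    // Stand-in for the dispatcher's event queue (assumption for this sketch).
    private final Queue<CoalescingStatusSlot> dispatcherQueue;

    CoalescingStatusSlot(Queue<CoalescingStatusSlot> dispatcherQueue) {
        this.dispatcherQueue = dispatcherQueue;
    }

    /** Called by the reporting thread for every status update from the task. */
    void report(String status) {
        // Overwrite any stale status. Enqueue an event only when the slot was
        // empty, so each attempt has at most one event in the queue at a time.
        if (pending.getAndSet(status) == null) {
            dispatcherQueue.add(this);
        }
    }

    /** Called by the dispatcher thread; returns the newest status and clears the slot. */
    String drain() {
        return pending.getAndSet(null);
    }
}

class CoalescingDemo {
    public static void main(String[] args) {
        Queue<CoalescingStatusSlot> queue = new ConcurrentLinkedQueue<>();
        CoalescingStatusSlot slot = new CoalescingStatusSlot(queue);
        slot.report("10%");
        slot.report("20%");
        slot.report("30%");
        // Three reports, but only one queued event carrying the newest status.
        System.out.println(queue.size() + " event(s), status " + queue.poll().drain());
    }
}
```

With this shape, the queue depth is bounded by the number of active attempts
rather than by the report rate. The cost is that intermediate statuses are
dropped, which is only acceptable if nothing downstream depends on seeing
every update.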
> AM lacks flow control for task events
> -------------------------------------
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Haibo Chen
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.