[
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153610#comment-16153610
]
Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------
Turning on the RPC backoff feature alone will not be enough, because the call
queues aren't backing up today. We'd have to change heartbeat handling so that
the IPC server handler thread processes each heartbeat synchronously rather
than throwing it on the AsyncDispatcher event queue as it does today. That
means we'll quickly start tying up server handler threads for large jobs, and
that will end up choking out more important method calls like task assignment,
task completion, etc. It would probably work, but it would be far from ideal
once things start to become congested.
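
To illustrate why backoff has nothing to react to today, here is a rough
single-threaded sketch. It is not the actual IPC server or AsyncDispatcher
code. The class name, the simulate method and all of its parameters are
hypothetical, and the tick-based model is only an assumption made for
illustration. With the asynchronous handoff the bounded call queue never fills
and the unbounded event queue absorbs the backlog. With synchronous handling
the call queue does fill and calls start being refused.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical toy model; not Hadoop code. */
public class HeartbeatQueueSketch {

    /** Outcome of one simulated run. */
    public static final class Result {
        public final int rejected;     // heartbeats refused because the call queue was full
        public final int callBacklog;  // calls still waiting in the bounded call queue
        public final int eventBacklog; // events still waiting in the unbounded event queue

        Result(int rejected, int callBacklog, int eventBacklog) {
            this.rejected = rejected;
            this.callBacklog = callBacklog;
            this.eventBacklog = eventBacklog;
        }
    }

    public static Result simulate(boolean synchronous, int ticks, int arrivalsPerTick,
                                  int processedPerTick, int callQueueCapacity) {
        Queue<Integer> callQueue = new ArrayDeque<>();  // bounded by callQueueCapacity
        Queue<Integer> eventQueue = new ArrayDeque<>(); // unbounded, stands in for the dispatcher queue
        int rejected = 0;

        for (int t = 0; t < ticks; t++) {
            // Heartbeats arrive; a full call queue is the only signal backoff could see.
            for (int i = 0; i < arrivalsPerTick; i++) {
                if (callQueue.size() >= callQueueCapacity) {
                    rejected++;
                } else {
                    callQueue.add(t);
                }
            }
            if (synchronous) {
                // The handler does the real work itself, so it drains the call
                // queue only as fast as heartbeats can actually be processed.
                for (int i = 0; i < processedPerTick && !callQueue.isEmpty(); i++) {
                    callQueue.poll();
                }
            } else {
                // The handler only hands off (assumed cheap), so the call queue
                // empties every tick regardless of the processing rate.
                while (!callQueue.isEmpty()) {
                    eventQueue.add(callQueue.poll());
                }
                // A separate event processor works through the backlog.
                for (int i = 0; i < processedPerTick && !eventQueue.isEmpty(); i++) {
                    eventQueue.poll();
                }
            }
        }
        return new Result(rejected, callQueue.size(), eventQueue.size());
    }

    public static void main(String[] args) {
        Result async = simulate(false, 100, 10, 4, 20);
        Result sync = simulate(true, 100, 10, 4, 20);
        System.out.println("async: rejected=" + async.rejected
                + " eventBacklog=" + async.eventBacklog);
        System.out.println("sync:  rejected=" + sync.rejected
                + " callBacklog=" + sync.callBacklog);
    }
}
```

The sketch deliberately leaves out the cost mentioned above: in the
synchronous mode the handler threads are busy with heartbeats, so other calls
sharing the same queue would wait behind them.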

> AM lacks flow control for task events
> -------------------------------------
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Haibo Chen
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.