[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150458#comment-16150458 ]
Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------

Having the AM send the heartbeat means the AM needs to be the client in the RPC connection, since only servers receive method calls. That creates two problems in practice. First is the discovery problem -- how does the AM know the listening port for each task? Second is thread scaling, since the client RPC layer creates a thread for every connection. That means a thread per task, which is not going to work for large jobs.

bq. The actual code may use asynchronous calls not to create a thread for each task.

This is really the key and the only thing necessary to solve the problem. The root cause of this problem is that the AM quickly sends a response to each heartbeat without actually processing it. That creates a flow control issue, since the rate of processing heartbeats is largely disconnected from the incoming rate. Therefore we can receive heartbeats at a rate far greater than the rate at which we can process them, causing an unbounded pileup of backlogged events.

The reason the AM behaves this way is that it needs to free up the IPC Server handler thread so it can handle other requests from tasks, such as other heartbeats, new task attempt connections, task requests, etc. There are many other places in YARN and MapReduce where a similar tactic is taken, with the same flow control issue as a result.

The real fix is to not send a heartbeat reply until the heartbeat is completely processed. Then there will only ever be as many outstanding heartbeats and metrics status updates as there are task attempts running at the time, rather than an unbounded number based on the rate difference between how fast the tasks are posting heartbeats and how fast the AsyncDispatcher can process them. If we are able to synchronously process the heartbeat in a way that doesn't completely tie up an IPC Server handler thread for the duration of the heartbeat call, then we're all set.
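
To make the bound concrete, here is a minimal toy model of the argument above. It is not Hadoop code and does not use the HADOOP-11552 API; the class name, the method name, and the discrete "tick" model are all assumptions made for illustration. Each tick, every task that is not waiting on a reply sends one heartbeat, and the dispatcher processes a fixed number of queued heartbeats.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model, not part of Hadoop: compares replying to a
// heartbeat immediately with replying only after it has been processed.
class HeartbeatFlowControlSketch {

    /** Returns the largest heartbeat backlog seen over the simulated ticks. */
    static int maxBacklog(int tasks, int ticks, int processedPerTick, boolean deferReply) {
        Deque<Integer> backlog = new ArrayDeque<>();  // queued heartbeats, by task id
        boolean[] awaitingReply = new boolean[tasks];
        int max = 0;
        for (int t = 0; t < ticks; t++) {
            // Each task sends a heartbeat unless it is still waiting for a reply.
            for (int task = 0; task < tasks; task++) {
                if (!awaitingReply[task]) {
                    backlog.addLast(task);
                    // Immediate reply: the task may send again on the next tick.
                    // Deferred reply: the task waits until its heartbeat is processed.
                    awaitingReply[task] = deferReply;
                }
            }
            max = Math.max(max, backlog.size());
            // The dispatcher drains a limited number of heartbeats per tick.
            for (int i = 0; i < processedPerTick && !backlog.isEmpty(); i++) {
                int task = backlog.removeFirst();
                awaitingReply[task] = false;  // the reply goes out once processing is done
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // 100 tasks, 50 ticks, dispatcher handles 10 heartbeats per tick.
        System.out.println("immediate reply: " + maxBacklog(100, 50, 10, false)); // 4510
        System.out.println("deferred reply:  " + maxBacklog(100, 50, 10, true));  // 100
    }
}
```

In this model the immediate-reply backlog grows by 90 heartbeats per tick (100 sent, 10 processed) for as long as the run lasts, while the deferred-reply backlog never exceeds the number of tasks.
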
Task heartbeats naturally slow down as the ability of the AM to process them degrades. There is no need for the AM to explicitly reject requests or to do any 3 second sleeping itself. We just need to leverage the functionality added in HADOOP-11552 so that we aren't compelled to reply to heartbeats before they are fully processed just to free up an IPC Server thread.

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.