[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150458#comment-16150458 ]

Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------

Having the AM send the heartbeat means the AM needs to be the client in the RPC
connection, since only servers receive method calls.  That creates two problems
in practice.  First is the discovery problem: how does the AM know the
listening port for each task?  Second is thread scaling, since the client RPC
layer creates a thread for every connection.  That means a thread per task,
which is not going to work for large jobs.
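
As a toy illustration of the thread-scaling point (hypothetical classes, not
Hadoop's RPC client), the sketch below counts how many distinct threads serve N
simulated task connections under a thread-per-connection model, versus
asynchronous calls multiplexed over a small fixed pool. The pool size is an
arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical model, not Hadoop's RPC client: counts threads used per approach. */
public final class ThreadPerConnectionModel {

    /** Thread-per-connection: each simulated task connection gets its own thread. */
    static int threadsPerConnection(int tasks) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = new Thread(() -> threadNames.add(Thread.currentThread().getName()), "conn-" + i);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return threadNames.size(); // grows linearly with the number of tasks
    }

    /** Asynchronous calls: every simulated heartbeat runs on a shared fixed pool. */
    static int threadsAsync(int tasks, int poolSize) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            CompletableFuture<?>[] calls = new CompletableFuture<?>[tasks];
            for (int i = 0; i < tasks; i++) {
                calls[i] = CompletableFuture.runAsync(
                        () -> threadNames.add(Thread.currentThread().getName()), pool);
            }
            CompletableFuture.allOf(calls).join();
        } finally {
            pool.shutdown();
        }
        return threadNames.size(); // never exceeds poolSize
    }

    public static void main(String[] args) {
        System.out.println("thread-per-connection: " + threadsPerConnection(500));
        System.out.println("async, pool of 4:      " + threadsAsync(500, 4));
    }
}
```

The first count equals the number of connections; the second is capped by the
pool, no matter how many tasks there are.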

bq. The actual code may use asynchronous calls not to create a thread for each 
task.

This is really the key, and the only thing necessary to solve the problem.  The
root cause of this problem is that the AM quickly sends a response to each
heartbeat without actually processing it.  That creates a flow control issue,
since the rate of processing heartbeats is somewhat disconnected from the
incoming rate.  Therefore we can receive heartbeats far faster than we can
process them, causing an unbounded pileup of backlogged events.  The reason
the AM behaves this way is that it needs to free up the IPC Server handler
thread so it can handle other tasks' requests, like other heartbeats, new task
attempt connections, task requests, etc.  There are lots of other places in
YARN and MAPREDUCE where a similar tactic is taken, with the same flow control
issue as a result.

The real fix is to not send a heartbeat reply until the heartbeat is completely
processed.  Then there will only ever be as many outstanding heartbeats and
metrics status updates as there are task attempts running at the time, rather
than an unbounded number based on the rate difference between how fast the
tasks are posting heartbeats and how fast the AsyncDispatcher can process them.
If we were able to synchronously process the heartbeat in a way that doesn't
completely tie up an IPC Server handler thread for the duration of the
heartbeat call, then we're all set.  Task heartbeats naturally slow down as the
ability of the AM to process them degrades.  There is no need for the AM to be
explicit about rejecting requests, or for the AM itself to do any 3-second
sleeping.  We just need to leverage the functionality added in HADOOP-11552 so
we aren't compelled to reply to heartbeats before they are fully processed just
to free up an IPC Server thread.
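
The same toy model with replies deferred until processing completes.  This is a
sketch of the idea only, assuming a task sends its next heartbeat only after its
previous reply arrives; the CompletableFuture stands in for a deferred RPC
response and is not the HADOOP-11552 API.  The "handler" step returns as soon as
it enqueues, so no handler thread is held, yet the backlog can never exceed the
number of tasks.  Names are hypothetical; the record requires Java 16+.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical model, not Hadoop code: replies are sent only after processing. */
public final class DeferredReplyModel {

    /** A queued heartbeat plus the reply that is still pending for it. */
    record Heartbeat(int taskId, CompletableFuture<Void> reply) {}

    /** Simulates the given ticks and returns the largest backlog observed. */
    static int maxBacklog(int ticks, int tasks, int processedPerTick) {
        Deque<Heartbeat> eventQueue = new ArrayDeque<>();
        List<CompletableFuture<Void>> lastReply = new ArrayList<>();
        for (int task = 0; task < tasks; task++) {
            lastReply.add(CompletableFuture.completedFuture(null)); // nothing outstanding yet
        }
        int max = 0;
        for (int t = 0; t < ticks; t++) {
            for (int task = 0; task < tasks; task++) {
                // A task may heartbeat again only once its previous reply arrived.
                if (lastReply.get(task).isDone()) {
                    // "IPC handler": enqueue with a pending reply, then return at once.
                    CompletableFuture<Void> reply = new CompletableFuture<>();
                    lastReply.set(task, reply);
                    eventQueue.addLast(new Heartbeat(task, reply));
                }
            }
            max = Math.max(max, eventQueue.size());
            for (int i = 0; i < processedPerTick && !eventQueue.isEmpty(); i++) {
                // "Dispatcher": fully process the event, and only then send the reply.
                eventQueue.pollFirst().reply().complete(null);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max backlog: " + maxBacklog(1000, 1000, 800));
    }
}
```

With the same 1,000 tasks and 800 events per tick, maxBacklog(1000, 1000, 800)
returns 1,000: heartbeats slow to the dispatcher's pace instead of piling up.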


> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
