[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154072#comment-16154072
 ] 

Miklos Szegedi commented on MAPREDUCE-5124:
-------------------------------------------

Thank you, [~jlowe], for the previous reply. Let me address your concerns.
You are right: making the call asynchronous by leveraging HADOOP-11552 is
probably the smallest possible change in this case.
What I was trying to solve is the theoretical problem of sending heartbeats
with metrics from a large number of tasks at interval T, with graceful
degradation and minimal delay D. When read from the AM, the mean delay of a
metric is {{D+T/2}}: it waits D in the queue and, once available, it is
sampled with a mean delay of {{T/2}}. If the server controls the heartbeat,
both graceful degradation and minimal delay are achieved: the heartbeat is
processed right away, so there is no queueing delay (D=0). If the task
controls the heartbeat, the average wait time in the queue adds to the delay
of the current metrics, so any consumer gets them later. However, server
control would also mean making the client socket connection act as an RPC
server, which is quite a big change.
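The {{D+T/2}} figure can be checked with a short derivation. This is a sketch that assumes the metric becomes available at a time uniformly distributed within the reader's sampling period and independent of its phase; the symbol U for the residual wait until the next sample is introduced here for illustration only.

```latex
\begin{align*}
U &\sim \mathrm{Uniform}(0, T), \qquad
\mathbb{E}[U] = \int_0^T \frac{u}{T}\,du = \frac{T}{2},\\
\mathbb{E}[\text{delay}] &= D + \mathbb{E}[U] = D + \frac{T}{2}.
\end{align*}
```

With D=0, as in the server-controlled case, the mean delay reduces to {{T/2}}.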
I think either the server needs to control the heartbeat to minimize the delay
(admittedly too big a change), or the task needs to adjust the heartbeat
interval based on the previous response time, as [~pbacsko] has suggested. The
second option could be implemented on top of HADOOP-11552.
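
To make the second option concrete, here is a minimal sketch of a task-side policy that stretches the heartbeat interval as the AM's response time grows. It is not taken from any attached patch: the class name, the smoothing scheme (an exponentially weighted moving average), and every parameter are assumptions chosen for illustration, and it does not use any Hadoop API.

```java
/**
 * Hypothetical task-side heartbeat pacing (illustrative only, not Hadoop code).
 * The next interval is proportional to a smoothed response time and is
 * clamped to [minIntervalMs, maxIntervalMs].
 */
final class AdaptiveHeartbeatInterval {
    private final long minIntervalMs;
    private final long maxIntervalMs;
    private final double smoothing;   // weight of the newest sample, in (0, 1]
    private final double loadFactor;  // target ratio of interval to response time
    private double smoothedResponseMs = 0.0;
    private boolean hasSample = false;

    AdaptiveHeartbeatInterval(long minIntervalMs, long maxIntervalMs,
                              double smoothing, double loadFactor) {
        if (minIntervalMs <= 0 || maxIntervalMs < minIntervalMs) {
            throw new IllegalArgumentException("invalid interval bounds");
        }
        if (smoothing <= 0.0 || smoothing > 1.0 || loadFactor <= 0.0) {
            throw new IllegalArgumentException("invalid smoothing or loadFactor");
        }
        this.minIntervalMs = minIntervalMs;
        this.maxIntervalMs = maxIntervalMs;
        this.smoothing = smoothing;
        this.loadFactor = loadFactor;
    }

    /** Record how long the previous heartbeat call took to complete. */
    void recordResponseTime(long responseMs) {
        if (!hasSample) {
            smoothedResponseMs = responseMs;
            hasSample = true;
        } else {
            smoothedResponseMs = smoothing * responseMs
                    + (1.0 - smoothing) * smoothedResponseMs;
        }
    }

    /** Interval to wait before the next heartbeat; slower responses back off. */
    long nextIntervalMs() {
        long scaled = Math.round(loadFactor * smoothedResponseMs);
        return Math.max(minIntervalMs, Math.min(maxIntervalMs, scaled));
    }
}
```

A task would call {{recordResponseTime}} after each heartbeat completes and sleep for {{nextIntervalMs()}} before the next one. The upper bound keeps a slow AM from pushing the interval out indefinitely, and the lower bound preserves the normal rate when the AM is keeping up.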

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
