[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211818#comment-16211818
 ] 

Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------

As you point out, it's probably going to be bad for the IPC server thread to 
block, since that will likely cause all server threads to block shortly 
thereafter, preventing any RPC processing until the blockage clears.  That is 
definitely a throttle, but it could be severe enough to cause task failures if 
the tasks can't contact the AM in a timely manner.

For the throttling case we could do something similar to what was done in YARN 
to help mitigate NM heartbeats overwhelming the RM.  We could still dispatch 
status update events to the task attempts for every heartbeat, but instead of 
attaching the status update directly to the event we could attach the status 
payload to the TaskAttempt directly.  If there's already an unprocessed status 
event pending on the TaskAttempt we can then coalesce the two status updates 
into a single status update.  Coalescing should be pretty straightforward since 
the newer status should clobber the older status for most of the payload.  Then 
when the status update event arrives at the TaskAttempt it processes the status 
update object, if there is one, then clears it.  This should largely mitigate 
the problem since the memory pressure from all these events is primarily from 
the status payload attached to each event.  If the server thread makes sure 
there is at most one outstanding status payload per task attempt then we have 
an upper limit on the number of outstanding status payloads the AM has to track.
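
To make the coalescing idea concrete, here is a minimal Java sketch.  It is 
not the actual AM code: StatusPayload, AttemptSketch and CoalesceDemo are 
hypothetical stand-ins, the "event queue" is a plain ArrayDeque rather than 
the real dispatcher, and coalescing is simplified to the newest payload fully 
replacing the older one (a real merge may need to preserve some fields).

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the status a task reports on each heartbeat. */
final class StatusPayload {
    final float progress;
    final String state;

    StatusPayload(float progress, String state) {
        this.progress = progress;
        this.state = state;
    }
}

/** Hypothetical stand-in for a task attempt holding at most one pending payload. */
final class AttemptSketch {
    private final AtomicReference<StatusPayload> pending = new AtomicReference<>();
    private StatusPayload lastProcessed;

    /**
     * Called on the RPC server thread for every heartbeat.
     * Returns true if an older, still unprocessed payload was replaced,
     * i.e. the update was coalesced. Assumption: newest payload wins outright.
     */
    boolean offerStatus(StatusPayload newest) {
        return pending.getAndSet(newest) != null;
    }

    /**
     * Called on the dispatcher thread when a payload-free status event arrives.
     * Takes and clears the pending payload; returns false if an earlier event
     * already consumed it, in which case there is nothing to do.
     */
    boolean handleStatusEvent() {
        StatusPayload status = pending.getAndSet(null);
        if (status == null) {
            return false;
        }
        lastProcessed = status;
        return true;
    }

    StatusPayload lastProcessed() {
        return lastProcessed;
    }
}

final class CoalesceDemo {
    public static void main(String[] args) {
        AttemptSketch attempt = new AttemptSketch();
        // Payload-free events: each one only names the attempt to poke.
        Queue<AttemptSketch> events = new ArrayDeque<>();

        // Three heartbeats arrive before the dispatcher gets to run.
        float[] reports = {0.1f, 0.5f, 0.9f};
        for (float progress : reports) {
            boolean coalesced = attempt.offerStatus(new StatusPayload(progress, "RUNNING"));
            events.add(attempt);
            System.out.println("heartbeat " + progress + " coalesced=" + coalesced);
        }

        // The dispatcher drains the backlog; only one event finds a payload.
        while (!events.isEmpty()) {
            boolean hadPayload = events.remove().handleStatusEvent();
            System.out.println("event processed, hadPayload=" + hadPayload);
        }
        System.out.println("last progress seen: " + attempt.lastProcessed().progress);
    }
}
```

In this sketch the second and third heartbeats report a coalesce, only the 
first queued event finds a payload (the newest one, with progress 0.9), and 
the remaining two events return immediately.  However many events back up, 
the attempt never holds more than one payload.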

With this approach I don't think it will be necessary to scan or otherwise 
track each status update event posted to the dispatcher.  The events are going 
to be very small once the status payload is removed, and each will be quick to 
process if its corresponding status payload was coalesced into an earlier 
payload.  If 
desired we could also combine this idea with the client-side throttle hint in 
the RPC response, since the server thread will know whether it coalesced a 
status update or not.  If it did then we could tell the client to throttle a 
bit for the next update.
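
As a rough illustration of how a client might react to such a hint, here is 
a minimal Java sketch.  HeartbeatPacer is a hypothetical class, not an 
existing Hadoop API, and the policy it uses (double the delay on a hint up to 
a cap, return to the base interval otherwise) is only an assumption; the 
actual amount to throttle is left open above.

```java
/** Hypothetical client-side pacer for status updates; not a Hadoop class. */
final class HeartbeatPacer {
    private final long baseMs;
    private final long maxMs;
    private long currentMs;

    HeartbeatPacer(long baseMs, long maxMs) {
        if (baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("need 0 < baseMs <= maxMs");
        }
        this.baseMs = baseMs;
        this.maxMs = maxMs;
        this.currentMs = baseMs;
    }

    /**
     * Returns the delay to wait before the next status update.
     * throttleHint is assumed to be carried in the previous RPC response and
     * to be true when the server coalesced that update.
     */
    long nextDelayMs(boolean throttleHint) {
        if (throttleHint) {
            // Assumed policy: double the delay, capped at maxMs (overflow-safe).
            currentMs = (currentMs > maxMs / 2) ? maxMs : currentMs * 2;
        } else {
            // No backlog reported: go back to the normal interval.
            currentMs = baseMs;
        }
        return currentMs;
    }
}
```

For example, a pacer built with a 3000 ms base and a 12000 ms cap would 
return 6000, then 12000, then 12000 again across three consecutive hinted 
responses, and 3000 after the first response without a hint.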

Deferred RPC response could be useful here, but I haven't thought through how 
tricky it would be to implement in practice.

I agree that switching the AM to be the one that drives heartbeats is not 
appropriate here.  My feeling is it creates as many problems as it solves.

My current recommendation is to try coalescing status updates, as was done 
for NM heartbeats to the RM.  That was pretty effective there at mitigating 
the backlog issues, and I think it could work well here too.  As a bonus it 
makes it trivial to determine when we should tell a client to back off a bit 
if we choose to do that as well.


> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
