[
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211818#comment-16211818
]
Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------
As you point out, it's probably going to be bad for the IPC server thread to
block since that will likely cause all server threads to block shortly
thereafter preventing any RPC processing until the blockage clears. That is
definitely a throttle, but it could be severe enough to cause task failures if
they can't contact the AM in a timely manner.
For the throttling case we could do something similar to what was done in YARN
to help mitigate NM heartbeats overwhelming the RM. We could still dispatch
status update events to the task attempts for every heartbeat, but instead of
attaching the status update directly to the event we could attach the status
payload to the TaskAttempt directly. If there's already an unprocessed status
event pending on the TaskAttempt we can then coalesce the two status updates
into a single status update. Coalescing should be pretty straightforward since
the newer status should clobber the older status for most of the payload. Then
when the status update event arrives at the TaskAttempt it processes the status
update object, if there is one, then clears it. This should largely mitigate
the problem since the memory pressure from all these events is primarily from
the status payload attached to each event. If the server thread makes sure
there is only at most one outstanding status payload per task then we have an
upper limit on the number of outstanding status payloads the AM has to track.
With this approach I don't think it will be necessary to scan or otherwise
track each status update events posted to the dispatcher. They're going to be
very small once the status payload is removed, and they'll be quick to process
if its corresponding status payload was coalesced into an earlier payload. If
desired we could also combine this idea with the client-side throttle hint in
the RPC response, since the server thread will know whether it coalesced a
status update or not. If it did then we could tell the client to throttle a
bit for the next update.
Deferred RPC response could be useful here, but I haven't thought through how
tricky it would be to implement in practice.
I agree that switching the AM to be the one that drives heartbeats is not
appropriate here. My feeling is it creates as many problems as it solves.
My current recommendation is to try the coalesce status updates approach as was
done for NM heartbeats to the RM. That was pretty effective there at
mitigating the backlog issues, and I think it could work well here too. As a
bonus it makes it trivial to determine when we should tell a client to backoff
a bit if we choose to do that as well.
> AM lacks flow control for task events
> -------------------------------------
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha, 0.23.5
> Reporter: Jason Lowe
> Assignee: Haibo Chen
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]