[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211818#comment-16211818 ]
Jason Lowe commented on MAPREDUCE-5124:
---------------------------------------

As you point out, it's probably going to be bad for the IPC server thread to block, since that will likely cause all server threads to block shortly thereafter, preventing any RPC processing until the blockage clears. That is definitely a throttle, but it could be severe enough to cause task failures if tasks can't contact the AM in a timely manner.

For the throttling case we could do something similar to what was done in YARN to mitigate NM heartbeats overwhelming the RM. We could still dispatch status update events to the task attempts for every heartbeat, but instead of attaching the status update directly to the event we could attach the status payload to the TaskAttempt itself. If there's already an unprocessed status update pending on the TaskAttempt, we can coalesce the two status updates into a single one (a rough sketch of this is below, after the quoted issue description). Coalescing should be pretty straightforward since the newer status should clobber the older status for most of the payload. Then when the status update event arrives at the TaskAttempt, it processes the pending status object, if there is one, and clears it.

This should largely mitigate the problem, since the memory pressure from all these events comes primarily from the status payload attached to each one. If the server thread ensures there is at most one outstanding status payload per task, then we have an upper limit on the number of outstanding status payloads the AM has to track.

With this approach I don't think it will be necessary to scan or otherwise track the status update events posted to the dispatcher. They're going to be very small once the status payload is removed, and an event will be quick to process if its corresponding payload was already coalesced into an earlier one.

If desired we could also combine this idea with the client-side throttle hint in the RPC response, since the server thread will know whether or not it coalesced a status update. If it did, then we could tell the client to throttle a bit before the next update. A deferred RPC response could be useful here, but I haven't thought through how tricky it would be to implement in practice.

I agree that switching the AM to be the one that drives heartbeats is not appropriate here. My feeling is it creates as many problems as it solves.

My current recommendation is to try the coalesced status update approach, as was done for NM heartbeats to the RM. That was pretty effective at mitigating the backlog issues there, and I think it could work well here too. As a bonus it makes it trivial to determine when we should tell a client to back off a bit, if we choose to do that as well.

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Haibo Chen
>         Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks. If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.
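To make the coalescing idea concrete, here is a minimal sketch of the kind of buffer a TaskAttempt could hold. The names (CoalescingStatusBuffer, TaskStatusPayload, offer, takePending) are illustrative placeholders, not the actual MR AM classes or anything from the attached prototypes; the point is the at-most-one-pending-payload invariant and the coalesced-or-not signal that could feed a client backoff hint.

{code:java}
import java.util.concurrent.atomic.AtomicReference;

/**
 * Sketch of the coalescing approach: the IPC handler parks the latest status
 * payload on the attempt, and the dispatcher event itself carries no payload.
 * All class and method names here are hypothetical.
 */
public class CoalescingStatusBuffer {

    /** Placeholder for the per-heartbeat status report from a task. */
    public static class TaskStatusPayload {
        final float progress;
        final String stateString;

        TaskStatusPayload(float progress, String stateString) {
            this.progress = progress;
            this.stateString = stateString;
        }

        /**
         * Newer status clobbers the older one for most fields; anything
         * cumulative would need real merge logic here.
         */
        TaskStatusPayload mergeWith(TaskStatusPayload older) {
            return this;
        }
    }

    // At most one unprocessed payload per attempt, which is what puts an
    // upper bound on the outstanding status memory the AM has to track.
    private final AtomicReference<TaskStatusPayload> pending = new AtomicReference<>();

    /**
     * Called from the IPC server thread on each heartbeat.
     * @return true if this update was coalesced into an earlier, still
     *         unprocessed payload (a hint that the client could back off).
     */
    public boolean offer(TaskStatusPayload latest) {
        while (true) {
            TaskStatusPayload prev = pending.get();
            TaskStatusPayload merged = (prev == null) ? latest : latest.mergeWith(prev);
            if (pending.compareAndSet(prev, merged)) {
                return prev != null;   // true => coalesced
            }
        }
    }

    /**
     * Called when the (now payload-free) status update event reaches the
     * TaskAttempt: take whatever is pending and clear it.
     */
    public TaskStatusPayload takePending() {
        return pending.getAndSet(null);
    }
}
{code}

With the payload kept on the attempt, the event that flows through the dispatcher carries no status data at all, which is what bounds the memory footprint even when the dispatcher falls behind, and the boolean returned from the offer step is exactly the signal the server thread would need for the optional throttle hint in the RPC response.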