[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated MAPREDUCE-5124:
------------------------------------
    Attachment: MAPREDUCE-5124-CoalescingPOC-1.patch

I created a POC that uses the "event coalescing" approach. Roughly, this is what changed:
* Added a new method {{setNextUpdate()}} to {{TaskAttemptImpl}}
* Added a mapping of TaskAttemptID <-> TaskAttemptImpl
* At each {{statusUpdate()}}, we call {{setNextUpdate()}} and don't pass the status object as a payload
* In the {{StatusUpdater}} transition, we check whether we need to update the status. If {{needsUpdate}} is true, we run the original updater logic.

If we have a backlog of task update events for a given attempt and that attempt hasn't been updated, the {{StatusUpdater}} will not do anything because {{needsUpdate}} will be false.

I also kept the original updating logic, that is, retrieving the status from the event. First I tried to remove the original constructor of {{TaskAttemptStatusUpdateEvent}}, but that caused compilation errors in various classes. It turned out that quite a few test cases use the old approach to manipulate the status of a task attempt. I didn't want to introduce too many code changes, and I'm not sure what the best solution is in this case.

[~jlowe] could you take a look at this POC?

> AM lacks flow control for task events
> -------------------------------------
>
>                 Key: MAPREDUCE-5124
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.3-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Peter Bacsko
>         Attachments: MAPREDUCE-5124-CoalescingPOC-1.patch, MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events
> from tasks.  If the AM is unable to keep pace with the rate of incoming
> events for a sufficient period of time then it will eventually exhaust the
> heap and crash.
> MAPREDUCE-5043 addressed a major bottleneck for event
> processing, but the AM could still get behind if it's starved for CPU and/or
> handling a very large job with tens of thousands of active tasks.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
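
For readers who have not opened the patch, the coalescing idea from the comment above can be sketched roughly as follows. This is not the code in MAPREDUCE-5124-CoalescingPOC-1.patch: {{CoalescingSketch}}, {{Attempt}}, {{register()}} and {{drain()}} are hypothetical names, attempt IDs and statuses are plain strings rather than Hadoop types, and the {{needsUpdate}} check is assumed to be equivalent to "a pending status is present". Only {{setNextUpdate()}} and {{statusUpdate()}} are names taken from the comment.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the AM side; none of these are the real Hadoop classes. */
class CoalescingSketch {

    /** Simplified task attempt: one slot for the newest pending status. */
    static final class Attempt {
        private final AtomicReference<String> nextUpdate = new AtomicReference<>();
        private String appliedStatus = "NEW";
        private int appliedCount = 0;

        /** Overwrites any status that has not been processed yet. */
        void setNextUpdate(String status) {
            nextUpdate.set(status);
        }

        /** Plays the role of the StatusUpdater transition in this sketch. */
        void handleStatusUpdateEvent() {
            String pending = nextUpdate.getAndSet(null);
            if (pending == null) {
                return; // assumed equivalent of needsUpdate == false: backlog event, nothing new
            }
            appliedStatus = pending; // the original updater logic would run here
            appliedCount++;
        }
    }

    // Mapping of attempt ID to attempt, as described in the comment.
    private final Map<String, Attempt> attempts = new ConcurrentHashMap<>();
    // Events carry only the attempt ID, never the status payload.
    private final Queue<String> eventQueue = new ConcurrentLinkedQueue<>();

    public void register(String attemptId) {
        attempts.put(attemptId, new Attempt());
    }

    /** Called for every status report coming from a task. */
    public void statusUpdate(String attemptId, String status) {
        attempts.get(attemptId).setNextUpdate(status);
        eventQueue.add(attemptId);
    }

    /** Processes every queued event, as the dispatcher thread would. */
    public void drain() {
        String attemptId;
        while ((attemptId = eventQueue.poll()) != null) {
            attempts.get(attemptId).handleStatusUpdateEvent();
        }
    }

    public String appliedStatus(String attemptId) {
        return attempts.get(attemptId).appliedStatus;
    }

    public int appliedCount(String attemptId) {
        return attempts.get(attemptId).appliedCount;
    }
}
```

In this sketch the queue can still grow by one small event per report, but each attempt holds at most one pending status object, and a backlog of events for the same attempt results in a single run of the updater logic.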