[ https://issues.apache.org/jira/browse/MAPREDUCE-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194211#comment-13194211 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3711:
----------------------------------------------------

I doubt that is the case: according to what [~karams] says, the problem doesn't happen when the AM is killed during the map phase with ~90% of maps done. That is, unless the history events for reduces getting started log unusually large records. I think this has to do with some bug in the recovery code that recovers reduces, but I'll let you debug. Thanks for taking this up!

> AppMaster recovery for Medium to large jobs take long time
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-3711
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3711
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Siddharth Seth
>            Assignee: Robert Joseph Evans
>            Priority: Blocker
>
> Reported by [~karams]
> yarn.resourcemanager.am.max-retries=2
> Ran test cases with a sort job at 350 scale, having 16800 maps and 680 reduces:
> 1. 70 secs after job submission, the AM was killed using kill -9. Around 3900
> maps were completed and 680 reduces were scheduled. The second AM restarted
> and the job completed in 980 secs. The AM took very little time to recover.
> 2. 150 secs after job submission, the AM was killed using kill -9. Around 90% of
> maps were completed and 680 reduces were scheduled. The second AM restarted
> and the job completed in 1000 secs. The AM recovered.
> 3. 150 secs after job submission, the AM was killed using kill -9. Almost all
> maps were completed and only the 680 reduces were running. Recovery was too
> slow: the AM was still recovering after 1 hr 40 mins, when I killed the run.