This looks like a silly bug to me; we should definitely fix the logging. Thanks for reporting this, Chris!

On Sep 21, 2012, at 12:23 PM, Chris Riccomini wrote:

> Hey all,
>
> Is anyone else seeing this issue. It's unclear to me if I'm doing
> something wrong, or if something is broken.
>
> Thanks!
> Chris
>
> On 9/21/12 11:05 AM, "Chris Riccomini (JIRA)" <j...@apache.org> wrote:
>
>> Chris Riccomini created MAPREDUCE-4672:
>> ------------------------------------------
>>
>>              Summary: RM with lost NMs results in massive log of
>> AppAttemptId doesnt exist in cache
>>                  Key: MAPREDUCE-4672
>>                  URL: https://issues.apache.org/jira/browse/MAPREDUCE-4672
>>              Project: Hadoop Map/Reduce
>>           Issue Type: Bug
>>           Components: resourcemanager
>>     Affects Versions: 0.23.1
>>             Reporter: Chris Riccomini
>>
>>
>> Hey Guys,
>>
>> I'm running a 9 node cluster with 8 NMs and a single RM node. If I run an
>> app master and have that app master start a container, then shut down all
>> nodes (to simulate a complete failure), the containers timeout and fail,
>> as expected.
>>
>> What's unexpected is that my log then starts filling with:
>>
>>
>> 2012-09-21 18:02:02,614 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:03,617 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:04,618 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:05,620 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:06,621 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:07,623 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>> 2012-09-21 18:02:08,624 ERROR resourcemanager.ApplicationMasterService
>> (ApplicationMasterService.java:allocate(247)) - AppAttemptId doesnt exist
>> in cache appattempt_1348248013002_0001_000001
>>
>> Is there any way to shut this off/fix it? It just keeps going forever,
>> until I bounce the RM node.
>>
>> Thanks!
>> Chris
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/