[ https://issues.apache.org/jira/browse/MAPREDUCE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971431#comment-13971431 ]
Jason Lowe commented on MAPREDUCE-4937:
---------------------------------------

Yeah, it's tricky to dispatch messages while the MR AM is still initializing services.  There are lots of nasty races there, which necessitate delaying the dispatch of failure events until the services have finished starting.  It'd be nice if we could just start the services and then dispatch Job events, but some of the services need some of the results of the JOB_INIT handling, hence we're handling events while still initializing services.

Patch looks good overall, just a minor comment and a couple of nits:

- Instead of duplicating the super.serviceStart() and having a mid-method return, we could do something like this:
{code}
    // initially assume we will initialize successfully
    boolean initFailed = false;
    if (!errorHappenedShutdownNow) {
      ...
      // init failed if the job didn't leave the NEW state
      initFailed = (((JobImpl)job).getInternalState() == JobStateInternal.NEW);
    }

    // start all the components
    super.serviceStart();

    // set job classloader if configured
    MRApps.setJobClassLoader(getConfig());

    // All components have started
    if (initFailed) {
      JobEvent initFailedEvent = new JobEvent(job.getID(), JobEventType.JOB_INIT_FAILED);
      jobEventDispatcher.handle(initFailedEvent);
    } else {
      startJobs();
    }
{code}
- Nit: JOB_INIT_FAILED should be listed alongside the other events produced by MRAppMaster (i.e., JOB_INIT, JOB_START) so that, like the other events, it's documented where the event originates.
- Nit: it would be nice to have better indentation on the wrapped lines in the test case, since those lines have smaller indents than the method declaration.

> MR AM handles an oversized split metainfo file poorly
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-4937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4937
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Eric Payne
>         Attachments:
>     MAPREDUCE-4937.MRAMHandlOversizeSplits.txt,
>     MAPREDUCE-4937.MRAMHandlOversizeSplits.txt
>
>
> When a job runs with a split metainfo file that's larger than it has been
> configured to handle, it just crashes.  This leaves the user with a
> less-than-ideal debug session since there are no useful diagnostic messages
> sent to the client for this failure.  In addition, it crashes before
> registering/unregistering with the RM and crashes without generating history,
> so the proxy URL is not very useful and there's no archived configuration to
> check to see what setting the AM was using when it encountered the error.
> The AM should handle this error case more gracefully and treat the failure as
> it does any other failed job, with a proper unregistration from the RM and
> with history.
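
For reference, the control flow suggested in the comment above (handle JOB_INIT early, start the services exactly once, then either dispatch the deferred failure event or start the jobs) can be sketched in isolation. This is only an illustration under stated assumptions: DeferredInitFailureSketch, its JobState enum, and the initSucceeds flag are hypothetical stand-ins and not Hadoop classes, and the recorded strings merely name the steps from the snippet rather than calling the real MRAppMaster methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the AM start-up sequence; none of these types are Hadoop classes. */
final class DeferredInitFailureSketch {
    /** Simplified job states; the real JobStateInternal has many more. */
    enum JobState { NEW, INITED }

    /**
     * Returns the ordered list of start-up steps taken, so the sequencing is observable.
     * initSucceeds is an assumption used to simulate whether JOB_INIT handling
     * moves the job out of the NEW state.
     */
    static List<String> serviceStart(boolean errorHappenedShutdownNow, boolean initSucceeds) {
        List<String> steps = new ArrayList<>();
        JobState state = JobState.NEW;

        // Initially assume we will initialize successfully.
        boolean initFailed = false;
        if (!errorHappenedShutdownNow) {
            // JOB_INIT is handled while services are still initializing,
            // because some services need the results of that handling.
            steps.add("JOB_INIT");
            if (initSucceeds) {
                state = JobState.INITED;
            }
            // Init failed if the job did not leave the NEW state.
            initFailed = (state == JobState.NEW);
        }

        // Start all the components once, on every path (no mid-method return).
        steps.add("super.serviceStart");

        // All components have started, so the deferred failure can be dispatched now.
        if (initFailed) {
            steps.add("JOB_INIT_FAILED");
        } else {
            steps.add("startJobs");
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(serviceStart(false, true));
        System.out.println(serviceStart(false, false));
    }
}
```

The point of the single boolean is that the failure is remembered rather than acted on immediately, so the service start-up code appears once and the failure event is only dispatched after every service is running.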