[ https://issues.apache.org/jira/browse/MAPREDUCE-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971431#comment-13971431 ]

Jason Lowe commented on MAPREDUCE-4937:
---------------------------------------

Yeah, it's tricky to dispatch messages while the MR AM is still initializing 
services.  There are lots of nasty races there, which necessitate delaying the 
dispatch of failure events until the services have finished starting.  It'd be 
nice if we could just start the services and then dispatch Job events, but some 
of the services need the results of the JOB_INIT handling, so we end up 
handling events while still initializing services.
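To illustrate the ordering constraint, here is a minimal, self-contained Java sketch of a dispatcher that queues events until its owner declares that all services have started, and only then delivers them in order. `BufferingDispatcher`, `markServicesStarted()` and the plain-string events are hypothetical; this is not the MR AM's actual `AsyncDispatcher` or event API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical sketch: hold events until services are started, then deliver in order. */
public class BufferingDispatcher {
  private final Queue<String> pending = new ArrayDeque<>();
  private final Consumer<String> handler;
  private boolean servicesStarted = false;

  public BufferingDispatcher(Consumer<String> handler) {
    this.handler = handler;
  }

  /** Deliver immediately once started; otherwise queue the event for later. */
  public synchronized void dispatch(String event) {
    if (servicesStarted) {
      handler.accept(event);
    } else {
      pending.add(event);
    }
  }

  /** Called after every service has started; flushes queued events in FIFO order. */
  public synchronized void markServicesStarted() {
    servicesStarted = true;
    while (!pending.isEmpty()) {
      handler.accept(pending.poll());
    }
  }

  public static void main(String[] args) {
    List<String> delivered = new ArrayList<>();
    BufferingDispatcher d = new BufferingDispatcher(delivered::add);
    d.dispatch("JOB_INIT_FAILED");   // services still starting: queued
    System.out.println(delivered);   // []
    d.markServicesStarted();         // now safe to deliver
    System.out.println(delivered);   // [JOB_INIT_FAILED]
  }
}
```

The trade-off is that a failure detected early is reported slightly later, but it is never handled by a service that has not finished starting.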

Patch looks good overall, just a minor comment and nit:
- Instead of duplicating the super.serviceStart() and having a mid-method 
return, we could do something like this:
{code}
  // initially assume we will initialize successfully
  boolean initFailed = false;
  if (!errorHappenedShutdownNow) {
    ...
    // init failed if the job didn't leave the NEW state
    initFailed = (((JobImpl)job).getInternalState() == JobStateInternal.NEW);
  }
  //start all the components
  super.serviceStart();

  // set job classloader if configured
  MRApps.setJobClassLoader(getConfig());

  // All components have started
  if (initFailed) {
    JobEvent initFailedEvent = new JobEvent(job.getID(),
        JobEventType.JOB_INIT_FAILED);
    jobEventDispatcher.handle(initFailedEvent);
  } else {
    startJobs();
  }
{code}
- Nit: JOB_INIT_FAILED should be listed alongside the other events produced by 
MRAppMaster (i.e. JOB_INIT, JOB_START) so that, like those events, it's 
documented where the event originates.
- Nit: it would be nice to have better indentation on the wrapped lines in 
the test case, since those lines currently have smaller indents than the method 
declaration.
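The restructured serviceStart() suggested above can be modeled as a small, runnable Java sketch that logs what it would do instead of touching real services. Everything here is a simplified, hypothetical stand-in: `InternalState` replaces JobStateInternal, the three-value `JobEventType` is not Hadoop's real enum, and startJobs() is modeled as dispatching JOB_START. The enum also shows JOB_INIT_FAILED grouped with the other AM-produced events, per the first nit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the suggested serviceStart() flow; not Hadoop code. */
public class ServiceStartSketch {

  /** Stand-in for JobStateInternal; only the states this sketch needs. */
  enum InternalState { NEW, INITED }

  /** Stand-in for JobEventType, grouping the events the AM itself produces. */
  enum JobEventType {
    // Producer: MRAppMaster
    JOB_INIT,
    JOB_INIT_FAILED,
    JOB_START
  }

  private final List<String> log = new ArrayList<>();

  /** Models serviceStart(): one exit path, and services are started exactly once. */
  void serviceStart(boolean errorHappenedShutdownNow, InternalState stateAfterInit) {
    // initially assume we will initialize successfully
    boolean initFailed = false;
    if (!errorHappenedShutdownNow) {
      log.add("dispatch " + JobEventType.JOB_INIT);
      // init failed if the job didn't leave the NEW state
      initFailed = (stateAfterInit == InternalState.NEW);
    }
    // start all the components (stands in for super.serviceStart())
    log.add("services started");

    // All components have started, so dispatching is now safe
    if (initFailed) {
      log.add("dispatch " + JobEventType.JOB_INIT_FAILED);
    } else {
      log.add("dispatch " + JobEventType.JOB_START); // models startJobs()
    }
  }

  List<String> getLog() {
    return log;
  }

  public static void main(String[] args) {
    ServiceStartSketch failed = new ServiceStartSketch();
    failed.serviceStart(false, InternalState.NEW);
    // prints [dispatch JOB_INIT, services started, dispatch JOB_INIT_FAILED]
    System.out.println(failed.getLog());
  }
}
```

Because the failure branch and the success branch share the same super.serviceStart() call, there is no duplicated startup code and no mid-method return to keep in sync.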

> MR AM handles an oversized split metainfo file poorly
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-4937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4937
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.0.2-alpha, 0.23.5
>            Reporter: Jason Lowe
>            Assignee: Eric Payne
>         Attachments: MAPREDUCE-4937.MRAMHandlOversizeSplits.txt, 
> MAPREDUCE-4937.MRAMHandlOversizeSplits.txt
>
>
> When a job runs with a split metainfo file that's larger than the AM has been 
> configured to handle, the AM just crashes.  This leaves the user with a 
> less-than-ideal debugging session, since no useful diagnostic messages are 
> sent to the client for this failure.  In addition, it crashes before 
> registering/unregistering with the RM and without generating history, 
> so the proxy URL is not very useful and there's no archived configuration to 
> check which setting the AM was using when it encountered the error.
> The AM should handle this error case more gracefully and treat the failure as 
> it does any other failed job, with a proper unregistration from the RM and 
> with history.
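The check the issue describes can be sketched as a size comparison that yields a diagnostic message for the failed job rather than an exception that takes down the AM. `SplitMetaInfoCheck`, `checkSize` and the job ID are hypothetical, and treating a non-positive limit as "no limit" is an assumption of this sketch, not a statement about Hadoop's configuration semantics.

```java
import java.util.Optional;

/** Hypothetical sketch: report an oversized split metainfo file as a job diagnostic. */
public class SplitMetaInfoCheck {

  /**
   * Returns a diagnostic message if the file exceeds the configured limit,
   * or empty if it is acceptable. A non-positive limit means "no limit"
   * (an assumption made for this sketch).
   */
  static Optional<String> checkSize(String jobId, long fileSize, long maxSize) {
    if (maxSize > 0 && fileSize > maxSize) {
      return Optional.of("Split metadata size " + fileSize
          + " exceeded the configured limit of " + maxSize
          + " bytes; failing job " + jobId);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // The message would become part of the failed job's diagnostics instead of crashing the AM.
    checkSize("job_0001", 20_000_000L, 10_000_000L).ifPresent(System.out::println);
  }
}
```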



--
This message was sent by Atlassian JIRA
(v6.2#6252)
