[ https://issues.apache.org/jira/browse/MAPREDUCE-6867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962368#comment-15962368 ]
Naganarasimha G R commented on MAPREDUCE-6867:
----------------------------------------------

Thanks for working on the patch, [~bilwa]. A few comments from my side:
# {{ExitUtil}}, line 154: {{LOG.fatal("Halt called", ee)}} could itself throw another OOM error. IMO it would be better to wrap it in a try/finally block and call {{if(!systemHaltDisabled) System.exit(status);}} in the finally block.
# {{MRJobConfig}}, line 547: this effectively overrides {{systemHaltDisabled}}, which in general is only enabled for tests, and I am not sure what side effects that could have. Secondly, I think the {{XX:OnOutOfMemoryError=\"kill -9}} option is OS-dependent, so I am not sure it is the right approach, and I am not aware of any option that can be set based on the OS type. Given that we have the fix in {{ExitUtil}}, is this modification also required?

> ApplicationMaster hung on OOM Error
> -----------------------------------
>
>                 Key: MAPREDUCE-6867
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6867
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>            Reporter: Bilwa S T
>            Assignee: Bilwa S T
>         Attachments: MAPREDUCE-6867.patch
>
>
> Whenever an OOM error is thrown, {{YarnUncaughtExceptionHandler}} calls {{ExitUtil.halt(-1)}}. But while halting, another OOM error might occur, and it is not handled.
> We came across a scenario where, after we submitted a MapReduce application, an OOM error occurred in {{committerEventProcessor}}. The AM then neither halted nor logged the following line, and it finally hung because the error was not thrown to the main thread.
> {code}LOG.info("Halt with status " + status + " Message: " + msg);{code}
> *org.apache.hadoop.util.ExitUtil.halt(int, String)*
> {code}
> public static void halt(int status, String msg) throws HaltException {
>   LOG.info("Halt with status " + status + " Message: " + msg);
>   if (systemHaltDisabled) {
>     HaltException ee = new HaltException(status, msg);
>     LOG.fatal("Halt called", ee);
>     if (null == firstHaltException) {
>       firstHaltException = ee;
>     }
>     throw ee;
>   }
>   Runtime.getRuntime().halt(status);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
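The try/finally idea from the first review comment can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the attached patch and not Hadoop's actual {{ExitUtil}}: the class {{SafeHaltSketch}} and its {{halter}} and {{logger}} hooks are invented here so that the behaviour can be exercised without stopping the JVM. The logging step runs inside {{try}}, so an error thrown while logging (for example a second {{OutOfMemoryError}}) cannot skip the halt in {{finally}}.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch, not Hadoop's ExitUtil or the MAPREDUCE-6867 patch.
 * The logging step runs inside try/finally, so a Throwable thrown while
 * logging (for example a second OutOfMemoryError) cannot skip the halt.
 */
final class SafeHaltSketch {
    // Stand-in for ExitUtil's systemHaltDisabled flag (normally true only in tests).
    static volatile boolean systemHaltDisabled = false;

    // Hypothetical hook: by default stops the JVM without running shutdown hooks;
    // a test can replace it to record the status instead.
    static volatile IntConsumer halter = status -> Runtime.getRuntime().halt(status);

    // Hypothetical hook standing in for the LOG.info/LOG.fatal calls; a test can
    // make it throw to simulate an OutOfMemoryError raised during logging.
    static volatile IntConsumer logger =
        status -> System.err.println("Halt with status " + status);

    private SafeHaltSketch() {}

    static void halt(int status) {
        try {
            logger.accept(status); // may throw, e.g. OutOfMemoryError
        } finally {
            // Runs even if logging threw, so the halt is still attempted.
            if (!systemHaltDisabled) {
                halter.accept(status);
            }
        }
    }
}
```

The review comment names {{System.exit(status)}} for the finally block; the sketch defaults to {{Runtime.getRuntime().halt(status)}} instead, because that is what the quoted {{ExitUtil.halt}} calls, and {{Runtime.halt}} does not run shutdown hooks, whereas {{System.exit}} does and could block on them. Which call the real fix should use is a design decision for the patch.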