[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399496#comment-13399496
 ] 

Ravi Prakash commented on MAPREDUCE-4300:
-----------------------------------------

bq. For your first comment I think we could try and do fault injection. I have 
a patch now that installs an UncaughtExceptionHandler, and I am 
testing it simply by setting the heap size small and seeing what happens.
Sweet! This is great. Thanks! :)
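
For anyone following along, here is a minimal sketch of the idea in plain Java. 
It is not the patch itself: the class name FatalErrorHandlerSketch, the 
isFatal() helper, and the onFatal callback are hypothetical names made up for 
illustration. It assumes the desired behavior is "treat any java.lang.Error as 
fatal to the whole process", and it simulates the OOM by throwing one directly 
rather than by shrinking the heap.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FatalErrorHandlerSketch {

    /** Assumption for this sketch: every java.lang.Error (OOM included) is fatal. */
    static boolean isFatal(Throwable e) {
        return e instanceof Error;
    }

    /** Hypothetical handler: runs onFatal when a thread dies with a fatal error. */
    static final class FatalErrorHandler implements Thread.UncaughtExceptionHandler {
        private final Runnable onFatal;

        FatalErrorHandler(Runnable onFatal) {
            this.onFatal = onFatal;
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            if (isFatal(e)) {
                try {
                    // Logging can itself fail when the heap is exhausted,
                    // so the fatal action runs in a finally block.
                    System.err.println("Fatal error in thread " + t.getName() + ": " + e);
                } finally {
                    onFatal.run();
                }
            } else {
                System.err.println("Uncaught exception in thread " + t.getName() + ": " + e);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean sawFatal = new AtomicBoolean(false);

        // Simulated OOM; a real test would run with a small -Xmx instead.
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "worker");

        // The demo only records the event. A daemon would more likely pass
        // something like () -> Runtime.getRuntime().halt(1) here.
        worker.setUncaughtExceptionHandler(new FatalErrorHandler(() -> sawFatal.set(true)));
        worker.start();
        worker.join();

        System.out.println("fatal action ran: " + sawFatal.get());
    }
}
```

In a daemon the handler would probably be installed once with 
Thread.setDefaultUncaughtExceptionHandler so that every thread is covered, 
which is what keeps a half-dead process from lingering as a zombie.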

bq. For the second one I don't think the point is to try and prevent an OOM 
from happening. Pig tries to do this with a swapping type thing and the code is 
very brittle, and they still will get an occasional OOM. OOMs and other errors 
are going to happen. I think the point is to make sure that we don't get into a 
deadlock/zombie like state when they do.
Hmm.... I agree that NOT getting stuck in a zombie state is absolutely 
imperative. However, if we fail the daemon, isn't Hadoop just going to retry? 
Which will basically mean I'll be retrying a (possibly) big job 3-4 times 
before finally failing it. I see a need for a "health check thread" in almost 
all daemons. And memory headroom, disk health, connection health, all of these 
can be tied in. Admittedly, such a framework is probably out of scope for this 
JIRA. But just throwing it out in case we want to design towards that goal.
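
To make the "health check thread" idea a bit more concrete, here is a rough 
sketch in plain Java. Nothing in it is an existing Hadoop API: HealthCheck, 
memoryHeadroom(), firstFailure(), and start() are all hypothetical names, and 
the 5% headroom threshold is an arbitrary value picked for the demo. Disk and 
connection checks would plug in as further HealthCheck implementations.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class HealthCheckSketch {

    /** Hypothetical pluggable check (memory headroom, disk health, ...). */
    interface HealthCheck {
        String name();
        boolean isHealthy();
    }

    /** Fraction of the maximum heap still available, between 0.0 and 1.0. */
    static double headroomFraction(long max, long total, long free) {
        long used = total - free;
        return (double) (max - used) / (double) max;
    }

    /** Check that fails once heap headroom drops below minFraction. */
    static HealthCheck memoryHeadroom(final double minFraction) {
        return new HealthCheck() {
            @Override
            public String name() {
                return "memory-headroom";
            }

            @Override
            public boolean isHealthy() {
                Runtime rt = Runtime.getRuntime();
                return headroomFraction(rt.maxMemory(), rt.totalMemory(), rt.freeMemory())
                        >= minFraction;
            }
        };
    }

    /** Returns the name of the first failing check, or null if all pass. */
    static String firstFailure(List<HealthCheck> checks) {
        for (HealthCheck c : checks) {
            boolean ok;
            try {
                ok = c.isHealthy();
            } catch (RuntimeException e) {
                ok = false; // a check that throws counts as unhealthy
            }
            if (!ok) {
                return c.name();
            }
        }
        return null;
    }

    /** Runs all checks every periodMs on a daemon thread. */
    static ScheduledExecutorService start(List<HealthCheck> checks, long periodMs,
                                          Consumer<String> onUnhealthy) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "health-check");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            String failed = firstFailure(checks);
            if (failed != null) {
                onUnhealthy.accept(failed);
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        return ses;
    }

    public static void main(String[] args) throws InterruptedException {
        // 0.05 is an arbitrary demo threshold, not a recommended setting.
        ScheduledExecutorService ses = start(
                Collections.singletonList(memoryHeadroom(0.05)), 100,
                name -> System.err.println("Unhealthy: " + name));
        Thread.sleep(300);
        ses.shutdownNow();
    }
}
```

What the onUnhealthy callback should do (fail the job outright, or exit and 
let it be retried) is exactly the open question above, so the sketch leaves 
that to the caller.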
                
> OOM in AM can turn it into a zombie.
> ------------------------------------
>
>                 Key: MAPREDUCE-4300
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: StackDump.txt
>
>
> It looks like 4 threads in the AM died with OOM but not the one pinging the 
> RM.
> stderr for this AM
> {noformat}
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
> org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> May 30, 2012 4:49:55 AM 
> com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider
>  get
> WARNING: You are attempting to use a deprecated API (specifically, attempting 
> to @Inject ServletContext inside an eagerly created singleton. While we allow 
> this for backwards compatibility, be warned that this MAY have unexpected 
> behavior if you have more than one injector (with ServletModule) running in 
> the same JVM. Please consult the Guice documentation at 
> http://code.google.com/p/google-guice/wiki/Servlets for more information.
> May 30, 2012 4:49:55 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering 
> org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider 
> class
> May 30, 2012 4:49:55 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a 
> provider class
> May 30, 2012 4:49:55 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as 
> a root resource class
> May 30, 2012 4:49:55 AM 
> com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
> May 30, 2012 4:49:55 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver 
> to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to 
> GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM 
> com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory 
> getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to 
> GuiceManagedComponentProvider with the scope "PerRequest"
> Exception in thread "ResponseProcessor for block 
> BP-1114822160-<IP>-1322528669066:blk_-6528896407411719649_34227308" 
> java.lang.OutOfMemoryError: Java heap space
>       at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
>       at 
> com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
>       at 
> com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
>       at 
> com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
>       at 
> org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
>       at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
>       at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> Exception in thread "DefaultSpeculator background processing" 
> java.lang.OutOfMemoryError: Java heap space
>       at java.util.HashMap.resize(HashMap.java:462)
>       at java.util.HashMap.addEntry(HashMap.java:755)
>       at java.util.HashMap.put(HashMap.java:385)
>       at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
>       at 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
>       at 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
>       at 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
>       at 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
>       at 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
>       at java.lang.Thread.run(Thread.java:619)
> Exception in thread "Timer for 'MRAppMaster' metrics system" 
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Socket Reader #4 for port 50500" 
> java.lang.OutOfMemoryError: Java heap space
> {noformat}

        

Reply via email to