Xiao Chen created HADOOP-14727:
----------------------------------

             Summary: Socket not closed properly when reading Configurations with BlockReaderRemote
                 Key: HADOOP-14727
                 URL: https://issues.apache.org/jira/browse/HADOOP-14727
             Project: Hadoop Common
          Issue Type: Bug
          Components: conf
    Affects Versions: 3.0.0-alpha4, 2.9.0
            Reporter: Xiao Chen
            Priority: Blocker


This was caught by Cloudera's internal testing of the alpha3 release.

We got a report that some hosts had run out of file descriptors (FDs). While
triaging that, we found that both the Oozie server and the YARN JobHistoryServer
(JHS) had a large number of sockets in the {{CLOSE_WAIT}} state.
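
For anyone trying to check for the same symptom, here is a minimal sketch of counting {{CLOSE_WAIT}} sockets on Linux. It assumes the usual {{/proc/net/tcp}} layout, where the fourth whitespace-separated field is the socket state in hex and {{08}} means CLOSE_WAIT. The class and method names are hypothetical, and the count covers the whole network namespace rather than a single process.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
final class CloseWaitCountSketch {
  // Linux kernel TCP state code for CLOSE_WAIT, as printed in /proc/net/tcp.
  static final String CLOSE_WAIT = "08";

  /** Counts lines in /proc/net/tcp format whose state field is CLOSE_WAIT. */
  static long countCloseWait(List<String> procNetTcpLines) {
    return procNetTcpLines.stream()
        .map(String::trim)
        // Skip blank lines and the header line, which starts with "sl".
        .filter(line -> !line.isEmpty() && !line.startsWith("sl"))
        .map(line -> line.split("\\s+"))
        // Fields: sl, local_address, rem_address, st, ...
        .filter(fields -> fields.length > 3 && CLOSE_WAIT.equals(fields[3]))
        .count();
  }

  public static void main(String[] args) throws IOException {
    Path table = Paths.get("/proc/net/tcp"); // IPv4 only; IPv6 is in /proc/net/tcp6
    if (Files.isReadable(table)) {
      System.out.println("CLOSE_WAIT sockets: " + countCloseWait(Files.readAllLines(table)));
    } else {
      System.out.println(table + " is not readable on this system");
    }
  }
}
```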

[~haibochen] then helped narrow this down to a consistent reproduction: simply
visiting the JHS web UI and clicking through a job and its logs.

I then looked at {{BlockReaderRemote}} and related code. After adding a debug
log whenever a {{Peer}} is created, closed, or moved in or out of the
{{PeerCache}}, it looks like all the {{CLOSE_WAIT}} sockets are created from
this call stack:
{noformat}
2017-08-02 13:58:59,901 INFO org.apache.hadoop.hdfs.client.impl.BlockReaderFactory: ____ associated peer NioInetPeer(Socket[addr=/10.17.196.28,port=20002,localport=42512]) with blockreader org.apache.hadoop.hdfs.client.impl.BlockReaderRemote@717ce109
java.lang.Exception: test
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:745)
        at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:385)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:636)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:566)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:807)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        at com.ctc.wstx.io.StreamBootstrapper.ensureLoaded(StreamBootstrapper.java:482)
        at com.ctc.wstx.io.StreamBootstrapper.resolveStreamEncoding(StreamBootstrapper.java:306)
        at com.ctc.wstx.io.StreamBootstrapper.bootstrapInput(StreamBootstrapper.java:167)
        at com.ctc.wstx.stax.WstxInputFactory.doCreateSR(WstxInputFactory.java:573)
        at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:633)
        at com.ctc.wstx.stax.WstxInputFactory.createSR(WstxInputFactory.java:647)
        at com.ctc.wstx.stax.WstxInputFactory.createXMLStreamReader(WstxInputFactory.java:366)
        at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2649)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2697)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2662)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2545)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:1076)
        at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1126)
        at org.apache.hadoop.conf.Configuration.getInt(Configuration.java:1344)
        at org.apache.hadoop.mapreduce.counters.Limits.init(Limits.java:45)
        at org.apache.hadoop.mapreduce.counters.Limits.reset(Limits.java:130)
        at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.loadFullHistoryData(CompletedJob.java:363)
        at org.apache.hadoop.mapreduce.v2.hs.CompletedJob.<init>(CompletedJob.java:105)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo.loadJob(HistoryFileManager.java:473)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.loadJob(CachedHistoryStorage.java:180)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.access$000(CachedHistoryStorage.java:52)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:103)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage$1.load(CachedHistoryStorage.java:100)
        at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
        at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
        at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
        at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
        at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
        at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
        at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
        at com.google.common.cache.LocalCache$LocalManualCache.getUnchecked(LocalCache.java:4834)
        at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getFullJob(CachedHistoryStorage.java:193)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getJob(JobHistory.java:220)
        at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.requireJob(AppController.java:416)
        at org.apache.hadoop.mapreduce.v2.app.webapp.AppController.attempts(AppController.java:277)
        at org.apache.hadoop.mapreduce.v2.hs.webapp.HsController.attempts(HsController.java:152)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:162)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:941)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
        at org.apache.hadoop.security.http.XFrameOptionsFilter.doFilter(XFrameOptionsFilter.java:57)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
        at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
        at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1552)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
        at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)
{noformat}
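
The {{java.lang.Exception: test}} header in the trace above suggests it was captured by creating a throwaway exception at the point where the peer is associated with the block reader. The exact debug patch is not included here, so the following is only a minimal sketch of that general technique. The class and method names are hypothetical, and it formats the trace into a string rather than going through a logger.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration, not the actual debug patch.
final class CreationTraceSketch {

  /** Formats the current call stack by printing a throwaway exception. */
  static String captureStack(String label) {
    StringWriter buffer = new StringWriter();
    // The exception is never thrown; it only carries the stack at this point.
    new Exception(label).printStackTrace(new PrintWriter(buffer, true));
    return buffer.toString();
  }

  /** Stand-in for the code path that associates a peer with a block reader. */
  static String associatePeer() {
    return captureStack("test");
  }

  public static void main(String[] args) {
    // With a logging framework one would typically pass the exception to the
    // logger instead; here the formatted trace is simply printed.
    System.out.println("____ associated peer (hypothetical) with blockreader");
    System.out.print(associatePeer());
  }
}
```

Matching these creation traces against the sockets that never show a corresponding close is one way to find which caller leaks them.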

I was able to further confirm this theory by backing out the 4 recent commits
to {{Configuration}} on alpha3, after which the {{CLOSE_WAIT}} sockets no
longer appeared:
- HADOOP-14501. 
- HADOOP-14399. 
- HADOOP-14216. Addendum 
- HADOOP-14216. 

It's not clear to me who is responsible for closing the InputStream, though.
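
One general StAX detail may be relevant to that question: the javadoc for {{XMLStreamReader.close()}} says it does not close the underlying input source. The sketch below is not the Hadoop code and not a proposed patch. It uses only JDK classes, with hypothetical names ({{StaxCloseSketch}}, {{TrackingInputStream}}), and assumes the code that opened the stream owns it. Whether a particular StAX implementation closes the stream on its own at end of document is implementation-dependent, so the first result is printed without an expected value.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration of stream ownership around a StAX parse.
final class StaxCloseSketch {

  /** Wrapper that records whether close() was ever called on it. */
  static final class TrackingInputStream extends FilterInputStream {
    boolean closed = false;

    TrackingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /** Builds a small in-memory stand-in for a configuration file. */
  static TrackingInputStream sample() {
    String xml = "<configuration><property><name>a</name><value>1</value>"
        + "</property></configuration>";
    return new TrackingInputStream(
        new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  /** Parses the stream and closes only the XMLStreamReader, not the stream. */
  static int countStartElements(InputStream in) throws XMLStreamException {
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
    int count = 0;
    try {
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT) {
          count++;
        }
      }
    } finally {
      reader.close(); // documented as not closing the underlying input source
    }
    return count;
  }

  /** Relies on the reader alone; the result depends on the StAX implementation. */
  static boolean closedAfterReaderCloseOnly() throws Exception {
    TrackingInputStream in = sample();
    countStartElements(in);
    return in.closed;
  }

  /** The code that opened the stream closes it explicitly. */
  static boolean closedAfterTryWithResources() throws Exception {
    TrackingInputStream in = sample();
    try (InputStream owned = in) {
      countStartElements(owned);
    }
    return in.closed;
  }

  public static void main(String[] args) throws Exception {
    System.out.println("reader.close() only: " + closedAfterReaderCloseOnly());
    System.out.println("try-with-resources:  " + closedAfterTryWithResources());
  }
}
```

With try-with-resources the stream is closed regardless of what the parser does, which is why that variant is easier to reason about when the stream is backed by a socket.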


