[ https://issues.apache.org/jira/browse/HADOOP-13436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398374#comment-15398374 ]
Xiaobing Zhou commented on HADOOP-13436:
----------------------------------------

After going through all subclasses of RetryPolicy, only RetryForever and TryOnceThenFail have no member fields, which means it's not always possible to follow the pattern that multiple instances with equivalent field values are viewed as equal (i.e. as part of ConnectionId#equals) in order to avoid creating new connections. This makes exceptions for option #1, which is a dilemma since we can't go with option #2 or #3. Any thoughts? Thanks.

> RPC connections are leaking due to missing equals override in RetryUtils#getDefaultRetryPolicy
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13436
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13436
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.7.1
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>         Attachments: repro.sh
>
> We've noticed RPC connections increasing dramatically in a Kerberized HDFS cluster with {noformat}dfs.client.retry.policy.enabled{noformat} enabled. Internally, Client#getConnection does a lookup relying on ConnectionId#equals, which includes a RetryPolicy#equals check. If subclasses of RetryPolicy neglect to override RetryPolicy#equals, every instance of RetryPolicy with equivalent field values (e.g. MultipleLinearRandomRetry[6x10000ms, 10x60000ms]) will lead to a brand new connection, because the check falls back to Object#equals.
> This is the stack trace where RetryUtils#getDefaultRetryPolicy is called:
> {noformat}
> at org.apache.hadoop.io.retry.RetryUtils.getDefaultRetryPolicy(RetryUtils.java:82)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:409)
> at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:315)
> at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
> at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:609)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.newDfsClient(WebHdfsHandler.java:272)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.onOpen(WebHdfsHandler.java:215)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.handle(WebHdfsHandler.java:135)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:117)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler$1.run(WebHdfsHandler.java:114)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.hdfs.server.datanode.web.webhdfs.WebHdfsHandler.channelRead0(WebHdfsHandler.java:114)
> at org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:52)
> at org.apache.hadoop.hdfs.server.datanode.web.URLDispatcher.channelRead0(URLDispatcher.java:32)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
> at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
> at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
> at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
> at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
> at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
>
> Three options to fix the problem:
> 1. All subclasses of RetryPolicy must override equals and hashCode to deliver a less discriminating equivalence relation, i.e. two instances are equal if they have meaningfully equivalent field values (e.g. MultipleLinearRandomRetry[6x10000ms, 10x60000ms]).
> 2. Change ConnectionId#equals by removing the RetryPolicy#equals component.
> 3. Let WebHDFS reuse the DFSClient.
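To make option #1 concrete, here is a minimal sketch using simplified stand-in classes (LinearRetryPolicy and ForeverRetryPolicy are illustrative names, not the actual Hadoop types): policies with the same configuration compare equal, and a field-less policy could compare equal by class identity alone, so ConnectionId#equals would match and an existing connection could be reused.

```java
import java.util.Objects;

// Stand-in for a policy with configuration fields, in the spirit of
// MultipleLinearRandomRetry[6x10000ms, 10x60000ms].
final class LinearRetryPolicy {
  private final int retries;
  private final long sleepMillis;

  LinearRetryPolicy(int retries, long sleepMillis) {
    this.retries = retries;
    this.sleepMillis = sleepMillis;
  }

  // Equal iff configurations are equal, not iff same object identity.
  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    LinearRetryPolicy that = (LinearRetryPolicy) o;
    return retries == that.retries && sleepMillis == that.sleepMillis;
  }

  @Override
  public int hashCode() {
    return Objects.hash(retries, sleepMillis);
  }
}

// Stand-in for a field-less policy such as RetryForever: since any two
// instances are interchangeable, class identity alone could suffice.
final class ForeverRetryPolicy {
  @Override
  public boolean equals(Object o) {
    return o != null && getClass() == o.getClass();
  }

  @Override
  public int hashCode() {
    return getClass().hashCode();
  }
}

public class RetryPolicyEqualsSketch {
  public static void main(String[] args) {
    // Without the overrides, both comparisons below would fall back to
    // Object#equals, return false, and trigger a brand new connection
    // for every ConnectionId lookup keyed on the policy.
    System.out.println(new LinearRetryPolicy(6, 10000)
        .equals(new LinearRetryPolicy(6, 10000)));
    System.out.println(new ForeverRetryPolicy()
        .equals(new ForeverRetryPolicy()));
  }
}
```

The getClass()-identity approach for field-less policies is only one possible way around the "no fields to compare" exception raised in the comment above; it is an assumption of this sketch, not something decided on the issue.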
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org