[ https://issues.apache.org/jira/browse/HBASE-20777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523004#comment-16523004 ]
Duo Zhang commented on HBASE-20777:
-----------------------------------

TestAsyncTableBatch is fine now. Let me push to all branches that have the netty rpc server.

But there is another problem:

https://builds.apache.org/job/HBASE-Flaky-Tests/33682/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleAsyncWAL-output.txt/*view*/

{noformat}
2018-06-25 16:36:04,306 DEBUG [master/asf911:0.Chore.1] client.ResultBoundedCompletionService(226): Replica 0 returns java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
	at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:158)
	at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Call to asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: asf911.gq1.ygridcore.net/67.195.81.155:55296
	at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
	at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
	at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
	at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
	at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
	at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:267)
	at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:261)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:659)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:678)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:552)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
	at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	... 1 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connection refused: asf911.gq1.ygridcore.net/67.195.81.155:55296
	at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connection refused
	... 1 more
{noformat}

The exception is now connection refused, but we still time out (callDuration=68578 exceeds callTimeout=60000)... It seems the meta region never came online. This should be a different bug; I will open a new issue to track it.

> RpcConnection could still remain opened after we shutdown the NettyRpcServer
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-20777
>                 URL: https://issues.apache.org/jira/browse/HBASE-20777
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>         Attachments: HBASE-20777-v1.patch, HBASE-20777.patch,
>                      org.apache.hadoop.hbase.client.TestAsyncTableBatch-output.txt
>
>
> The log is very strange: we keep sending requests to a dead RS, and the result
> is not connection refused but an rpc timeout, and later it becomes
> CallQueueTooBig...
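The general hazard behind the issue title is that closing a server's listening socket only stops *new* connections. Sockets that were already accepted stay open until something closes them explicitly, so existing clients can keep writing to a server that no longer serves them. That would be consistent with the "rpc timeout instead of connection refused" symptom quoted above. Below is a minimal sketch of the usual remedy, which is to track every accepted connection and close it on shutdown. It uses plain `java.net` sockets instead of Netty, and `TrackingServer` and its methods are hypothetical: this is not the HBASE-20777 patch. In Netty, an `io.netty.channel.group.ChannelGroup` is a common way to do the same bookkeeping.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical minimal server (not HBase code): close() shuts the listening
 * socket AND every connection that was already accepted.
 */
public class TrackingServer implements AutoCloseable {
  private final ServerSocket listener;
  // Accepted connections that are still open; guarded by "this".
  private final Set<Socket> connections = new HashSet<>();
  private boolean closed; // guarded by "this"

  public TrackingServer() throws IOException {
    // Port 0 lets the OS pick a free port; bind to loopback only.
    listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
    Thread acceptor = new Thread(this::acceptLoop, "acceptor");
    acceptor.setDaemon(true);
    acceptor.start();
  }

  public int port() {
    return listener.getLocalPort();
  }

  public synchronized int openConnections() {
    return connections.size();
  }

  private void acceptLoop() {
    while (true) {
      Socket s;
      try {
        s = listener.accept();
      } catch (IOException e) {
        return; // listener was closed: stop accepting
      }
      synchronized (this) {
        if (closed) { // accept() raced with close(): do not leak this socket
          closeQuietly(s);
          return;
        }
        connections.add(s);
      }
    }
  }

  @Override
  public synchronized void close() throws IOException {
    closed = true;
    // 1. Stop accepting; new clients typically see "connection refused".
    listener.close();
    // 2. ServerSocket.close() does not close accepted sockets, so close them
    //    here; otherwise clients keep a live connection to a dead server.
    for (Socket s : connections) {
      closeQuietly(s);
    }
    connections.clear();
  }

  private static void closeQuietly(Socket s) {
    try {
      s.close();
    } catch (IOException ignored) {
      // best effort during shutdown
    }
  }
}
```

With both steps, an already-connected client reads end-of-stream (or gets a reset) immediately instead of waiting for a response, and a reconnect attempt fails fast with connection refused. The `closed` flag covers the window where `accept()` returns just as `close()` runs.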