[ https://issues.apache.org/jira/browse/HBASE-20777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523004#comment-16523004 ]
Duo Zhang commented on HBASE-20777:
-----------------------------------
TestAsyncTableBatch is fine now. Let me push this to all branches which have
the netty rpc server.
But there is another problem:
https://builds.apache.org/job/HBASE-Flaky-Tests/33682/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleAsyncWAL-output.txt/*view*/
{noformat}
2018-06-25 16:36:04,306 DEBUG [master/asf911:0.Chore.1]
client.ResultBoundedCompletionService(226): Replica 0 returns
java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception:
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740,
hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception:
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at
region=hbase:meta,,1.1588230740,
hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:158)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Call to
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception:
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
asf911.gq1.ygridcore.net/67.195.81.155:55296
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:267)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:261)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:659)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:678)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:552)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
... 1 more
Caused by: org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connection refused:
asf911.gq1.ygridcore.net/67.195.81.155:55296
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
syscall:getsockopt(..) failed: Connection refused
... 1 more
{noformat}
The exception is connection refused now, but the call still times out... It
seems the meta region never comes online.
This should be another bug; I will open a new issue to track it.
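For illustration only, here is a minimal sketch of how a retrying caller can
report a timeout even though every single attempt fails fast with connection
refused, which matches the shape of the log above (a SocketTimeoutException
carrying callTimeout/callDuration, with the connect failure as its cause). This
is not the RpcRetryingCallerImpl code: the class and method names are
hypothetical, and it uses plain java.net sockets instead of the netty client.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical sketch, not HBase code.
public class RetryUntilTimeoutSketch {

  // Bind an ephemeral port and release it again, so that (most likely)
  // nothing is listening on it afterwards. This stands in for the dead server.
  static int closedPort() {
    try (ServerSocket ss = new ServerSocket(0)) {
      return ss.getLocalPort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // A single attempt: fails with an IOException (normally "Connection
  // refused") when nothing listens on the port.
  static void connectOnce(int port) throws IOException {
    try (Socket s = new Socket()) {
      s.connect(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 1000);
    }
  }

  // Retry until the overall call timeout would be exceeded, then give up with
  // a SocketTimeoutException that keeps the last failure as its cause.
  // Returns the number of attempts if a connection eventually succeeds.
  static int callWithRetries(int port, long callTimeoutMs, long pauseMs)
      throws SocketTimeoutException {
    long start = System.nanoTime();
    int attempts = 0;
    while (true) {
      attempts++;
      try {
        connectOnce(port);
        return attempts;
      } catch (IOException e) {
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        if (durationMs + pauseMs > callTimeoutMs) {
          SocketTimeoutException ste = new SocketTimeoutException(
              "callTimeout=" + callTimeoutMs + ", callDuration=" + durationMs
                  + ": " + e.getMessage());
          ste.initCause(e);
          throw ste;
        }
        try {
          Thread.sleep(pauseMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while retrying", ie);
        }
      }
    }
  }

  public static void main(String[] args) {
    try {
      callWithRetries(closedPort(), 300, 50);
    } catch (SocketTimeoutException e) {
      // The timeout is what surfaces; the refused connect is only the cause.
      System.out.println(e.getMessage() + " / cause: " + e.getCause());
    }
  }
}
```

In this sketch the timeout only says that the retry budget ran out. The cause
chain is what shows that the target was refusing connections the whole time.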
> RpcConnection could still remain opened after we shutdown the NettyRpcServer
> ----------------------------------------------------------------------------
>
> Key: HBASE-20777
> URL: https://issues.apache.org/jira/browse/HBASE-20777
> Project: HBase
> Issue Type: Bug
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Attachments: HBASE-20777-v1.patch, HBASE-20777.patch,
> org.apache.hadoop.hbase.client.TestAsyncTableBatch-output.txt
>
>
> The log is very strange: we keep sending requests to a dead RS, and the result
> is not connection refused but rpc timeout, and later it becomes
> CallQueueTooBig...