[ 
https://issues.apache.org/jira/browse/HBASE-20777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523004#comment-16523004
 ] 

Duo Zhang commented on HBASE-20777:
-----------------------------------

TestAsyncTableBatch is fine now. Let me push this to all branches that have the 
netty rpc server.

But there is another problem:

https://builds.apache.org/job/HBASE-Flaky-Tests/33682/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.multiwal.TestReplicationKillMasterRSCompressedWithMultipleAsyncWAL-output.txt/*view*/

{noformat}
2018-06-25 16:36:04,306 DEBUG [master/asf911:0.Chore.1] 
client.ResultBoundedCompletionService(226): Replica 0 returns 
java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to 
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connection refused: 
asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at 
region=hbase:meta,,1.1588230740, 
hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
java.net.SocketTimeoutException: callTimeout=60000, callDuration=68578: Call to 
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connection refused: 
asf911.gq1.ygridcore.net/67.195.81.155:55296 row '' on table 'hbase:meta' at 
region=hbase:meta,,1.1588230740, 
hostname=asf911.gq1.ygridcore.net,55296,1529944208029, seqNum=-1
        at 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:158)
        at 
org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Call to 
asf911.gq1.ygridcore.net/67.195.81.155:55296 failed on connection exception: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connection refused: 
asf911.gq1.ygridcore.net/67.195.81.155:55296
        at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:165)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
        at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
        at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
        at 
org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
        at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
        at 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
        at 
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$500(NettyRpcConnection.java:71)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:267)
        at 
org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:261)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:659)
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:678)
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:552)
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:394)
        at 
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:304)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at 
org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
Caused by: 
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
 syscall:getsockopt(..) failed: Connection refused: 
asf911.gq1.ygridcore.net/67.195.81.155:55296
        at 
org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
 Source)
Caused by: 
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
 syscall:getsockopt(..) failed: Connection refused
        ... 1 more
{noformat}

The exception is now connection refused, but the call still times out... It 
seems the meta region never comes online.

This should be a separate bug; I will open a new issue to track it.
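
To make the log above easier to read: the SocketTimeoutException is only the 
outermost exception, and the connection refusal sits deeper in the cause chain. 
Below is a minimal sketch of walking that chain. It uses plain java.net classes 
as stand-ins for the HBase and netty exceptions, and RootCauseSketch, rootCause, 
hasConnectFailure and sampleChain are hypothetical names for illustration, not 
HBase code.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

public class RootCauseSketch {
    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable cur = t;
        while (cur.getCause() != null) {
            cur = cur.getCause();
        }
        return cur;
    }

    /** Returns true if any throwable in the chain is a java.net.ConnectException. */
    static boolean hasConnectFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConnectException) {
                return true;
            }
        }
        return false;
    }

    /** Builds a hypothetical chain shaped like the one in the log (assumption, stand-in types). */
    static Throwable sampleChain() {
        ConnectException refused = new ConnectException("Connection refused");
        ConnectException wrapped = new ConnectException("Call failed on connection exception");
        wrapped.initCause(refused);
        SocketTimeoutException timeout =
            new SocketTimeoutException("callTimeout=60000, callDuration=68578");
        timeout.initCause(wrapped);
        return timeout;
    }

    public static void main(String[] args) {
        Throwable t = sampleChain();
        // The outermost type is a timeout, but the root cause is the refused connection.
        System.out.println("outer: " + t.getClass().getSimpleName());
        System.out.println("root:  " + rootCause(t).getMessage());
        System.out.println("connect failure in chain: " + hasConnectFailure(t));
    }
}
```

A check like hasConnectFailure separates "the server is gone" from "the server 
is slow", which is the distinction this comment relies on when it says the 
failure is now connection refused even though the caller still reports a 
timeout.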

> RpcConnection could still remain opened after we shutdown the NettyRpcServer
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-20777
>                 URL: https://issues.apache.org/jira/browse/HBASE-20777
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>         Attachments: HBASE-20777-v1.patch, HBASE-20777.patch, 
> org.apache.hadoop.hbase.client.TestAsyncTableBatch-output.txt
>
>
> The log is very strange: we keep sending requests to a dead RS, and the 
> result is not connection refused but an rpc timeout, and later it becomes 
> CallQueueTooBig...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
