[ 
https://issues.apache.org/jira/browse/RATIS-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948584#comment-16948584
 ] 

Lokesh Jain commented on RATIS-705:
-----------------------------------

# The issue is fixed by calling TimeoutScheduler.close after 
channel.awaitTermination is called. This makes sure that the thread is not 
interrupted before the channel is shutdown.
 # There can be a NPE in TimeoutScheduler#onTaskCompleted - This can happen if 
this function is called after the scheduler is closed.
 # I have added a unit test to reproduce the issue. Both of these issues should 
not lead to failure of client writes. The client is able to retry even without 
the fixes.

> GrpcClientProtocolClient#close throws InterruptedException
> ----------------------------------------------------------
>
>                 Key: RATIS-705
>                 URL: https://issues.apache.org/jira/browse/RATIS-705
>             Project: Ratis
>          Issue Type: Bug
>          Components: gRPC
>            Reporter: Nilotpal Nandi
>            Assignee: Lokesh Jain
>            Priority: Major
>         Attachments: RATIS-705.001.patch
>
>
> GrpcClientProtocolClient#close throws InterruptedException. This happens when 
> GrpcClientProtocolClient#close is called from a TimeoutScheduler thread. 
> GrpcClientProtocolClient#close calls scheduler.close() which interrupts all 
> the timeout scheduler threads including the thread executing the close 
> routine. This leads to InterruptedException when channel.awaitTermination is 
> called.
>  
> {code:java}
> 19/10/09 07:40:33 ERROR client.GrpcClientProtocolClient: Unexpected exception 
> while waiting for channel termination
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
>         at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
>         at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelImpl.awaitTermination(ManagedChannelImpl.java:763)
>         at 
> org.apache.ratis.thirdparty.io.grpc.internal.ForwardingManagedChannel.awaitTermination(ForwardingManagedChannel.java:57)
>         at 
> org.apache.ratis.thirdparty.io.grpc.internal.ManagedChannelOrphanWrapper.awaitTermination(ManagedChannelOrphanWrapper.java:70)
>         at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient.close(GrpcClientProtocolClient.java:146)
>         at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.lambda$close$1(PeerProxyMap.java:74)
>         at 
> org.apache.ratis.util.LifeCycle.lambda$checkStateAndClose$2(LifeCycle.java:231)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:251)
>         at 
> org.apache.ratis.util.LifeCycle.checkStateAndClose(LifeCycle.java:229)
>         at 
> org.apache.ratis.util.PeerProxyMap$PeerAndProxy.close(PeerProxyMap.java:70)
>         at 
> org.apache.ratis.util.PeerProxyMap.resetProxy(PeerProxyMap.java:127)
>         at 
> org.apache.ratis.util.PeerProxyMap.handleException(PeerProxyMap.java:136)
>         at 
> org.apache.ratis.client.impl.RaftClientRpcWithProxy.handleException(RaftClientRpcWithProxy.java:47)
>         at 
> org.apache.ratis.client.impl.RaftClientImpl.handleIOException(RaftClientImpl.java:372)
>         at 
> org.apache.ratis.client.impl.OrderedAsync.lambda$sendRequest$10(OrderedAsync.java:236)
>         at 
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>         at 
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>         at 
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>         at 
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>         at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:324)
>         at java.util.Optional.ifPresent(Optional.java:159)
>         at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:329)
>         at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:324)
>         at 
> org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:318)
>         at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
>         at 
> org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
>         at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50)
>         at org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to