Lokesh Jain created RATIS-690:
---------------------------------

             Summary: RaftClient should reconnect and change leader on 
TimeoutIOException, AlreadyClosedException
                 Key: RATIS-690
                 URL: https://issues.apache.org/jira/browse/RATIS-690
             Project: Ratis
          Issue Type: Bug
          Components: client
            Reporter: Lokesh Jain
            Assignee: Lokesh Jain


RaftClient does not change leaderId and does not refresh the streamObserver for 
the old leaderId in case of TimeoutIOException. For AlreadyClosedException also 
the raft client should retry to a different leader.

For TimeoutIOException leader change is good to have if the server has false 
notion of being the leader. Similarly for AlreadyClosedException I guess it 
will be good to try sending request to a different server since the existing 
stream might be closed by server.
{code:java}
19/09/25 18:09:42 DEBUG client.RaftClient: client-E5E9D444D128: suggested new 
leader: null. Failed 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 
with {}19/09/25 18:09:42 DEBUG client.RaftClient: client-E5E9D444D128: 
suggested new leader: null. Failed 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 
with {}org.apache.ratis.protocol.TimeoutIOException: Request timeout 3000ms: 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$timeoutCheck$3(GrpcClientProtocolClient.java:319)
 at java.util.Optional.ifPresent(Optional.java:159) at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.handleReplyFuture(GrpcClientProtocolClient.java:323)
 at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.timeoutCheck(GrpcClientProtocolClient.java:318)
 at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers.lambda$onNext$1(GrpcClientProtocolClient.java:312)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$0(TimeoutScheduler.java:113)
 at 
org.apache.ratis.util.TimeoutScheduler.lambda$onTimeout$1(TimeoutScheduler.java:133)
 at org.apache.ratis.util.LogUtils.runAndLog(LogUtils.java:50) at 
org.apache.ratis.util.LogUtils$1.run(LogUtils.java:91) at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
 at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)19/09/25 18:09:42 DEBUG 
client.RaftClient: client-E5E9D444D128: 
oldLeader=46001d60-9ca0-4042-a521-69cb38b96882,  
curLeader=46001d60-9ca0-4042-a521-69cb38b96882, 
newLeader=394810ca-48fe-47bd-a0c8-eeaaec380bc319/09/25 18:09:42 DEBUG 
impl.OrderedAsync: schedule* attempt #171 with policy 
RetryLimited(maxAttempts=180, sleepTime=1000ms) for 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c419/09/25
 18:09:43 DEBUG impl.OrderedAsync: client-E5E9D444D128: send* 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c419/09/25
 18:09:43 DEBUG client.RaftClient: client-E5E9D444D128: suggested new leader: 
null. Failed 
RaftClientRequest:client-E5E9D444D128->46001d60-9ca0-4042-a521-69cb38b96882@group-0B8E59E48142,
 cid=34, seq=11*, RW, 
org.apache.hadoop.hdds.scm.XceiverClientRatis$$Lambda$72/2083154356@108d55c4 
with {}java.io.IOException: java.lang.IllegalStateException: Entry already 
exists for key 11 in map 
SlidingWindow$Server:257-OrderedRequestStreamObserver257:requests at 
org.apache.ratis.util.IOUtils.asIOException(IOUtils.java:54) at 
org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:106) at 
org.apache.ratis.grpc.GrpcUtil.unwrapException(GrpcUtil.java:75) at 
org.apache.ratis.grpc.GrpcUtil.unwrapIOException(GrpcUtil.java:128) at 
org.apache.ratis.grpc.client.GrpcClientProtocolClient$AsyncStreamObservers$1.onError(GrpcClientProtocolClient.java:283)
 at 
org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
 at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
 at 
org.apache.ratis.thirdparty.io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
 at 
org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
 at 
org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)Caused by: 
java.lang.IllegalStateException: Entry already exists for key 11 in map 
SlidingWindow$Server:257-OrderedRequestStreamObserver257:requests at 
sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source) at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
org.apache.ratis.util.ReflectionUtils.instantiateException(ReflectionUtils.java:222)
 at org.apache.ratis.grpc.GrpcUtil.tryUnwrapException(GrpcUtil.java:104) ... 22 
more
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to