runzhiwang opened a new pull request #134:
URL: https://github.com/apache/incubator-ratis/pull/134


   ## What changes were proposed in this pull request?
   
   What's the problem?
   This happens in tests, but it may also happen in production.
   
   For example, the leader is s3 and the follower is s4.
   1. Kill s4, then restart s4.
   
   ```
   2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
   2020-06-19 07:03:18,095 [Thread-6194] INFO  ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
   ```
   
   
   2. s4 starts, sets the configuration loaded from storage at [setRaftConf(raftConf.getLogEntryIndex(), raftConf)](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170), and then changes to RUNNING at [lifeCycle.transition(RUNNING)](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213).
   
   
   
   ```
   2020-06-19 07:03:18,127 [pool-16-thread-1] INFO  impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
   2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null
   2020-06-19 07:03:18,153 [Thread-6194] INFO  impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from null to FOLLOWER at term 1 for startAsFollower
   ```
   
   
   
   3. s3 sends an appendEntries request to s4, and s4 changes to RUNNING at [lifeCycle.compareAndTransition(STARTING, RUNNING)](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003).
   
   `2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, client-9414EC4E73DA, cid=3000`
   
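   4. If the transition to RUNNING in step 3 happens before the one in step 2, then step 2 throws an `IllegalStateException`, because `lifeCycle.transition(RUNNING)` rejects RUNNING -> RUNNING.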
   
   ```
   2020-06-19 07:03:18,169 [Thread-6194] INFO  impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState
   2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to kill/restart server: s4
   2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING
   2020-06-19T07:03:18.1918899Z         at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
   2020-06-19T07:03:18.1919240Z         at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
   2020-06-19T07:03:18.1919558Z         at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
   2020-06-19T07:03:18.1919878Z         at org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214)
   2020-06-19T07:03:18.1920206Z         at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186)
   2020-06-19T07:03:18.1920520Z         at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
   2020-06-19T07:03:18.1920839Z         at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
   2020-06-19T07:03:18.1921330Z         at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
   2020-06-19T07:03:18.1921639Z         at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
   2020-06-19T07:03:18.1921951Z         at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
   2020-06-19T07:03:18.1922261Z         at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
   2020-06-19T07:03:18.1922575Z         at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
   2020-06-19T07:03:18.1922885Z         at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
   2020-06-19T07:03:18.1925464Z         at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
   2020-06-19T07:03:18.1940816Z         at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
   2020-06-19T07:03:18.1953283Z         at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
   2020-06-19T07:03:18.1967610Z         at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
   2020-06-19T07:03:18.1980549Z         at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
   2020-06-19T07:03:18.1991620Z         at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301)
   2020-06-19T07:03:18.1991958Z         at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312)
   2020-06-19T07:03:18.1992275Z         at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304)
   2020-06-19T07:03:18.1992609Z         at org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100)
   2020-06-19T07:03:18.1992920Z         at java.lang.Thread.run(Thread.java:748)
   ```
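The four steps above can be replayed deterministically with a simplified model of the server lifecycle. This is a sketch under stated assumptions: `SimpleLifeCycle`, its forward-only validity rule, and the `replay...` helpers are hypothetical and are not `org.apache.ratis.util.LifeCycle`; only the method names `transition` and `compareAndTransition` and the error text are taken from the links and log above.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical replay of the RUNNING -> RUNNING race; not the real Ratis LifeCycle. */
public class LifeCycleRaceSketch {
  enum State { NEW, STARTING, RUNNING }

  /** Simplified, hypothetical stand-in for a server lifecycle. */
  static final class SimpleLifeCycle {
    private final String name;
    private final AtomicReference<State> current = new AtomicReference<>(State.NEW);

    SimpleLifeCycle(String name) { this.name = name; }

    State getCurrentState() { return current.get(); }

    // Assumed rule for this sketch: a state may only move forward to a different state.
    private static boolean isValid(State from, State to) {
      return from.ordinal() < to.ordinal();
    }

    // Strict transition, like the call in startAsFollower: rejects RUNNING -> RUNNING.
    // Check-then-set is not atomic; that is fine for this single-threaded replay.
    void transition(State to) {
      final State from = current.get();
      if (!isValid(from, to)) {
        throw new IllegalStateException("ILLEGAL TRANSITION: In " + name + ", " + from + " -> " + to);
      }
      current.set(to);
    }

    // Conditional transition, like the appendEntries path: only fires if the state equals 'from'.
    boolean compareAndTransition(State from, State to) {
      return current.compareAndSet(from, to);
    }
  }

  /** Step 3 wins the race: returns the exception message, or null if no error occurred. */
  static String replayBadInterleaving() {
    final SimpleLifeCycle lifeCycle = new SimpleLifeCycle("s4");
    lifeCycle.transition(State.STARTING);  // s4 has been restarted and is starting up
    // Step 3 first: the appendEntries handler moves STARTING -> RUNNING.
    if (!lifeCycle.compareAndTransition(State.STARTING, State.RUNNING)) {
      throw new AssertionError("expected the appendEntries path to move s4 to RUNNING");
    }
    try {
      // Step 2 second: startAsFollower calls the strict transition(RUNNING).
      lifeCycle.transition(State.RUNNING);
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  /** Step 2 wins the race: step 3's conditional transition becomes a harmless no-op. */
  static boolean replayGoodInterleaving() {
    final SimpleLifeCycle lifeCycle = new SimpleLifeCycle("s4");
    lifeCycle.transition(State.STARTING);
    lifeCycle.transition(State.RUNNING);  // step 2
    final boolean step3Moved = lifeCycle.compareAndTransition(State.STARTING, State.RUNNING);
    return !step3Moved && lifeCycle.getCurrentState() == State.RUNNING;
  }

  public static void main(String[] args) {
    System.out.println(replayBadInterleaving());   // ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING
    System.out.println(replayGoodInterleaving());  // true
  }
}
```

In this model the failure depends only on ordering: the conditional `compareAndTransition` is safe whichever path runs first, while the unconditional `transition(RUNNING)` fails whenever the other path has already moved the state to RUNNING.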
   
   
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-982
   
   ## How was this patch tested?
   
   Existing tests.
   

