runzhiwang opened a new pull request #134: URL: https://github.com/apache/incubator-ratis/pull/134
## What changes were proposed in this pull request? What's the problem ? This happens in test, but it maybe also happen in production. For example, leader is s3 and follower is s4. 1. kill s4, and restart s4. ``` 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false ``` 2. s4 start and set configuration from storage at [setRaftConf(raftConf.getLogEntryIndex(), raftConf) ](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170) and s4 will change to RUNNING at [lifeCycle.transition(RUNNING)](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213) ``` 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from null to FOLLOWER at term 1 for startAsFollower ``` 3. s3 send append entry request to s4, and s4 change to RUNNING at [lifeCycle.compareAndTransition(STARTING, RUNNING)](https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003) `2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, client-9414EC4E73DA, cid=3000` 4. If change to RUNNING in step3 happens before step2, then step2 will throw exception. ``` 2020-06-19 07:03:18,169 [Thread-6194] INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState 2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to kill/restart server: s4 2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING 2020-06-19T07:03:18.1918899Z at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63) 2020-06-19T07:03:18.1919240Z at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115) 2020-06-19T07:03:18.1919558Z at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155) 2020-06-19T07:03:18.1919878Z at org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214) 2020-06-19T07:03:18.1920206Z at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186) 2020-06-19T07:03:18.1920520Z at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) 2020-06-19T07:03:18.1920839Z at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382) 2020-06-19T07:03:18.1921330Z at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) 2020-06-19T07:03:18.1921639Z at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290) 2020-06-19T07:03:18.1921951Z at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731) 2020-06-19T07:03:18.1922261Z at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) 2020-06-19T07:03:18.1922575Z at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401) 2020-06-19T07:03:18.1922885Z at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734) 2020-06-19T07:03:18.1925464Z at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159) 2020-06-19T07:03:18.1940816Z at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173) 2020-06-19T07:03:18.1953283Z at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233) 2020-06-19T07:03:18.1967610Z at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485) 2020-06-19T07:03:18.1980549Z at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650) 2020-06-19T07:03:18.1991620Z at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301) 2020-06-19T07:03:18.1991958Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312) 2020-06-19T07:03:18.1992275Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304) 2020-06-19T07:03:18.1992609Z at org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100) 2020-06-19T07:03:18.1992920Z at java.lang.Thread.run(Thread.java:748) ``` ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/RATIS-982 ## How was this patch tested? Existed tests. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
