[ https://issues.apache.org/jira/browse/RATIS-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shashikant Banerjee resolved RATIS-982.
---------------------------------------
    Fix Version/s: 0.6.0
       Resolution: Fixed

> Fix RaftServerImpl illegal transition from RUNNING to RUNNING
> -------------------------------------------------------------
>
>                 Key: RATIS-982
>                 URL: https://issues.apache.org/jira/browse/RATIS-982
>             Project: Ratis
>          Issue Type: Bug
>            Reporter: runzhiwang
>            Assignee: runzhiwang
>            Priority: Major
>             Fix For: 0.6.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> This happens in tests, but it may also happen in production.
> For example, the leader is s3 and the follower is s4.
> 1. Kill s4 and restart it.
> {code:java}
> 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:killServer(458)) - killServer s4
> 2020-06-19 07:03:18,095 [Thread-6194] INFO ratis.MiniRaftCluster (MiniRaftCluster.java:newRaftServer(330)) - newRaftServer: s4, group-5BD7E8A01610:[s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], format? false
> {code}
> 2. s4 starts and sets its configuration from storage at [setRaftConf(raftConf.getLogEntryIndex(), raftConf)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L170], and s4 then changes to RUNNING at [lifeCycle.transition(RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L213].
> {code:java}
> 2020-06-19 07:03:18,127 [pool-16-thread-1] INFO impl.RaftServerImpl (ServerState.java:setRaftConf(356)) - s4@group-5BD7E8A01610: set configuration 0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null at 0
> 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:start(185)) - s4@group-5BD7E8A01610: start as a follower, conf=0: [s3:0.0.0.0:43375, s4:0.0.0.0:33719, s0:0.0.0.0:34867, s1:0.0.0.0:33783, s2:0.0.0.0:40473], old=null
> 2020-06-19 07:03:18,153 [Thread-6194] INFO impl.RaftServerImpl (RaftServerImpl.java:setRole(174)) - s4@group-5BD7E8A01610: changes role from null to FOLLOWER at term 1 for startAsFollower
> {code}
> 3. s3 sends an appendEntries request to s4, and s4 changes to RUNNING at [lifeCycle.compareAndTransition(STARTING, RUNNING)|https://github.com/apache/incubator-ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/RaftServerImpl.java#L1003].
> {code:java}
> 2020-06-19 07:03:18,162 [nioEventLoopGroup-59-1] DEBUG impl.RaftServerImpl (RaftServerImpl.java:logAppendEntries(918)) - s4@group-5BD7E8A01610: receive appendEntries(s3, 1, (t:1, i:0), 0, false, commits[s3:c0, s4:c0, s0:c0, s1:c0, s2:c0], entries: (t:1, i:1), STATEMACHINELOGENTRY, client-9414EC4E73DA, cid=3000
> {code}
> 4. If the change to RUNNING in step 3 happens before the one in step 2, then step 2 throws an exception.
> {code:java}
> 2020-06-19 07:03:18,169 [Thread-6194] INFO impl.RoleInfo (RoleInfo.java:updateAndGet(143)) - s4: start FollowerState
> 2020-06-19 07:03:18,174 [Thread-6194] ERROR netty.TestRaftWithNetty (ExitUtils.java:terminate(133)) - Terminating with exit status -1: Failed to kill/restart server: s4
> 2020-06-19T07:03:18.1918474Z java.lang.IllegalStateException: ILLEGAL TRANSITION: In s4, RUNNING -> RUNNING
> 2020-06-19T07:03:18.1918899Z at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:63)
> 2020-06-19T07:03:18.1919240Z at org.apache.ratis.util.LifeCycle$State.validate(LifeCycle.java:115)
> 2020-06-19T07:03:18.1919558Z at org.apache.ratis.util.LifeCycle.transition(LifeCycle.java:155)
> 2020-06-19T07:03:18.1919878Z at org.apache.ratis.server.impl.RaftServerImpl.startAsFollower(RaftServerImpl.java:214)
> 2020-06-19T07:03:18.1920206Z at org.apache.ratis.server.impl.RaftServerImpl.start(RaftServerImpl.java:186)
> 2020-06-19T07:03:18.1920520Z at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> 2020-06-19T07:03:18.1920839Z at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
> 2020-06-19T07:03:18.1921330Z at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
> 2020-06-19T07:03:18.1921639Z at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
> 2020-06-19T07:03:18.1921951Z at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
> 2020-06-19T07:03:18.1922261Z at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
> 2020-06-19T07:03:18.1922575Z at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
> 2020-06-19T07:03:18.1922885Z at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
> 2020-06-19T07:03:18.1925464Z at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
> 2020-06-19T07:03:18.1940816Z at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
> 2020-06-19T07:03:18.1953283Z at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
> 2020-06-19T07:03:18.1967610Z at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
> 2020-06-19T07:03:18.1980549Z at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
> 2020-06-19T07:03:18.1991620Z at org.apache.ratis.server.impl.RaftServerProxy.start(RaftServerProxy.java:301)
> 2020-06-19T07:03:18.1991958Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:312)
> 2020-06-19T07:03:18.1992275Z at org.apache.ratis.MiniRaftCluster.restartServer(MiniRaftCluster.java:304)
> 2020-06-19T07:03:18.1992609Z at org.apache.ratis.RaftBasicTests.lambda$killAndRestartServer$2(RaftBasicTests.java:100)
> 2020-06-19T07:03:18.1992920Z at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
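
The ordering problem described in steps 2 to 4 of the issue can be replayed in a small, single-threaded sketch. `MiniLifeCycle` and `LifeCycleRaceSketch` below are hypothetical stand-ins written for illustration; they are not the Ratis `LifeCycle` or `RaftServerImpl` classes, and the sketch assumes the server is already in STARTING when both paths try to move it to RUNNING. The second method shows one possible way to tolerate either ordering (a conditional transition on the start path as well); this is an assumption for illustration, not necessarily the change that resolved RATIS-982.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical minimal lifecycle; not the Ratis LifeCycle class. */
final class MiniLifeCycle {
  enum State { NEW, STARTING, RUNNING }

  private final String name;
  private final AtomicReference<State> current = new AtomicReference<>(State.NEW);

  MiniLifeCycle(String name) {
    this.name = name;
  }

  State getCurrentState() {
    return current.get();
  }

  /** In this sketch only NEW -> STARTING and STARTING -> RUNNING are legal. */
  private static boolean isValid(State from, State to) {
    return (from == State.NEW && to == State.STARTING)
        || (from == State.STARTING && to == State.RUNNING);
  }

  /** Unconditional transition: throws if the current state does not allow it. */
  void transition(State to) {
    current.updateAndGet(from -> {
      if (!isValid(from, to)) {
        throw new IllegalStateException(
            "ILLEGAL TRANSITION: In " + name + ", " + from + " -> " + to);
      }
      return to;
    });
  }

  /** Conditional transition: changes state only if the current state is 'from'. */
  boolean compareAndTransition(State from, State to) {
    return current.compareAndSet(from, to);
  }
}

final class LifeCycleRaceSketch {
  /** Replays the failing order: the appendEntries path reaches RUNNING first. */
  static String startWithUnconditionalTransition() {
    MiniLifeCycle lc = new MiniLifeCycle("s4");
    lc.transition(MiniLifeCycle.State.STARTING);
    // Step 3: the appendEntries handler wins the race.
    lc.compareAndTransition(MiniLifeCycle.State.STARTING, MiniLifeCycle.State.RUNNING);
    try {
      // Step 2: the start path then transitions unconditionally.
      lc.transition(MiniLifeCycle.State.RUNNING);
      return "ok";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  /** Same order, but the start path also uses the conditional transition. */
  static MiniLifeCycle.State startWithConditionalTransition() {
    MiniLifeCycle lc = new MiniLifeCycle("s4");
    lc.transition(MiniLifeCycle.State.STARTING);
    lc.compareAndTransition(MiniLifeCycle.State.STARTING, MiniLifeCycle.State.RUNNING);
    // Returns false here because the state is already RUNNING; nothing is thrown.
    lc.compareAndTransition(MiniLifeCycle.State.STARTING, MiniLifeCycle.State.RUNNING);
    return lc.getCurrentState();
  }

  public static void main(String[] args) {
    System.out.println(startWithUnconditionalTransition());
    System.out.println(startWithConditionalTransition());
  }
}
```

In the sketch, the unconditional path reports the same kind of RUNNING -> RUNNING failure as the stack trace above, while the conditional path ends in RUNNING regardless of which caller gets there first.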