Hi Tsz Wo,

Thank you again for the prompt response. Let me take a step back and explain what I am trying to solve: I want to ensure durability of the state machine in case all the nodes go down.
If I am running a 3-node Ratis cluster and all the nodes go down due to some physical hardware failure, I need a way to ensure that when a new node comes up it is able to restore the state. To do so, I am thinking of taking periodic snapshots to durable storage, e.g. S3, and when a new node spawns (which would be handled by another service), it can pull the snapshot from S3 and restore the state.

To simulate this scenario, I clean the storage directories of the Ratis nodes before starting them up, so they don't have any previous state, and let the nodes pull the snapshot from a separate directory.

Please let me know if there is some other way I can solve this problem. Hope this helps.

Regards,
Snehasish

On Thu, 5 Mar 2026 at 23:50, Tsz Wo Sze <[email protected]> wrote:

> Hi Snehasish,
>
> > Once the snapshot is triggered, I move it to a different directory to
> > simulate clean restart.
>
> Is this step required to reproduce the failure? If there is a snapshot
> taken, the server expects that the snapshot is there, and it may delete
> the raft logs to free up space. If this step is required to reproduce
> the failure, it does not look like a bug.
>
> In general, we cannot manually move the Ratis metadata around, just as
> the system may not be able to restart if we manually move some system
> files around in Linux or Windows.
>
> Tsz-Wo
>
>
> On Thu, Mar 5, 2026 at 10:08 AM Tsz Wo Sze <[email protected]> wrote:
>
> > Hi Snehasish,
> >
> > Since you already have a test, could you share the code change? You may
> > attach a patch file or create a pull request. I will run it to
> > reproduce the failure.
> >
> > In the meantime, I will try to understand the details you provided.
> >
> > Tsz-Wo
> >
> >
> > On Thu, Mar 5, 2026 at 3:14 AM Snehasish Roy <[email protected]>
> > wrote:
> >
> >> Hi Tsz-Wo,
> >>
> >> Thank you for your prompt response. I was able to reproduce this issue
> >> using CounterStateMachine.
> >>
> >> I added a utility in the CounterClient to trigger a snapshot.
> >>
> >> ```
> >> private void takeSnapshot() throws IOException {
> >>   RaftClientReply raftClientReply = client.getSnapshotManagementApi()
> >>       .create(true, 30_000);
> >>   System.out.println(raftClientReply);
> >> }
> >> ```
> >>
> >> Once the snapshot is triggered, I move it to a different directory to
> >> simulate a clean restart.
> >>
> >> I also updated SimpleStateMachineStorage::loadLatestSnapshot() to look
> >> for snapshots in a different directory.
> >>
> >> ```
> >> public SingleFileSnapshotInfo loadLatestSnapshot() {
> >>   final File dir = new File("/tmp/snapshots");
> >>   // ... rest of the lookup logic unchanged, now scanning this dir
> >> }
> >> ```
> >>
> >> Full steps for reproduction:
> >>
> >> 1. I started a 3-node CounterServer and performed some updates to the
> >> state machine using the CounterClient.
> >>
> >> 2. Triggered the snapshot via the CounterClient and then moved the
> >> snapshot to a different directory. The snapshot will be of the format
> >> term_index; here the term will initially be 1, and let's assume the
> >> index is at 10.
> >>
> >> 3. Killed the leader; the term would have increased to 2.
> >>
> >> 4. Performed some updates and triggered another snapshot. Let's assume
> >> the index is at 20 and the term is at 2. Moved the snapshot to a
> >> different directory.
> >>
> >> 5. Stopped all nodes. Cleared all storage directories of all the nodes
> >> to simulate a clean restart.
> >>
> >> 6. Started the 3-node CounterServer and observed the failure at startup.
> >>
> >> ```
> >> 2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> 2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> 2026-03-05 15:48:56 INFO RaftServerConfigKeys:62 - raft.server.log.use.memory = false (default)
> >> 2026-03-05 15:48:56 INFO RaftServer$Division:155 - n0@group-ABB3109A44C1: getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
> >> 2026-03-05 15:48:56 INFO RaftLog:90 - n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
> >> ....
> >> 2026-03-05 15:49:02 INFO RaftServer$Division:577 - n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for becomeLeader
> >> 2026-03-05 15:49:02 INFO RaftServer$Division:278 - n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for becomeLeader, leader elected after 672ms
> >> 2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:440 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from index:21
> >> 2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:647 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
> >> ....
> >> 2026-03-05 15:49:02 INFO RaftServer$Division:309 - Leader n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex == startIndex == 21
> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> >> java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
> >> at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
> >> at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
> >> at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
> >> at java.base/java.lang.Thread.run(Thread.java:1575)
> >> 2026-03-05 15:49:02 INFO RaftServer$Division:528 - n1@group-ABB3109A44C1: shutdown
> >> ```
> >>
> >> As you can see from the stack trace, during the snapshot restore the
> >> termIndex was updated to the latest value seen in the snapshot,
> >> (t:2, i:20), but when the server was started from a clean slate, the
> >> term was reset to 1 by RaftServerImpl at startup. The server then tries
> >> to apply the log entries and fails because of the precondition check
> >> that the applied termIndex must be monotonically increasing.
> >>
> >> Please let me know if you need more information.
> >>
> >> Regards
> >>
> >> On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze <[email protected]> wrote:
> >>
> >> > Hi Snehasish,
> >> >
> >> > > ... newTI = (t:1, i:21) ...
> >> >
> >> > The newTI was invalid. It probably was from the state machine. It
> >> > should just use the TermIndex from LogEntryProto. See
> >> > CounterStateMachine [1] as an example.
> >> >
> >> > Tsz-Wo
> >> >
> >> > [1]
> >> > https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
> >> >
> >> > On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev <[email protected]> wrote:
> >> >
> >> > > Hello everyone,
> >> > >
> >> > > I was exploring the snapshot restore capability of Ratis and found
> >> > > one scenario that failed.
> >> > >
> >> > > 1. Start a 3-node Ratis cluster and perform some updates to the
> >> > > state machine.
> >> > > 2. Take a snapshot. The snapshot will be of the format term_index;
> >> > > here the term will initially be 1, and let's assume the index is
> >> > > at 10.
> >> > > 3. Kill the leader; the term would have increased to 2.
> >> > > 4. Perform some updates and trigger another snapshot. Let's assume
> >> > > the index is at 20 and the term is at 2.
> >> > > 5. Stop all nodes.
> >> > > 6. A failure is observed while starting the nodes.
> >> > >
> >> > > ```
> >> > > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> > > ```
> >> > >
> >> > > Based on the error logs, I suspect the state machine updated the
> >> > > last applied term index to t:2, i:20, but the ServerState has a
> >> > > separate variable for tracking the currentTerm, which is initialized
> >> > > to 0 at startup. Once the leader is elected, it tries to update the
> >> > > log entry, but the update fails due to the precondition check.
> >> > >
> >> > > What's the correct way to solve this problem? Should the term be
> >> > > reset to 0 while loading the snapshot at server startup?
> >> > >
> >> > > References:
> >> > >
> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
> >> > >
> >> > > Thank you for looking into this issue.
> >> > >
> >> > > Regards,
> >> > > Snehasish
> >> > >
> >> >
> >>
> >
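The precondition that fails in the stack trace above can be sketched in isolation. The following is a standalone model, not the actual org.apache.ratis.server.protocol.TermIndex or BaseStateMachine classes: term-index pairs compare by term first, and by index only when terms are equal, so an entry from a fresh term-1 leader sorts below a state restored from a term-2 snapshot even at a higher index.

```java
// Standalone model of the term-index ordering behind the failure
// (names are illustrative, not the Ratis classes themselves).
public class TermIndexModel {

  // Compare (term1, index1) against (term2, index2): term first,
  // index breaks ties.
  static int compare(long term1, long index1, long term2, long index2) {
    final int byTerm = Long.compare(term1, term2);
    return byTerm != 0 ? byTerm : Long.compare(index1, index2);
  }

  // Mirrors the precondition in updateLastAppliedTermIndex:
  // the applied term-index must never move backwards.
  static void updateLastApplied(long[] lastApplied, long newTerm, long newIndex) {
    if (compare(newTerm, newIndex, lastApplied[0], lastApplied[1]) < 0) {
      throw new IllegalStateException("Failed updateLastAppliedTermIndex: newTI = (t:"
          + newTerm + ", i:" + newIndex + ") < oldTI = (t:" + lastApplied[0]
          + ", i:" + lastApplied[1] + ")");
    }
    lastApplied[0] = newTerm;
    lastApplied[1] = newIndex;
  }

  public static void main(String[] args) {
    // State machine restored from snapshot.2_20, so lastApplied = (t:2, i:20).
    final long[] lastApplied = {2, 20};
    try {
      // A wiped cluster elects a leader at term 1 and appends at index 21,
      // so the next entry to apply is (t:1, i:21), which sorts BELOW (t:2, i:20).
      updateLastApplied(lastApplied, 1, 21);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Running the model reproduces the same message as the log: `Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)`.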

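One way to package the restore step described at the top of the thread is to copy the archived snapshot back into the state machine's own storage directory before the server starts, rather than overriding loadLatestSnapshot() to read a foreign directory. The sketch below is hypothetical (class name and paths are illustrative, not a Ratis API), and it does not by itself address the term-reset problem discussed above; it only preserves the term_index file name (e.g. snapshot.2_20) so the storage layer can still parse it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical pre-startup helper: place an archived snapshot (e.g. one
// pulled down from S3) back into the state machine storage directory.
public class SnapshotRestore {

  // Where the archived snapshot should land; the term_index file name
  // (e.g. snapshot.2_20) is kept intact.
  static Path targetFor(Path archivedSnapshot, Path stateMachineDir) {
    return stateMachineDir.resolve(archivedSnapshot.getFileName());
  }

  static Path restore(Path archivedSnapshot, Path stateMachineDir) {
    try {
      Files.createDirectories(stateMachineDir);
      return Files.copy(archivedSnapshot, targetFor(archivedSnapshot, stateMachineDir),
          StandardCopyOption.REPLACE_EXISTING);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    final Path tmp = Files.createTempDirectory("restore-demo");
    // Stand-in for a snapshot pulled down from durable storage such as S3.
    final Path archived = Files.write(tmp.resolve("snapshot.2_20"), "state".getBytes());
    System.out.println(restore(archived, tmp.resolve("sm")).getFileName());
  }
}
```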