Addendum to the above email: I understand that the S3 snapshots can be stale, but since all the nodes are gone, I have no way to get the latest data. I just need a way to restore from the last known good checkpoint. If a majority of the nodes were still available, the followers could simply get the data from the leader and rebuild the state machine.
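To make "last known good checkpoint" concrete, here is a minimal sketch of how a restoring node might pick the newest snapshot from a listing of file names. It assumes the `snapshot.<term>_<index>` naming seen in the logs (e.g. `snapshot.2_20`). `SnapshotPicker`, `Snapshot`, `parse` and `latest` are hypothetical names and not part of Ratis, and ordering by term first, then index, is my assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ratis: picks the newest snapshot from a list of file names.
final class SnapshotPicker {
  // Matches names such as "snapshot.2_20" (term 2, index 20).
  private static final Pattern NAME = Pattern.compile("snapshot\\.(\\d+)_(\\d+)");

  record Snapshot(long term, long index, String fileName) {}

  // Returns an empty Optional for anything that is not a snapshot file name.
  static Optional<Snapshot> parse(String fileName) {
    Matcher m = NAME.matcher(fileName);
    if (!m.matches()) {
      return Optional.empty();
    }
    return Optional.of(new Snapshot(
        Long.parseLong(m.group(1)), Long.parseLong(m.group(2)), fileName));
  }

  // Assumption: the newest snapshot is the one with the highest term, then the highest index.
  static Optional<Snapshot> latest(List<String> fileNames) {
    return fileNames.stream()
        .map(SnapshotPicker::parse)
        .flatMap(Optional::stream)
        .max(Comparator.comparingLong(Snapshot::term)
            .thenComparingLong(Snapshot::index));
  }

  public static void main(String[] args) {
    List<String> names = List.of("snapshot.1_10", "snapshot.2_20", "notes.txt");
    // Prints "snapshot.2_20".
    System.out.println(latest(names).map(Snapshot::fileName).orElse("no snapshot found"));
  }
}
```

In the S3 case the list of names would come from a bucket listing instead of a local directory. The selection logic stays the same.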
Regards,
Snehasish

On Fri, 6 Mar 2026 at 15:31, Snehasish Roy <[email protected]> wrote:

> Hi Tsz Wo,
>
> Thank you again for the prompt response. Kindly let me take a step back
> and explain what I am trying to solve.
> I want to ensure durability of the State Machine in case all the nodes go
> down.
>
> If I am running a 3 node Ratis Cluster and all the nodes go down due to
> some physical hardware failure, I need a way to ensure that when a new
> node spawns, it is able to restore the state.
> To do so, I am thinking of taking periodic snapshots to a durable storage,
> e.g. S3, and when a new node spawns (which should be handled by another
> service), it can pull the snapshot from S3 and restore the state.
>
> To simulate this scenario, I clean the storage directories of the Ratis
> nodes before starting them up so they don't have any previous state, and
> let the nodes pull the snapshot from a separate directory.
> Please let me know if there is some other way I can solve this problem.
>
> Hope this helps.
>
> Regards,
> Snehasish
>
> On Thu, 5 Mar 2026 at 23:50, Tsz Wo Sze <[email protected]> wrote:
>
>> Hi Snehasish,
>>
>> > Once the snapshot is triggered, I move it to a different directory to
>> > simulate clean restart.
>>
>> Is this step required to reproduce the failure? If there is a snapshot
>> taken, the server expects that the snapshot is there and it may delete the
>> raft logs for freeing up space. If this step is required to reproduce the
>> failure, it does not look like a bug.
>>
>> In general, we cannot manually move the Ratis metadata around. Similarly,
>> if we manually move some system files around in Linux or Windows, the
>> system may not be able to restart.
>>
>> Tsz-Wo
>>
>> On Thu, Mar 5, 2026 at 10:08 AM Tsz Wo Sze <[email protected]> wrote:
>>
>> > Hi Snehasish,
>> >
>> > Since you already have a test, could you share the code change? You may
>> > attach a patch file or create a pull request. I will run it to reproduce
>> > the failure.
>> >
>> > In the meantime, I will try to understand the details you provided.
>> >
>> > Tsz-Wo
>> >
>> > On Thu, Mar 5, 2026 at 3:14 AM Snehasish Roy <[email protected]> wrote:
>> >
>> >> Hi Tsz-Wo,
>> >>
>> >> Thank you for your prompt response. I was able to reproduce this issue
>> >> using CounterStateMachine.
>> >>
>> >> I added a utility in the CounterClient to trigger a snapshot.
>> >>
>> >> ```
>> >> private void takeSnapshot() throws IOException {
>> >>   RaftClientReply raftClientReply = client.getSnapshotManagementApi()
>> >>       .create(true, 30_000);
>> >>   System.out.println(raftClientReply);
>> >> }
>> >> ```
>> >>
>> >> Once the snapshot is triggered, I move it to a different directory to
>> >> simulate clean restart.
>> >>
>> >> I also updated the SimpleStateMachineStorage::loadLatestSnapshot() to look
>> >> for snapshots in a different directory.
>> >>
>> >> ```
>> >> public SingleFileSnapshotInfo loadLatestSnapshot() {
>> >>   final File dir = new File("/tmp/snapshots");
>> >> }
>> >> ```
>> >>
>> >> Full steps for reproduction
>> >> 1. I started a 3 Node CounterServer and performed some updates to the
>> >> state machine using the CounterClient.
>> >>
>> >> 2. Triggered the snapshot via the CounterClient and then moved the
>> >> snapshot to a different directory - the snapshot will be of the format
>> >> term_index. Here the term will initially be 1, and let's assume the
>> >> index is at 10.
>> >>
>> >> 3. Kill the leader, the term would have increased to 2.
>> >>
>> >> 4. Perform some updates and trigger another snapshot. Let's assume the
>> >> index is at 20 and the term is at 2. Moved the snapshot to a different
>> >> directory.
>> >>
>> >> 5. Stopped all nodes. Cleared all storage directories of all the nodes to
>> >> simulate clean restart.
>> >>
>> >> 6. Start 3 node CounterServer and observe the failure at the startup.
>> >>
>> >> ```
>> >> 2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
>> >> 2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
>> >> 2026-03-05 15:48:56 INFO RaftServerConfigKeys:62 - raft.server.log.use.memory = false (default)
>> >> 2026-03-05 15:48:56 INFO RaftServer$Division:155 - n0@group-ABB3109A44C1: getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
>> >> 2026-03-05 15:48:56 INFO RaftLog:90 - n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
>> >> ....
>> >> 2026-03-05 15:49:02 INFO RaftServer$Division:577 - n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for becomeLeader
>> >> 2026-03-05 15:49:02 INFO RaftServer$Division:278 - n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for becomeLeader, leader elected after 672ms
>> >> 2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:440 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from index:21
>> >> 2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:647 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
>> >> ....
>> >> 2026-03-05 15:49:02 INFO RaftServer$Division:309 - Leader n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex == startIndex == 21
>> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
>> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
>> >> java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
>> >>   at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
>> >>   at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
>> >>   at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
>> >>   at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
>> >>   at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
>> >>   at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
>> >>   at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
>> >>   at java.base/java.lang.Thread.run(Thread.java:1575)
>> >> 2026-03-05 15:49:02 INFO RaftServer$Division:528 - n1@group-ABB3109A44C1: shutdown
>> >> ```
>> >>
>> >> As you can see from the stack trace, during the snapshot restore, the
>> >> termIndex was updated to the latest value seen from the snapshot 2:20, but
>> >> when the server was started from a clean slate, then the term was reset to
>> >> 1 by the RaftServerImpl at the startup. It then tries to update the log
>> >> entries and fails because of the precondition check that the term should
>> >> be monotonically increasing in the log entries.
>> >>
>> >> Please let me know if you need more information.
>> >>
>> >> Regards
>> >>
>> >> On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze <[email protected]> wrote:
>> >>
>> >> > Hi Snehasish,
>> >> >
>> >> > > ... newTI = (t:1, i:21) ...
>> >> >
>> >> > The newTI was invalid.
>> >> > It probably was from the state machine. It should just use the
>> >> > TermIndex from LogEntryProto. See CounterStateMachine [1] as an example.
>> >> >
>> >> > Tsz-Wo
>> >> > [1]
>> >> > https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
>> >> >
>> >> > On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev <[email protected]> wrote:
>> >> >
>> >> > > Hello everyone,
>> >> > >
>> >> > > I was exploring the snapshot restore capability of Ratis and found one
>> >> > > scenario that failed.
>> >> > >
>> >> > > 1. Start a 3 Node ratis cluster and perform some updates to the state
>> >> > > machine.
>> >> > > 2. Take the snapshot - the snapshot will be of the format term_index.
>> >> > > Here the term will initially be 1, and let's assume the index is at 10.
>> >> > > 3. Kill the leader, the term would have increased to 2.
>> >> > > 4. Perform some updates and trigger another snapshot. Let's assume the
>> >> > > index is at 20 and term is at 2.
>> >> > > 5. Stop all nodes.
>> >> > > 6. A failure is observed while starting the node.
>> >> > >
>> >> > > ```
>> >> > > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
>> >> > > ```
>> >> > >
>> >> > > Based on the error logs, I suspect the state machine updated the last
>> >> > > applied term index to t:2, i:20, but the ServerState has a separate
>> >> > > variable for tracking the currentTerm which is initialized to 0 at
>> >> > > startup. Once the leader is elected, it tried to update the log entry
>> >> > > but the update failed due to precondition check.
>> >> > >
>> >> > > What's the correct way to solve this problem? Should the term be reset
>> >> > > to 0 while loading the snapshot at the server startup?
>> >> > >
>> >> > > References:
>> >> > >
>> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
>> >> > >
>> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
>> >> > >
>> >> > > Thank you for looking into this issue.
>> >> > >
>> >> > > Regards,
>> >> > > Snehasish
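
For reference, the failing check in the thread can be illustrated with a small stand-alone sketch. It assumes that a (term, index) pair is ordered by term first and index second, which is what the error message `newTI = (t:1, i:21) < oldTI = (t:2, i:20)` implies. The `TermIndex` record and `checkNotBackwards` below are simplified, hypothetical stand-ins and not the Ratis classes.

```java
import java.util.Comparator;

// Simplified stand-in for the (term, index) comparison; not the Ratis implementation.
final class TermIndexDemo {
  record TermIndex(long term, long index) implements Comparable<TermIndex> {
    // Assumption: order by term first, then by index.
    private static final Comparator<TermIndex> ORDER =
        Comparator.comparingLong(TermIndex::term).thenComparingLong(TermIndex::index);

    @Override
    public int compareTo(TermIndex other) {
      return ORDER.compare(this, other);
    }

    @Override
    public String toString() {
      return "(t:" + term + ", i:" + index + ")";
    }
  }

  // Hypothetical check mirroring the precondition: the applied TermIndex must not go backwards.
  static void checkNotBackwards(TermIndex oldTI, TermIndex newTI) {
    if (newTI.compareTo(oldTI) < 0) {
      throw new IllegalStateException(
          "Failed updateLastAppliedTermIndex: newTI = " + newTI + " < oldTI = " + oldTI);
    }
  }

  public static void main(String[] args) {
    TermIndex fromSnapshot = new TermIndex(2, 20);  // restored from snapshot.2_20
    TermIndex fromNewLog = new TermIndex(1, 21);    // first entry after the clean restart
    try {
      checkNotBackwards(fromSnapshot, fromNewLog);
    } catch (IllegalStateException e) {
      // The index moved forward (21 > 20), but the term went back (1 < 2), so the check fails.
      System.out.println(e.getMessage());
    }
  }
}
```

Under this ordering, a higher index cannot make up for a lower term, so a cluster that restarts at term 1 cannot append on top of a snapshot taken at term 2.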
