Hi Tsz Wo,

Thank you again for the prompt response. Let me take a step back and
explain what I am trying to solve: I want to ensure durability of the
State Machine in case all the nodes go down.

If I am running a 3 node Ratis cluster and all the nodes go down due to
some physical hardware failure, I need a way to ensure that when a new
node spins up, it is able to restore the state.
To do so, I am thinking of taking periodic snapshots to durable storage,
e.g. S3, and when a new node spawns (which would be handled by another
service), it can pull the snapshot from S3 and restore the state.

To simulate this scenario, I clean the storage directories of the Ratis
nodes before starting them up, so they don't have any previous state, and
let the nodes pull the snapshot from a separate directory.
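Since the snapshot files are named snapshot.<term>_<index>, the restore
side has to pick the newest file by comparing the (term, index) pair
rather than file timestamps. A small illustrative sketch (class and method
names are mine, not Ratis APIs):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class SnapshotPicker {
    // Snapshot files are named snapshot.<term>_<index>; parse both parts so
    // the newest one can be chosen by comparing (term, index) pairs.
    static long termOf(String name) {
        return Long.parseLong(name.substring(name.indexOf('.') + 1, name.indexOf('_')));
    }

    static long indexOf(String name) {
        return Long.parseLong(name.substring(name.indexOf('_') + 1));
    }

    // Orders two snapshot file names: first by term, then by index.
    static int compare(String a, String b) {
        int byTerm = Long.compare(termOf(a), termOf(b));
        return byTerm != 0 ? byTerm : Long.compare(indexOf(a), indexOf(b));
    }

    // Returns the newest snapshot file in dir, or null if there is none.
    static Path pickLatest(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(p -> p.getFileName().toString().startsWith("snapshot."))
                .max(Comparator.comparing((Path p) -> p.getFileName().toString(),
                        SnapshotPicker::compare))
                .orElse(null);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("snapshots");
        Files.writeString(dir.resolve("snapshot.1_10"), "old");
        Files.writeString(dir.resolve("snapshot.2_20"), "new");
        System.out.println(pickLatest(dir).getFileName());
    }
}
```

A provisioning service could run this against the pulled-down S3 copies
and place the winner into the fresh node's state machine directory before
the server starts.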
Please let me know if there is some other way I can solve this problem.

Hope this helps.

Regards,
Snehasish

On Thu, 5 Mar 2026 at 23:50, Tsz Wo Sze <[email protected]> wrote:

> Hi Snehasish,
>
> > Once the snapshot is triggered, I move it to a different directory to
> simulate clean restart.
>
> Is this step required to reproduce the failure?  If there is a snapshot
> taken, the server expects that the snapshot is there and it may delete the
> raft logs for freeing up space.  If this step is required to reproduce the
> failure, it does not look like a bug.
>
> In general, we cannot manually move the Ratis metadata around.  Similarly,
> if we manually move some system files around in Linux or Windows, the
> system may not be able to restart.
>
> Tsz-Wo
>
>
>
> On Thu, Mar 5, 2026 at 10:08 AM Tsz Wo Sze <[email protected]> wrote:
>
> > Hi Snehasish,
> >
> > Since you already have a test, could you share the code change?  You may
> > attach a patch file or create a pull request.  I will run it to
> > reproduce the failure.
> >
> > In the meantime, I will try to understand the details you provided.
> >
> > Tsz-Wo
> >
> >
> > On Thu, Mar 5, 2026 at 3:14 AM Snehasish Roy <[email protected]>
> > wrote:
> >
> >> Hi Tsz-Wo,
> >>
> >> Thank you for your prompt response. I was able to reproduce this issue
> >> using CounterStateMachine.
> >>
> >> I added a utility in the CounterClient to trigger a snapshot.
> >>
> >> ```
> >> private void takeSnapshot() throws IOException {
> >>     RaftClientReply raftClientReply = client.getSnapshotManagementApi()
> >>             .create(true, 30_000);
> >>     System.out.println(raftClientReply);
> >> }
> >> ```
> >>
> >> Once the snapshot is triggered, I move it to a different directory to
> >> simulate clean restart.
> >>
> >> I also updated SimpleStateMachineStorage::loadLatestSnapshot() to look
> >> for snapshots in a different directory.
> >>
> >> ```
> >> public SingleFileSnapshotInfo loadLatestSnapshot() {
> >>     // only the directory being scanned was changed; the rest of the
> >>     // method body is unchanged
> >>     final File dir = new File("/tmp/snapshots");
> >>     ...
> >> }
> >> ```
> >>
> >> Full steps for reproduction:
> >>
> >> 1. Start a 3 node CounterServer and perform some updates to the state
> >> machine using the CounterClient.
> >>
> >> 2. Trigger the snapshot via the CounterClient and then move the
> >> snapshot to a different directory - the snapshot will be of the format
> >> term_index. Here the term will initially be 1, and let's assume the
> >> index is at 10.
> >>
> >> 3. Kill the leader; the term would have increased to 2.
> >>
> >> 4. Perform some updates and trigger another snapshot. Let's assume the
> >> index is at 20 and the term is at 2. Move the snapshot to the different
> >> directory again.
> >>
> >> 5. Stop all nodes. Clear all storage directories of all the nodes to
> >> simulate a clean restart.
> >>
> >> 6. Start the 3 node CounterServer and observe the failure at startup.
> >>
> >> ```
> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
> >> 2026-03-05 15:48:56 INFO  RaftServerConfigKeys:62 - raft.server.log.use.memory = false (default)
> >> 2026-03-05 15:48:56 INFO  RaftServer$Division:155 - n0@group-ABB3109A44C1: getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
> >> 2026-03-05 15:48:56 INFO  RaftLog:90 - n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
> >> ....
> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:577 - n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for becomeLeader
> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:278 - n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for becomeLeader, leader elected after 672ms
> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:440 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from index:21
> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:647 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
> >> ....
> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:309 - Leader n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex == startIndex == 21
> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> >> java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
> >> at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
> >> at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
> >> at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
> >> at java.base/java.lang.Thread.run(Thread.java:1575)
> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:528 - n1@group-ABB3109A44C1: shutdown
> >> ```
> >>
> >> As you can see from the stack trace, during the snapshot restore the
> >> termIndex was updated to the latest value from the snapshot, (t:2,
> >> i:20), but when the server was started from a clean slate, the term was
> >> reset to 1 by RaftServerImpl at startup. It then tries to apply the log
> >> entries and fails the precondition check that the term index must be
> >> monotonically increasing across log entries.
> >>
> >> Please let me know if you need more information.
> >>
> >> Regards
> >>
> >> On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze <[email protected]> wrote:
> >>
> >> > Hi Snehasish,
> >> >
> >> > > ... newTI = (t:1, i:21) ...
> >> >
> >> > The newTI was invalid.  It probably was from the state machine.  It
> >> > should just use the TermIndex from LogEntryProto.  See
> >> > CounterStateMachine [1] as an example.
> >> >
> >> > Tsz-Wo
> >> > [1]
> >> > https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
> >> >
> >> > On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev <
> >> > [email protected]>
> >> > wrote:
> >> >
> >> > > Hello everyone,
> >> > >
> >> > > I was exploring the snapshot restore capability of Ratis and found
> >> > > one scenario that failed.
> >> > >
> >> > > 1. Start a 3 node Ratis cluster and perform some updates to the
> >> > > state machine.
> >> > > 2. Take the snapshot - the snapshot will be of the format
> >> > > term_index. Here the term will initially be 1, and let's assume the
> >> > > index is at 10.
> >> > > 3. Kill the leader, the term would have increased to 2.
> >> > > 4. Perform some updates and trigger another snapshot. Let's assume
> >> > > the index is at 20 and the term is at 2.
> >> > > 5. Stop all nodes.
> >> > > 6. A failure is observed while starting the node.
> >> > >
> >> > > ```
> >> > > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> >> > > ```
> >> > >
> >> > > Based on the error logs, I suspect the state machine updated the
> >> > > last applied term index to (t:2, i:20), but the ServerState has a
> >> > > separate variable for tracking the currentTerm which is initialized
> >> > > to 0 at startup. Once the leader is elected, it tried to update the
> >> > > log entry but the update failed due to the precondition check.
> >> > >
> >> > > What's the correct way to solve this problem? Should the term be
> >> > > reset to 0 while loading the snapshot at the server startup?
> >> > >
> >> > > References:
> >> > >
> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
> >> > >
> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
> >> > >
> >> > > Thank you for looking into this issue.
> >> > >
> >> > >
> >> > > Regards,
> >> > > Snehasish
> >> > >
> >> >
> >>
> >
>
