Hi Tsz-Wo,
Thank you for your prompt response. I was able to reproduce this issue
using CounterStateMachine.
I added a utility method to CounterClient to trigger a snapshot:
```
private void takeSnapshot() throws IOException {
    // Request a snapshot (force = true) with a 30-second timeout.
    RaftClientReply raftClientReply = client.getSnapshotManagementApi()
        .create(true, 30_000);
    System.out.println(raftClientReply);
}
```
After triggering each snapshot, I moved it to a different directory to
simulate a clean restart.
I also modified SimpleStateMachineStorage::loadLatestSnapshot() to look
for snapshots in that directory:
```
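public SingleFileSnapshotInfo loadLatestSnapshot() {
    // Changed: look for snapshots in /tmp/snapshots instead of the
    // state machine directory.
    final File dir = new File("/tmp/snapshots");
    // ... rest of the original method unchanged ...
}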
```
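The change above only affects where snapshots are searched for. The following is a rough, hypothetical sketch of how the latest snapshot could be picked from such a directory. It is not the Ratis implementation. It assumes the `snapshot.<term>_<index>` file naming seen in the logs (e.g. `snapshot.2_20`). The class and method names are made up for illustration, and ordering by term first and then by index is also an assumption.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not SimpleStateMachineStorage itself.
public class LatestSnapshotSketch {
    // Assumed naming, taken from the logs: "snapshot.2_20" = term 2, index 20.
    private static final Pattern SNAPSHOT_NAME = Pattern.compile("snapshot\\.(\\d+)_(\\d+)");

    /** Hypothetical holder for a parsed snapshot file name. */
    record SnapshotName(String fileName, long term, long index) {}

    static Optional<SnapshotName> parse(String fileName) {
        Matcher m = SNAPSHOT_NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty(); // ignore unrelated files, e.g. checksum files
        }
        return Optional.of(new SnapshotName(fileName,
                Long.parseLong(m.group(1)), Long.parseLong(m.group(2))));
    }

    /** Picks the snapshot with the highest (term, index), comparing term first. */
    static Optional<SnapshotName> latest(List<String> fileNames) {
        return fileNames.stream()
                .map(LatestSnapshotSketch::parse)
                .flatMap(Optional::stream)
                .max(Comparator.comparingLong(SnapshotName::term)
                        .thenComparingLong(SnapshotName::index));
    }

    public static void main(String[] args) {
        List<String> files = List.of("snapshot.1_10", "snapshot.2_20", "README");
        System.out.println(latest(files).map(SnapshotName::fileName).orElse("none"));
    }
}
```

With the two snapshots from the steps below, this prints `snapshot.2_20`, matching the snapshot the servers load in the logs.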
Full steps to reproduce:
1. Started a 3-node CounterServer cluster and performed some updates to the
state machine using the CounterClient.
2. Triggered a snapshot via the CounterClient and moved it to a different
directory. The snapshot is named in the format term_index; here the term is
initially 1, and let's assume the index is at 10.
3. Killed the leader; after the new election, the term increased to 2.
4. Performed some more updates and triggered another snapshot. Let's assume
the index is at 20 and the term is at 2. Moved this snapshot to the
different directory as well.
5. Stopped all nodes and cleared the storage directories of all nodes to
simulate a clean restart.
6. Started the 3-node CounterServer cluster again and observed the failure
at startup:
```
2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
2026-03-05 15:48:56 INFO SimpleStateMachineStorage:229 - Latest snapshot is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in /tmp/snapshots
2026-03-05 15:48:56 INFO RaftServerConfigKeys:62 - raft.server.log.use.memory = false (default)
2026-03-05 15:48:56 INFO RaftServer$Division:155 - n0@group-ABB3109A44C1: getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
2026-03-05 15:48:56 INFO RaftLog:90 - n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
....
2026-03-05 15:49:02 INFO RaftServer$Division:577 - n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for becomeLeader
2026-03-05 15:49:02 INFO RaftServer$Division:278 - n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for becomeLeader, leader elected after 672ms
2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:440 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from index:21
2026-03-05 15:49:02 INFO SegmentedRaftLogWorker:647 - n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
....
2026-03-05 15:49:02 INFO RaftServer$Division:309 - Leader n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex == startIndex == 21
2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
2026-03-05 15:49:02 ERROR StateMachineUpdater:207 - n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)
    at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
    at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
    at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
    at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
    at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
    at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
    at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
    at java.base/java.lang.Thread.run(Thread.java:1575)
2026-03-05 15:49:02 INFO RaftServer$Division:528 - n1@group-ABB3109A44C1: shutdown
```
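As the logs show, restoring the snapshot set the last applied TermIndex to
the latest value seen in the snapshot, (t:2, i:20). However, because the
servers were started from a clean slate, the term was reset by
RaftServerImpl at startup, and n1 was elected leader at term 1. The first
new entry therefore has TermIndex (t:1, i:21). Since TermIndex is compared
by term first and by index only on a tie, (t:1, i:21) is smaller than
(t:2, i:20) even though its index is higher. Applying it therefore fails
the precondition in BaseStateMachine.updateLastAppliedTermIndex() that the
applied TermIndex must be monotonically increasing.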
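To make the failing comparison concrete, here is a small, simplified model of the check. It is hypothetical and is not Ratis's TermIndex or BaseStateMachine. It only mirrors the term-first ordering and the "new must not be smaller than old" precondition implied by the error message.

```java
// Hypothetical model of the failing check; not the Ratis classes.
public class AppliedTermIndexSketch {

    record TermIndex(long term, long index) implements Comparable<TermIndex> {
        @Override
        public int compareTo(TermIndex that) {
            // Term is compared first; index only breaks ties.
            int c = Long.compare(term, that.term);
            return c != 0 ? c : Long.compare(index, that.index);
        }

        @Override
        public String toString() {
            return "(t:" + term + ", i:" + index + ")";
        }
    }

    private TermIndex lastApplied;

    AppliedTermIndexSketch(TermIndex restoredFromSnapshot) {
        // Restoring the snapshot sets the last applied TermIndex.
        this.lastApplied = restoredFromSnapshot;
    }

    void updateLastApplied(TermIndex newTI) {
        // Precondition: the applied TermIndex must never move backwards.
        if (newTI.compareTo(lastApplied) < 0) {
            throw new IllegalStateException("Failed updateLastAppliedTermIndex: newTI = "
                    + newTI + " < oldTI = " + lastApplied);
        }
        lastApplied = newTI;
    }

    TermIndex lastApplied() {
        return lastApplied;
    }

    public static void main(String[] args) {
        AppliedTermIndexSketch sm = new AppliedTermIndexSketch(new TermIndex(2, 20));
        try {
            // With empty storage, the leader is elected at term 1 and writes index 21.
            sm.updateLastApplied(new TermIndex(1, 21));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2, i:20)`, the same comparison as in the stack trace. An entry at (t:2, i:21) or any later term would pass.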
Please let me know if you need more information.
Regards
On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze wrote:
> Hi Snehasish,
>
> > ... newTI = (t:1, i:21) ...
>
> The newTI was invalid; it probably came from the state machine. The state
> machine should just use the TermIndex from the LogEntryProto. See
> CounterStateMachine [1] for an example.
>
> Tsz-Wo
> [1] https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
>
> On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev wrote:
>
> > Hello everyone,
> >
> > I was exploring the snapshot restore capability of Ratis and found one
> > scenario that failed.
> >
> > 1. Start a 3 Node ratis cluster and perform some updates to the state
> > machine.
> > 2. Take the snapshot - the snapshot will be of the format term_index. Here
> > the term will initially be 1, and let's assume the index is at 10.
> > 3. Kill the leader, the term would have increased to 2.
> > 4. Perform some updates and trigger another snapshot. Let's assume the
> > index is at 20 and term is at 2.
> > 5. Stop all nodes.
> > 6. A failure is observed while starting the node.
> >
> > ```
> > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI = (t:2,
> > i:20)
> > ```
> >
> > Based on the error logs, I suspect the state machine updated the last
> > applied term index to t:2, i:20, but the ServerState has a separate
> > variable for tracking the currentTerm, which is initialized to 0 at
> > startup.
> > Once the leader is elected, it tries to update the log entry, but the
> > update fails due to a precondition check.
> >
> > What's the correct way to solve this problem? Should the term be reset
> > to 0 while loading the snapshot at server startup?
> >
> > References:
> > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
> > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
> >
> > Thank you for looking into this issue.
> >
> >
> > Regards,
> > Snehasish
> >
>