Hi Tsz-Wo,

Thank you for your prompt response. Backing up the entire directory is
feasible, but since each file will be backed up at a different time, it can
create subtle inconsistencies.
Could you please guide me a bit on how to recreate the metadata from the
snapshot?
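
For context, here is the kind of whole-directory copy I had in mind (a minimal
sketch, not Ratis code; RaftStorageBackup and copyTree are names I made up, and
the server would need to be stopped or paused while it runs):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper (not part of Ratis): recursively copy the whole Raft
// storage directory -- raft-meta, log segments, and snapshots -- in one pass,
// so that restore starts from a single self-consistent tree. Copying a live
// directory can still capture files at different points in time, hence the
// requirement to pause the server (or snapshot the volume) first.
public final class RaftStorageBackup {

  public static void copyTree(Path src, Path dst) throws IOException {
    try (Stream<Path> paths = Files.walk(src)) {
      for (Path p : (Iterable<Path>) paths::iterator) {
        final Path target = dst.resolve(src.relativize(p).toString());
        if (Files.isDirectory(p)) {
          Files.createDirectories(target);
        } else {
          Files.createDirectories(target.getParent());
          Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }
}
```

The resulting tree could then be pushed to S3 as one unit, so every file in a
backup comes from the same point in time.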

I am sure it would be helpful for others looking to back up their Ratis
state machine.


Regards,
Snehasish


On Sat, 7 Mar 2026 at 02:30, Tsz Wo Sze <[email protected]> wrote:

> Hi Snehasish,
>
> I see your requirement now.  For backup and restore, one easy way is to
> back up the entire Ratis storage directory, not just the snapshot.  Although
> it is possible to recreate the other Ratis metadata from a snapshot, you
> need to understand a great deal of Ratis in order to do so.  Just copying a
> snapshot won't work since it also needs other Ratis metadata.  Would it
> work for you to just backup the entire directory?
>
> Tsz-Wo
>
>
> On Fri, Mar 6, 2026 at 2:23 AM Snehasish Roy <[email protected]>
> wrote:
>
> > Addendum to the above email: I understand that the S3 snapshots can be
> > stale, but as all the nodes are gone, I don't have a way to get the latest
> > data; I just need a way to restore from the last known good checkpoint. If
> > a majority of the nodes are still available, followers can easily get the
> > data from the leader and rebuild the state machine.
> >
> >
> > Regards,
> > Snehasish
> >
> > On Fri, 6 Mar 2026 at 15:31, Snehasish Roy <[email protected]>
> > wrote:
> >
> > > Hi Tsz Wo,
> > >
> > > Thank you again for the prompt response. Kindly let me take a step back
> > > and explain what I am trying to solve.
> > > I want to ensure durability of the state machine in case all the nodes
> > > go down.
> > >
> > > If I am running a 3-node Ratis cluster and all the nodes go down due to
> > > some physical hardware failure, I need a way to ensure that when a new
> > > node comes up, it is able to restore the state.
> > > To do so, I am thinking of taking periodic snapshots to durable storage,
> > > e.g. S3, and when a new node spawns (which should be handled by another
> > > service), it can pull the snapshot from S3 and restore the state.
> > >
> > > To simulate this scenario, I clean the storage directories of the Ratis
> > > nodes before starting them up so they have no previous state, and let the
> > > nodes pull the snapshot from a separate directory.
> > > Please let me know if there is some other way I can solve this problem.
> > >
> > > Hope this helps.
> > >
> > > Regards,
> > > Snehasish
> > >
> > > On Thu, 5 Mar 2026 at 23:50, Tsz Wo Sze <[email protected]> wrote:
> > >
> > >> Hi Snehasish,
> > >>
> > >> > Once the snapshot is triggered, I move it to a different directory
> to
> > >> simulate clean restart.
> > >>
> > >> Is this step required to reproduce the failure?  If there is a snapshot
> > >> taken, the server expects that the snapshot is there, and it may delete
> > >> the raft logs to free up space.  If this step is required to reproduce
> > >> the failure, it does not look like a bug.
> > >>
> > >> In general, we cannot manually move the Ratis metadata around, just as
> > >> a Linux or Windows system may not be able to restart if we manually move
> > >> some of its system files around.
> > >>
> > >> Tsz-Wo
> > >>
> > >>
> > >>
> > >> On Thu, Mar 5, 2026 at 10:08 AM Tsz Wo Sze <[email protected]>
> wrote:
> > >>
> > >> > Hi Snehasish,
> > >> >
> > >> > Since you already have a test, could you share the code change?  You
> > >> > may attach a patch file or create a pull request.  I will run it to
> > >> > reproduce the failure.
> > >> >
> > >> > In the meantime, I will try to understand the details you provided.
> > >> >
> > >> > Tsz-Wo
> > >> >
> > >> >
> > >> > On Thu, Mar 5, 2026 at 3:14 AM Snehasish Roy <
> > [email protected]>
> > >> > wrote:
> > >> >
> > >> >> Hi Tsz-Wo,
> > >> >>
> > >> >> Thank you for your prompt response. I was able to reproduce this
> > issue
> > >> >> using CounterStateMachine.
> > >> >>
> > >> >> I added a utility in the CounterClient to trigger a snapshot.
> > >> >>
> > >> >> ```
> > >> >> private void takeSnapshot() throws IOException {
> > >> >>     RaftClientReply raftClientReply = client.getSnapshotManagementApi()
> > >> >>             .create(true, 30_000);
> > >> >>     System.out.println(raftClientReply);
> > >> >> }
> > >> >> ```
> > >> >>
> > >> >> Once the snapshot is triggered, I move it to a different directory
> to
> > >> >> simulate clean restart.
> > >> >>
> > >> >> I also updated the SimpleStateMachineStorage::loadLatestSnapshot()
> to
> > >> look
> > >> >> for snapshots in a different directory.
> > >> >>
> > >> >> ```
> > >> >> public SingleFileSnapshotInfo loadLatestSnapshot() throws IOException {
> > >> >>     // Look for snapshots in a fixed external directory instead of the
> > >> >>     // Raft storage directory, reusing the existing static helper.
> > >> >>     final File dir = new File("/tmp/snapshots");
> > >> >>     return findLatestSnapshot(dir.toPath());
> > >> >> }
> > >> >> ```
> > >> >>
> > >> >> Full steps for reproduction:
> > >> >> 1. Started a 3-node CounterServer and performed some updates to the
> > >> >> state machine using the CounterClient.
> > >> >>
> > >> >> 2. Triggered the snapshot via the CounterClient and then moved the
> > >> >> snapshot to a different directory; the snapshot will be of the format
> > >> >> term_index. Here the term will initially be 1, and let's assume the
> > >> >> index is at 10.
> > >> >>
> > >> >> 3. Killed the leader; the term would have increased to 2.
> > >> >>
> > >> >> 4. Performed some updates and triggered another snapshot. Let's assume
> > >> >> the index is at 20 and the term is at 2. Moved the snapshot to a
> > >> >> different directory.
> > >> >>
> > >> >> 5. Stopped all nodes. Cleared the storage directories of all the nodes
> > >> >> to simulate a clean restart.
> > >> >>
> > >> >> 6. Started the 3-node CounterServer and observed the failure at
> > >> >> startup.
> > >> >>
> > >> >> ```
> > >> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot
> > >> >> is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in
> > >> >> /tmp/snapshots
> > >> >> 2026-03-05 15:48:56 INFO  SimpleStateMachineStorage:229 - Latest snapshot
> > >> >> is SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20] in
> > >> >> /tmp/snapshots
> > >> >> 2026-03-05 15:48:56 INFO  RaftServerConfigKeys:62 -
> > >> >> raft.server.log.use.memory = false (default)
> > >> >> 2026-03-05 15:48:56 INFO  RaftServer$Division:155 -
> > >> >> n0@group-ABB3109A44C1:
> > >> >> getLatestSnapshot(CounterStateMachine-1:n0:group-ABB3109A44C1) returns
> > >> >> SingleFileSnapshotInfo(t:2, i:20):[/tmp/snapshots/snapshot.2_20]
> > >> >> 2026-03-05 15:48:56 INFO  RaftLog:90 -
> > >> >> n0@group-ABB3109A44C1-SegmentedRaftLog: snapshotIndexFromStateMachine = 20
> > >> >> ....
> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:577 -
> > >> >> n1@group-ABB3109A44C1: set firstElectionSinceStartup to false for
> > >> >> becomeLeader
> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:278 -
> > >> >> n1@group-ABB3109A44C1: change Leader from null to n1 at term 1 for
> > >> >> becomeLeader, leader elected after 672ms
> > >> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:440 -
> > >> >> n1@group-ABB3109A44C1-SegmentedRaftLogWorker: Starting segment from
> > >> >> index:21
> > >> >> 2026-03-05 15:49:02 INFO  SegmentedRaftLogWorker:647 -
> > >> >> n1@group-ABB3109A44C1-SegmentedRaftLogWorker: created new log segment
> > >> >> /ratis/./n1/02511d47-d67c-49a3-9011-abb3109a44c1/current/log_inprogress_21
> > >> >> ....
> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:309 - Leader
> > >> >> n1@group-ABB3109A44C1-LeaderStateImpl is ready since appliedIndex ==
> > >> >> startIndex == 21
> > >> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 -
> > >> >> n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> > >> >> 2026-03-05 15:49:02 ERROR StateMachineUpdater:207 -
> > >> >> n1@group-ABB3109A44C1-StateMachineUpdater caught a Throwable.
> > >> >> java.lang.IllegalStateException: n1: Failed updateLastAppliedTermIndex:
> > >> >> newTI = (t:1, i:21) < oldTI = (t:2, i:20)
> > >> >> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:77)
> > >> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:148)
> > >> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.updateLastAppliedTermIndex(BaseStateMachine.java:139)
> > >> >> at org.apache.ratis.statemachine.impl.BaseStateMachine.notifyTermIndexUpdated(BaseStateMachine.java:135)
> > >> >> at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1893)
> > >> >> at org.apache.ratis.server.impl.StateMachineUpdater.applyLog(StateMachineUpdater.java:255)
> > >> >> at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:194)
> > >> >> at java.base/java.lang.Thread.run(Thread.java:1575)
> > >> >> 2026-03-05 15:49:02 INFO  RaftServer$Division:528 -
> > >> >> n1@group-ABB3109A44C1: shutdown
> > >> >> ```
> > >> >>
> > >> >> As you can see from the stack trace, during the snapshot restore the
> > >> >> termIndex was updated to the latest value seen from the snapshot,
> > >> >> (t:2, i:20), but when the server was started from a clean slate, the
> > >> >> term was reset to 1 by RaftServerImpl at startup. It then tries to
> > >> >> apply the log entries and fails the precondition check that the term
> > >> >> must be monotonically increasing in the log entries.
> > >> >>
> > >> >> Please let me know if you need more information.
> > >> >>
> > >> >> Regards
> > >> >>
> > >> >> On Wed, 4 Mar 2026 at 06:33, Tsz Wo Sze <[email protected]>
> wrote:
> > >> >>
> > >> >> > Hi Snehasish,
> > >> >> >
> > >> >> > > ... newTI = (t:1, i:21) ...
> > >> >> >
> > >> >> > The newTI was invalid.  It probably was from the state machine.
> It
> > >> >> should
> > >> >> > just use the TermIndex from LogEntryProto.  See
> > CounterStateMachine
> > >> >> [1] as
> > >> >> > an example.
> > >> >> >
> > >> >> > Tsz-Wo
> > >> >> > [1]
> > >> >> > https://github.com/apache/ratis/blob/3d9f5af376409de7e635bb67c7dfbeadc882c413/ratis-examples/src/main/java/org/apache/ratis/examples/counter/server/CounterStateMachine.java#L263-L266
> > >> >> >
> > >> >> > On Tue, Mar 3, 2026 at 10:52 AM Snehasish Roy via dev <
> > >> >> > [email protected]>
> > >> >> > wrote:
> > >> >> >
> > >> >> > > Hello everyone,
> > >> >> > >
> > >> >> > > I was exploring the snapshot restore capability of Ratis and
> > found
> > >> one
> > >> >> > > scenario that failed.
> > >> >> > >
> > >> >> > > 1. Start a 3-node Ratis cluster and perform some updates to the
> > >> >> > > state machine.
> > >> >> > > 2. Take the snapshot; the snapshot will be of the format
> > >> >> > > term_index. Here the term will initially be 1, and let's assume
> > >> >> > > the index is at 10.
> > >> >> > > 3. Kill the leader; the term would have increased to 2.
> > >> >> > > 4. Perform some updates and trigger another snapshot. Let's assume
> > >> >> > > the index is at 20 and the term is at 2.
> > >> >> > > 5. Stop all nodes.
> > >> >> > > 6. A failure is observed while starting the node.
> > >> >> > >
> > >> >> > > ```
> > >> >> > > Failed updateLastAppliedTermIndex: newTI = (t:1, i:21) < oldTI =
> > >> >> > > (t:2, i:20)
> > >> >> > > ```
> > >> >> > >
> > >> >> > > Based on the error logs, I suspect the state machine updated the
> > >> >> > > last applied term index to (t:2, i:20), but the ServerState has a
> > >> >> > > separate variable tracking the currentTerm, which is initialized
> > >> >> > > to 0 at startup. Once the leader is elected, it tried to update
> > >> >> > > the log entry, but the update failed due to the precondition check.
> > >> >> > >
> > >> >> > > What's the correct way to solve this problem? Should the term be
> > >> >> > > reset to 0 while loading the snapshot at server startup?
> > >> >> > >
> > >> >> > > References:
> > >> >> > >
> > >> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java#L82
> > >> >> > >
> > >> >> > > https://github.com/apache/ratis/blob/master/ratis-server/src/main/java/org/apache/ratis/statemachine/impl/BaseStateMachine.java#L138
> > >> >> > >
> > >> >> > > Thank you for looking into this issue.
> > >> >> > >
> > >> >> > >
> > >> >> > > Regards,
> > >> >> > > Snehasish
> > >> >> > >
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>
