Abhishek Pal created RATIS-2430:
-----------------------------------
Summary: Handle rollback on snapshot installation failure.
Key: RATIS-2430
URL: https://issues.apache.org/jira/browse/RATIS-2430
Project: Ratis
Issue Type: Improvement
Components: snapshot
Reporter: Abhishek Pal
Assignee: Abhishek Pal
Today, when
[SnapshotInstallationHandler#checkAndInstallSnapshot|https://github.com/apache/ratis/blob/1433b4cbf7350afcf9b2f871dd48c48e17a91a1d/ratis-server/src/main/java/org/apache/ratis/server/impl/SnapshotInstallationHandler.java#L167]
calls *state.installSnapshot(request)*, it pauses the state machine via
*ServerState.installSnapshot()*.
However, if a later check or an I/O operation fails after the pause, there is
no clear rollback path. A follower in this situation can be left with a
partially installed snapshot.
One way to mitigate this is to have appendChunk write to a temp file without
pausing the StateMachine. Once all chunks have been written, we can atomically
apply the snapshot and reload the state machine log.
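
A minimal sketch of the temp-file idea, using only java.nio.file. The class
StagedSnapshot and its methods are hypothetical names chosen for illustration
and are not part of Ratis. It assumes the snapshot is a single file, and it
leaves out pausing and reloading the state machine, which would wrap the
commit step.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper, not a Ratis class: stages snapshot chunks in a temp file. */
final class StagedSnapshot {
  private final Path tmpFile;
  private final Path finalFile;

  StagedSnapshot(Path finalFile) {
    this.finalFile = finalFile;
    // Keep the temp file next to the target so the final move stays on one file system.
    this.tmpFile = finalFile.resolveSibling(finalFile.getFileName() + ".tmp");
  }

  /** Appends one chunk to the temp file; the currently installed snapshot is not touched. */
  void appendChunk(byte[] chunk) throws IOException {
    try (OutputStream out = Files.newOutputStream(tmpFile,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      out.write(chunk);
    }
  }

  /**
   * Publishes the staged file with a single rename once every chunk has arrived.
   * Whether ATOMIC_MOVE replaces an existing target is implementation specific;
   * on POSIX file systems the rename replaces it.
   */
  void commit() throws IOException {
    Files.move(tmpFile, finalFile, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Discards the staged file; the previously installed snapshot stays as it was. */
  void rollback() throws IOException {
    Files.deleteIfExists(tmpFile);
  }
}
```

With this shape, a failed check or I/O error before commit() only requires
rollback(), and the state machine would need to be paused only around the
rename and reload instead of for the whole transfer.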