Abhishek Pal created RATIS-2430:
-----------------------------------

             Summary: Handle rollback on snapshot installation failure.
                 Key: RATIS-2430
                 URL: https://issues.apache.org/jira/browse/RATIS-2430
             Project: Ratis
          Issue Type: Improvement
          Components: snapshot
            Reporter: Abhishek Pal
            Assignee: Abhishek Pal


Today, when 
[SnapshotInstallationHandler#checkAndInstallSnapshot|https://github.com/apache/ratis/blob/1433b4cbf7350afcf9b2f871dd48c48e17a91a1d/ratis-server/src/main/java/org/apache/ratis/server/impl/SnapshotInstallationHandler.java#L167]
 calls *state.installSnapshot(request)*, it pauses the state machine via 
*ServerState.installSnapshot()*.

However, if a later check or an I/O operation fails after the pause, there is 
no clear rollback path. A follower in this situation can be left with a 
partially installed snapshot.

One way to mitigate this is to have appendChunk write to a temp file without 
pausing the StateMachine. Once all chunks are written, we can atomically apply 
the snapshot and reload the state machine log.
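
A minimal sketch of the temp-file idea, using only java.nio.file. The class 
StagedSnapshotWriter and its methods are hypothetical names for illustration 
and are not part of Ratis. It assumes the snapshot is a single file and that 
the caller pauses and reloads the state machine only around commit().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not a Ratis class): stages snapshot chunks in a temp file. */
final class StagedSnapshotWriter {
  private final Path target;
  private final Path temp;

  StagedSnapshotWriter(Path target) throws IOException {
    this.target = target.toAbsolutePath();
    // Keep the temp file in the target's directory so the final move
    // stays on one file system, which an atomic move requires.
    this.temp = Files.createTempFile(this.target.getParent(), "snapshot-", ".tmp");
  }

  /** Appends one chunk to the temp file; the live snapshot is untouched. */
  void appendChunk(byte[] data) throws IOException {
    Files.write(temp, data, StandardOpenOption.APPEND);
  }

  /**
   * Publishes the staged file in one step. Whether an existing target is
   * replaced by ATOMIC_MOVE is implementation specific (POSIX rename replaces it).
   * Assumption: the caller pauses the state machine just before this call
   * and reloads it just after.
   */
  void commit() throws IOException {
    Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
  }

  /** Rollback on any failure: discard the staged file, leaving the target as it was. */
  void rollback() throws IOException {
    Files.deleteIfExists(temp);
  }
}
```

With this shape, a failed check or I/O error before commit() only needs 
rollback(), and the state machine never has to be paused for it.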



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
