Ivan Andika created HDDS-15578:
----------------------------------

             Summary: InvalidStateTransitionException can crash SCM
                 Key: HDDS-15578
                 URL: https://issues.apache.org/jira/browse/HDDS-15578
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ivan Andika


Some methods annotated with @Replicate can throw
InvalidStateTransitionException, for example
ContainerStateManager#updateContainerState and
ContainerStateManager#updateContainerStateWithSequenceId.

When such a method is applied through SCM Ratis, an exception thrown on the
StateMachineUpdater path can terminate SCM. The interface comment says
replicated methods should be idempotent, but this implementation is not fully
idempotent for stale or duplicate events.

Example risk:
- The leader submits FINALIZE for a container that is OPEN.
- Because of apply ordering or a duplicate report, the container is already
CLOSING by the time the event is applied.
- FINALIZE is not a valid transition from CLOSING, so
InvalidStateTransitionException is thrown.
- The exception escapes from the replicated apply path.
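
The sequence above can be reproduced with a minimal, self-contained sketch.
It does not use Ozone classes: State, Event, Container and
InvalidTransitionException are hypothetical stand-ins for the container
lifecycle state machine and InvalidStateTransitionException, and only two
transitions are modelled.

```java
public class StaleTransitionDemo {
  // Simplified lifecycle; the real container lifecycle has more states/events.
  enum State { OPEN, CLOSING, CLOSED }
  enum Event { FINALIZE, CLOSE }

  /** Hypothetical stand-in for InvalidStateTransitionException. */
  static class InvalidTransitionException extends Exception {
    InvalidTransitionException(State from, Event event) {
      super("Invalid event " + event + " at state " + from);
    }
  }

  /** Returns the next state, or throws if the event is not valid here. */
  static State next(State current, Event event)
      throws InvalidTransitionException {
    if (current == State.OPEN && event == Event.FINALIZE) {
      return State.CLOSING;
    }
    if (current == State.CLOSING && event == Event.CLOSE) {
      return State.CLOSED;
    }
    throw new InvalidTransitionException(current, event);
  }

  /** Holds one container's state; apply() lets the exception propagate. */
  static class Container {
    State state = State.OPEN;

    void apply(Event event) throws InvalidTransitionException {
      state = next(state, event);
    }
  }

  public static void main(String[] args) {
    Container container = new Container();
    try {
      container.apply(Event.FINALIZE); // OPEN -> CLOSING
      container.apply(Event.FINALIZE); // duplicate event: invalid at CLOSING
    } catch (InvalidTransitionException e) {
      // In the scenario described above, nothing catches this on the
      // replicated apply path, so it reaches the state machine updater.
      System.out.println("apply failed: " + e.getMessage());
    }
  }
}
```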

A possible fix:
- Catch InvalidStateTransitionException inside the replicated implementation.
- Log it and return without mutating state.
- Treat it as a stale/duplicate lifecycle event, not a fatal
replicated-state-machine error.
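
A rough sketch of that approach, again using hypothetical stand-in types
rather than the real ContainerStateManager code (applyReplicated, State, Event
and InvalidTransitionException are illustrative names, and java.util.logging
is used only to keep the example self-contained):

```java
import java.util.logging.Logger;

public class TolerantApplySketch {
  private static final Logger LOG =
      Logger.getLogger(TolerantApplySketch.class.getName());

  // Simplified lifecycle; the real container lifecycle has more states/events.
  enum State { OPEN, CLOSING, CLOSED }
  enum Event { FINALIZE, CLOSE }

  /** Hypothetical stand-in for InvalidStateTransitionException. */
  static class InvalidTransitionException extends Exception {
    InvalidTransitionException(State from, Event event) {
      super("Invalid event " + event + " at state " + from);
    }
  }

  /** Returns the next state, or throws if the event is not valid here. */
  static State next(State current, Event event)
      throws InvalidTransitionException {
    if (current == State.OPEN && event == Event.FINALIZE) {
      return State.CLOSING;
    }
    if (current == State.CLOSING && event == Event.CLOSE) {
      return State.CLOSED;
    }
    throw new InvalidTransitionException(current, event);
  }

  static class Container {
    State state = State.OPEN;

    /**
     * Replicated apply: an invalid transition is logged and ignored instead
     * of being rethrown. Returns true only if the state was changed.
     */
    boolean applyReplicated(Event event) {
      try {
        state = next(state, event);
        return true;
      } catch (InvalidTransitionException e) {
        // Assumed to be a stale or duplicate lifecycle event: no mutation.
        LOG.warning("Ignoring stale/duplicate event: " + e.getMessage());
        return false;
      }
    }
  }

  public static void main(String[] args) {
    Container container = new Container();
    container.applyReplicated(Event.FINALIZE); // OPEN -> CLOSING
    container.applyReplicated(Event.FINALIZE); // logged, state unchanged
    System.out.println("final state: " + container.state);
  }
}
```

Because the ignored event leaves the state untouched, applying the same event
twice gives the same result as applying it once, which is the idempotency the
interface comment asks for.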



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

