[
https://issues.apache.org/jira/browse/HDDS-15578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-15578:
-------------------------------
Description:
There are methods annotated with @Replicate that can throw
InvalidStateTransitionException like
ContainerStateManager#updateContainerStateWithSequenceId.
When the method is applied by SCM Ratis, an exception from the
StateMachineUpdater path can terminate SCM although it is not really a critical
error (e.g. if there are duplicate events, we can simply ignore one). The
interface comment even says replicated methods should be idempotent, but this
implementation is not fully idempotent for stale/duplicate events.
Example risk:
- Leader submits FINALIZE for OPEN.
- Because of ordering between the check and the apply, or a duplicate report,
the current state is already CLOSING when the request is applied.
- Applying FINALIZE at CLOSING is invalid.
- Exception escapes from replicated apply path.
The chance is very low, since in most cases the caller checks the current
container state before calling updateContainerStateWithSequenceId, so the
invalid transition is not triggered. The risk is still there, though.
We can try to fix it as follows:
- Inside the replicated implementation, catch InvalidStateTransitionException.
- Log and return without mutation.
- Treat it as a stale/duplicate lifecycle event, not a fatal
replicated-state-machine error.
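The proposed fix can be sketched as follows. This is only an illustration under
stated assumptions, not the actual SCM code: the class TolerantApplySketch, the
method applyReplicated, the reduced State/Event enums, and the local
InvalidStateTransitionException are hypothetical stand-ins for the real Ozone
types, and the transition table covers only OPEN -> CLOSING -> CLOSED.

```java
public class TolerantApplySketch {

  // Reduced, hypothetical lifecycle; the real container lifecycle has more states.
  enum State { OPEN, CLOSING, CLOSED }
  enum Event { FINALIZE, CLOSE }

  /** Local stand-in for the real exception type (assumption, not the Ozone class). */
  static class InvalidStateTransitionException extends Exception {
    InvalidStateTransitionException(State from, Event event) {
      super("Invalid event " + event + " at state " + from);
    }
  }

  /** Tiny transition table: OPEN --FINALIZE--> CLOSING --CLOSE--> CLOSED. */
  static State nextState(State from, Event event)
      throws InvalidStateTransitionException {
    if (from == State.OPEN && event == Event.FINALIZE) {
      return State.CLOSING;
    }
    if (from == State.CLOSING && event == Event.CLOSE) {
      return State.CLOSED;
    }
    throw new InvalidStateTransitionException(from, event);
  }

  static final class ContainerSketch {
    private State state = State.OPEN;
    private int ignoredEvents = 0;

    /**
     * Hypothetical replicated apply method. An invalid transition is logged
     * and ignored, so the state is left unchanged and nothing propagates to
     * the caller (here standing in for the state machine apply path).
     */
    void applyReplicated(Event event) {
      try {
        state = nextState(state, event);
      } catch (InvalidStateTransitionException e) {
        ignoredEvents++;
        System.err.println("Ignoring stale/duplicate event: " + e.getMessage());
      }
    }

    State getState() {
      return state;
    }

    int getIgnoredEvents() {
      return ignoredEvents;
    }
  }

  public static void main(String[] args) {
    ContainerSketch container = new ContainerSketch();
    container.applyReplicated(Event.FINALIZE); // OPEN -> CLOSING
    container.applyReplicated(Event.FINALIZE); // duplicate, ignored
    // prints "CLOSING ignored=1"
    System.out.println(container.getState() + " ignored=" + container.getIgnoredEvents());
  }
}
```

Applying FINALIZE twice leaves the container in CLOSING and counts one ignored
event, which is the idempotent behavior the interface comment asks for. Whether
every invalid transition should be swallowed, or only those that match a
stale/duplicate pattern, is a design choice this sketch does not settle.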
was:
There are methods annotated with @Replicate that can throw
InvalidStateTransitionException like
ContainerStateManager#updateContainerStateWithSequenceId.
When the method is applied by SCM Ratis, an exception from the
StateMachineUpdater path can terminate SCM although it is not really a critical
error (e.g. if there are duplicate events, we can simply ignore one). The
interface comment even says replicated methods should be idempotent, but this
implementation is not fully idempotent for stale/duplicate events.
Example risk:
- Leader submits FINALIZE for OPEN.
- Because of ordering between the check and the apply, or a duplicate report,
the current state is already CLOSING when the request is applied.
- Applying FINALIZE at CLOSING is invalid.
- Exception escapes from replicated apply path.
The chance is very low, since in most cases there is a check of the container
state before updateContainerStateWithSequenceId, but it's there.
We can try to fix it by
- Inside the replicated implementation, catch InvalidStateTransitionException.
- Log and return without mutation.
- Treat it as a stale/duplicate lifecycle event, not a fatal
replicated-state-machine error.
> InvalidStateTransitionException can crash SCM
> ---------------------------------------------
>
> Key: HDDS-15578
> URL: https://issues.apache.org/jira/browse/HDDS-15578
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major