[
https://issues.apache.org/jira/browse/HELIX-43?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619023#comment-13619023
]
dafu commented on HELIX-43:
---------------------------
I simplified the design based on the following considerations. Hopefully this
clears up the confusion.
FATAL is a state that should require a manual operation to get out of. In this
sense, a partition in the FATAL state is equivalent to a partition that has
been disabled. So instead of introducing a Helix-defined FATAL state, we just
disable the partition when it requires a manual recovery operation. This
requires almost no change to application code. Here is the complete logic:
* When an error happens during any state transition, the partition transitions
to the ERROR state. The participant also invokes state-model.on-err(); errors
thrown by state-model.on-err() are ignored.
* When dropping a resource that is in the ERROR state and is not disabled, the
controller sends an ERROR->DROPPED transition. If an error happens during the
ERROR->DROPPED transition, the participant disables the resource/partition.
This prevents an infinite loop if an error happens during the drop.
* When disabling a resource/partition that is in the ERROR state, the
resource/partition is marked disabled, but the controller does not send any
transitions to disable the error partitions.
* When resetting a resource/partition that is in the ERROR state and is not
disabled, the controller sends an ERROR->initial-state transition. If an error
happens during the ERROR->initial-state transition, the partition remains in
the ERROR state.
* When dropping a resource that is not in the ERROR state and is not disabled,
the controller sends all the transitions from current-state to initial-state,
then sends an initial-state->DROPPED transition.
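The controller-side rules in the list above can be sketched roughly as follows.
This is only an illustration, not the actual Helix code: the class
ControllerRulesSketch, the Request enum and the transitionsFor method are
hypothetical names, the MASTER/SLAVE/OFFLINE states are just an example state
model, and returning no transitions for a disabled partition is an assumption.
Requests the list does not describe (reset or disable outside the ERROR state)
are rejected rather than guessed at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the controller-side rules; not the actual Helix code.
final class ControllerRulesSketch {
    enum Request { DROP, DISABLE, RESET }

    /**
     * Returns the transitions the controller would send, as "FROM->TO" strings.
     * pathToInitial lists the states from the current state to the initial
     * state, inclusive, e.g. [MASTER, SLAVE, OFFLINE] or [ERROR, OFFLINE].
     */
    static List<String> transitionsFor(Request request, List<String> pathToInitial,
                                       boolean disabled) {
        List<String> out = new ArrayList<>();
        if (disabled) {
            // Assumption: nothing is sent until an operator re-enables the partition.
            return out;
        }
        String current = pathToInitial.get(0);
        String initial = pathToInitial.get(pathToInitial.size() - 1);
        boolean inError = "ERROR".equals(current);

        switch (request) {
            case DROP:
                if (inError) {
                    out.add("ERROR->DROPPED");
                } else {
                    // Walk back to the initial state first, then drop.
                    for (int i = 0; i + 1 < pathToInitial.size(); i++) {
                        out.add(pathToInitial.get(i) + "->" + pathToInitial.get(i + 1));
                    }
                    out.add(initial + "->DROPPED");
                }
                break;
            case RESET:
                if (!inError) {
                    throw new IllegalArgumentException("reset is only sketched for ERROR");
                }
                out.add("ERROR->" + initial);
                break;
            case DISABLE:
                if (!inError) {
                    throw new IllegalArgumentException("disable is only sketched for ERROR");
                }
                // The partition is only marked disabled; no transitions are sent.
                break;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(transitionsFor(Request.DROP,
                Arrays.asList("MASTER", "SLAVE", "OFFLINE"), false));
        System.out.println(transitionsFor(Request.DROP,
                Arrays.asList("ERROR", "OFFLINE"), false));
    }
}
```

Passing the path to the initial state explicitly keeps the sketch independent
of any particular state model definition.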
Here is the diff. The main change is in HelixStateTransitionHandler.java, where
we disable the partition when an error happens during a transition out of the
ERROR state. We also add a default implementation of the ERROR->DROPPED state
transition in StateModel.java, so if the user doesn't define that transition,
no error is raised.
https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=commitdiff;h=cd8272c952377ef9bbb478356ea4a2a9f8e7d3fa
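For readers who don't want to open the diff, the participant-side behaviour
might look roughly like the sketch below. It uses hypothetical stand-in classes
(SketchStateModel, PartitionStatus, TransitionHandlerSketch), not the real
StateModel or HelixStateTransitionHandler, and it follows the bullet list above:
a failed ERROR->DROPPED transition disables the partition, any other failure
leaves or puts the partition in ERROR, and errors from the on-err callback are
ignored.

```java
// Hypothetical stand-ins for illustration; not the real Helix classes.
class SketchStateModel {
    /** Default ERROR->DROPPED handler: a no-op, so models that don't override it succeed. */
    void onBecomeDroppedFromError() throws Exception { }

    /** Error callback; anything it throws is ignored by the handler below. */
    void onError() throws Exception { }
}

final class PartitionStatus {
    String state;
    boolean disabled;

    PartitionStatus(String state) {
        this.state = state;
    }
}

final class TransitionHandlerSketch {
    interface Transition {
        void run() throws Exception;
    }

    static void execute(PartitionStatus p, String from, String to,
                        Transition transition, SketchStateModel model) {
        try {
            transition.run();
            p.state = to;
        } catch (Exception e) {
            if ("ERROR".equals(from) && "DROPPED".equals(to)) {
                // Failed drop from ERROR: stay in ERROR and disable the partition
                // so the controller does not retry the drop forever.
                p.disabled = true;
            } else {
                // Any other failure ends in ERROR (a failed reset stays in ERROR).
                p.state = "ERROR";
            }
            try {
                model.onError();
            } catch (Exception ignored) {
                // Errors in the on-err callback are deliberately swallowed.
            }
        }
    }

    public static void main(String[] args) {
        SketchStateModel model = new SketchStateModel();
        PartitionStatus p = new PartitionStatus("ERROR");
        // The model does not override the default, so the drop succeeds.
        execute(p, "ERROR", "DROPPED", model::onBecomeDroppedFromError, model);
        System.out.println(p.state + " disabled=" + p.disabled);
    }
}
```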
> Add support for error->dropped transition
> -----------------------------------------
>
> Key: HELIX-43
> URL: https://issues.apache.org/jira/browse/HELIX-43
> Project: Apache Helix
> Issue Type: New Feature
> Affects Versions: 0.6.0-incubating
> Reporter: dafu
> Assignee: dafu
> Fix For: 0.6.1-incubating
>
>
> Currently Helix doesn't support any automatic transition from the error
> state, but in some situations it might be required to drop a partition that
> is in the error state.