[ 
https://issues.apache.org/jira/browse/HELIX-43?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13619023#comment-13619023
 ] 

dafu commented on HELIX-43:
---------------------------

I simplified the design based on the following considerations. Hopefully this 
clears up the confusion. 

FATAL is a state that requires manual intervention to leave. In that sense, a 
partition in FATAL state is equivalent to a partition that has been disabled. 
So instead of introducing a Helix-defined FATAL state, we simply disable the 
partition when it needs manual recovery. This requires almost no change to 
application code. Here is the complete logic:

* When an error happens during any state transition, the partition goes to the 
ERROR state. The participant also invokes state-model.on-err(); errors thrown 
by state-model.on-err() itself are ignored.

* When dropping a resource that is in ERROR state and is not disabled, the 
controller sends an ERROR->DROPPED transition. If an error happens during the 
ERROR->DROPPED transition, the participant disables the resource/partition. 
This prevents an infinite loop when errors happen during the drop.

* When disabling a resource/partition that is in ERROR state, the 
resource/partition is marked disabled, but the controller sends no transitions 
to the error partitions.

* When resetting a resource/partition that is in ERROR state and is not 
disabled, the controller sends an ERROR->initial-state transition. If an error 
happens during that transition, the partition remains in ERROR state.

* When dropping a resource that is not in ERROR state and is not disabled, the 
controller first sends all the transitions from current-state to initial-state, 
then sends the initial-state->DROPPED transition.
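
The controller-side rules above can be sketched as a small decision function. 
This is only an illustration under stated assumptions: ErrorPartitionRules, Op 
and plan() are hypothetical names and are not part of the Helix API, and the 
path from the current state to the initial state is passed in by the caller 
instead of being derived from a state model definition. The rules only cover 
partitions that are not disabled, so the sketch assumes a disabled partition 
receives no transitions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the controller-side rules; none of these names are Helix APIs.
class ErrorPartitionRules {
    enum Op { DROP, DISABLE, RESET }

    /**
     * Returns the transitions the controller would send, as "FROM->TO" strings.
     * pathToInitial lists the states from the current state (first element)
     * to the initial state (last element), e.g. [MASTER, SLAVE, OFFLINE]
     * or [ERROR, OFFLINE].
     */
    static List<String> plan(Op op, List<String> pathToInitial, boolean disabled) {
        List<String> sent = new ArrayList<>();
        if (disabled) {
            // Assumption: the rules only describe partitions that are not
            // disabled, so nothing is sent to a disabled one in this sketch.
            return sent;
        }
        String current = pathToInitial.get(0);
        String initial = pathToInitial.get(pathToInitial.size() - 1);
        if ("ERROR".equals(current)) {
            switch (op) {
                case DROP:
                    sent.add("ERROR->DROPPED");
                    break;
                case RESET:
                    sent.add("ERROR->" + initial);
                    break;
                case DISABLE:
                    // Only the disabled flag is set; no transition is sent.
                    break;
            }
            return sent;
        }
        if (op != Op.DROP) {
            // Disable/reset of a healthy partition is outside the rules modeled here.
            throw new IllegalArgumentException("case not covered by this sketch");
        }
        // Walk from the current state to the initial state, then drop from there.
        for (int i = 0; i + 1 < pathToInitial.size(); i++) {
            sent.add(pathToInitial.get(i) + "->" + pathToInitial.get(i + 1));
        }
        sent.add(initial + "->DROPPED");
        return sent;
    }
}
```

For example, plan(Op.DROP, [MASTER, SLAVE, OFFLINE], false) yields 
MASTER->SLAVE, SLAVE->OFFLINE, OFFLINE->DROPPED, while the same call with 
[ERROR, OFFLINE] yields only ERROR->DROPPED.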

Here is the diff. The main change is in HelixStateTransitionHandler.java, where 
we disable the partition when an error happens during a transition out of the 
ERROR state. We also add a default implementation of the ERROR->DROPPED 
transition in StateModel.java, so if the user doesn't define that transition, 
no error is raised.
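
As a rough illustration of the participant-side behavior, here is a 
self-contained sketch. It is not the code from the commit: 
TransitionHandlerSketch and its nested StateModel interface are hypothetical 
stand-ins for HelixStateTransitionHandler.java and StateModel.java. The sketch 
follows the bullet list above, so only a failed ERROR->DROPPED transition 
disables the partition, while a failed reset leaves it in ERROR.

```java
// Hypothetical participant-side sketch; names are illustrative, not Helix classes.
class TransitionHandlerSketch {
    interface StateModel {
        void transition(String from, String to) throws Exception;

        void onError() throws Exception;

        // Default no-op, so a model that does not define ERROR->DROPPED
        // still completes the drop without raising an error.
        default void onBecomeDroppedFromError() throws Exception {}
    }

    String state;
    boolean disabled;

    TransitionHandlerSketch(String initialState) {
        this.state = initialState;
    }

    void handle(StateModel model, String to) {
        String from = state;
        try {
            if ("ERROR".equals(from) && "DROPPED".equals(to)) {
                model.onBecomeDroppedFromError();
            } else {
                model.transition(from, to);
            }
            state = to;
        } catch (Exception e) {
            // Any failed transition leaves the partition in ERROR.
            state = "ERROR";
            try {
                model.onError();
            } catch (Exception ignored) {
                // Errors thrown by the error callback are ignored.
            }
            if ("ERROR".equals(from) && "DROPPED".equals(to)) {
                // A failed drop disables the partition, so the controller
                // does not keep resending ERROR->DROPPED.
                disabled = true;
            }
        }
    }
}
```

Disabling on a failed drop is what breaks the retry loop: once the partition is 
disabled, getting it out of ERROR requires a manual operation, which is the 
role the FATAL state was meant to play.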

https://git-wip-us.apache.org/repos/asf?p=incubator-helix.git;a=commitdiff;h=cd8272c952377ef9bbb478356ea4a2a9f8e7d3fa
                
> Add support for error->dropped transition
> -----------------------------------------
>
>                 Key: HELIX-43
>                 URL: https://issues.apache.org/jira/browse/HELIX-43
>             Project: Apache Helix
>          Issue Type: New Feature
>    Affects Versions: 0.6.0-incubating
>            Reporter: dafu
>            Assignee: dafu
>             Fix For: 0.6.1-incubating
>
>
> Currently Helix doesn't support any automatic transition from the error 
> state, but in some situations it might be required to drop a partition that 
> is in the error state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
