In my proposal, FATAL would be a final state that requires manual intervention.

1) In our use case, the problem arises when a regular transition (say offline->online) fails and the partition goes to the ERROR state. If the resource then gets removed, the participant remains in the ERROR state, so we can't reuse it: to reuse it, we first need to transition to DROPPED.

2) In our use case the drop comes from an API call that is not synchronized with the cluster management code that could issue the reset. Also, if we reset it, wouldn't the controller push transitions to try to reach the ideal state again, likely triggering the same issue that led to ERROR?

Thanks,
Santi

On Mon, Feb 11, 2013 at 5:25 PM, Zhen Zhang <[email protected]> wrote:

> If we are going to add a new FATAL state, we might potentially have to add
> FATAL to all state models, and all applications might have to implement
> ERROR->FATAL and FATAL->initial_state transitions.
>
> On the other hand, I have a couple of questions:
> 1) Why, in your use case, is the ERROR state inevitable?
> 2) If a partition goes to the ERROR state, could we reset it, so that only
> error partitions get an ERROR->initial_state transition, and then drop it?
> If no error happens during ERROR->initial_state, the error is recoverable
> and the resource will be dropped. Otherwise, if something goes wrong with
> ERROR->initial_state, the participant remains in the ERROR state, the drop
> fails, and the resource is not reusable?
>
> Thanks,
> Jason
>
> On Mon, Feb 11, 2013 at 1:47 PM, Santiago Perez <[email protected]> wrote:
>
> > For our use case that's somewhat problematic. It's still better than the
> > current inability to go from error to dropped, but the problem now is
> > that if something goes wrong when dropping, there's no way to know that
> > from the participant states. And that's actually the only unrecoverable
> > situation for our use case. Basically it means that the participant
> > cannot be reused for another purpose. An alternative solution would be
> > to have a FATAL state that is reached when a failure occurs while
> > transitioning out of the ERROR state.
> >
> > Cheers,
> > Santi
> >
> > On Wed, Feb 6, 2013 at 1:57 PM, Zhen Zhang <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am going to add support for the error->drop transition in Helix.
> > > The basic idea is to remove the DROPPED state from the state model;
> > > instead we add a drop() (or cleanup()) abstract method to StateModel.
> > > Applications need to implement this abstract method to take care of
> > > the drop logic. This requires no change on the controller side.
> > >
> > > On the participant side, when the participant receives a
> > > state-transition message with ToState=DROPPED, it will invoke the
> > > drop() method in the state model. When drop() gets executed, the
> > > partition will be removed from the current state regardless of any
> > > errors/exceptions during the execution of drop(). This prevents an
> > > infinite loop of calling drop() in case of an error/exception in the
> > > execution of drop().
> > >
> > > The advantage of this design is that we can remove the DROPPED state
> > > entirely from all state model definitions, which keeps the state
> > > model simple. The disadvantage is that in drop() the application
> > > needs to apply different drop logic based on the current state (e.g.
> > > MASTER, SLAVE, or ERROR, which will be the FromState in the message).
> > >
> > > Any suggestions?
> > >
> > > Thanks,
> > >
> > > Jason
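
To make the two options in this thread concrete, here is a rough sketch of how the participant side might handle a message with ToState=DROPPED. It is not Helix code: DroppableStateModel, ParticipantSketch, handleDrop and the markFatalOnDropFailure flag are all hypothetical names made up for illustration, and the current state is modeled as a plain map. With the flag off, the partition is removed from the current state whether or not drop() throws, as in the original proposal; with the flag on, a failed drop() leaves the partition in FATAL, as in the alternative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a state model with the proposed drop() hook.
// This is NOT Helix's real StateModel class.
interface DroppableStateModel {
    // fromState is the partition's state when the drop message arrives,
    // e.g. "MASTER", "SLAVE" or "ERROR".
    void drop(String partition, String fromState) throws Exception;
}

// Hypothetical participant-side bookkeeping: partition name -> current state.
class ParticipantSketch {
    final Map<String, String> currentState = new ConcurrentHashMap<>();
    private final DroppableStateModel model;
    private final boolean markFatalOnDropFailure;

    ParticipantSketch(DroppableStateModel model, boolean markFatalOnDropFailure) {
        this.model = model;
        this.markFatalOnDropFailure = markFatalOnDropFailure;
    }

    // Handles a message with ToState=DROPPED.
    // Returns true if there was nothing to drop or drop() finished without throwing.
    boolean handleDrop(String partition) {
        String fromState = currentState.get(partition);
        if (fromState == null) {
            return true; // partition is not in the current state
        }
        try {
            model.drop(partition, fromState);
            currentState.remove(partition);
            return true;
        } catch (Exception e) {
            if (markFatalOnDropFailure) {
                // Alternative from the thread: keep the failure visible.
                currentState.put(partition, "FATAL");
            } else {
                // Original proposal: remove the partition regardless of errors,
                // so drop() is never retried in a loop.
                currentState.remove(partition);
            }
            return false;
        }
    }
}
```

The trade-off discussed above shows up in the catch block: removing the partition avoids the retry loop but hides the failed cleanup from anyone looking at participant states, while FATAL keeps it visible at the cost of an extra state that needs manual intervention.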
