Sounds great. Two questions:
1) Any chance this could be included in this month's release?
2) Can I help in any way?

Thanks,
Santi

On Fri, Mar 22, 2013 at 9:01 PM, Zhen Zhang <[email protected]> wrote:

> Here is my thought:
>
> 1) Yes. DROPPED logic will remain the same. We will first transit to the
> user-defined initial state and then to the DROPPED state.
>
> 2) StateModel.onError() should provide the state-transition message that
> causes the error. It should look like StateModel.onError(Message message,
> NotificationContext context). We could either embed the error context into
> the notification context or provide an additional error context as an
> argument. We probably need to provide context for drop() and reset() also?
>
> 3) If onError() fails, we can still transit to ERROR or we can go directly
> to FATAL. If drop/reset() fails, we remain in FATAL.
>
> Any suggestions?
>
> Thanks,
> Jason
>
> On Fri, Mar 22, 2013 at 12:04 PM, Santiago Perez <[email protected]> wrote:
>
> > Sounds good, couple of questions though:
> >
> > 1) What will happen when transiting from a user-defined state to DROPPED?
> > Same as today?
> > 2) Will there be a way in onError() to know what transition was taking
> > place? Or is that up to the implementation? Are there any possible
> > directions to be given in the onError() callback?
> > 3) What will the behavior be if any of these methods (other than drop())
> > fail? Simply ignored?
> >
> > Thanks,
> > Santi
> >
> > On Fri, Mar 22, 2013 at 3:29 PM, Zhen Zhang <[email protected]> wrote:
> >
> > > Hi, I am fine with the FATAL state, but I think we should clearly
> > > separate Helix-defined states from user-defined states. Helix-defined
> > > states (i.e. ERROR, DROPPED, FATAL) need not be defined in the state
> > > model, and state-transition logic involving Helix-defined states should
> > > be common to all state models. In addition, Helix should provide a
> > > default implementation for transitions involving Helix-defined states.
> > > In case applications don't care about them, they don't need to
> > > implement these transitions. Here is what I am thinking of:
> > >
> > > - Helix will invoke StateModel.onError() if the current state is any
> > > user-defined state and an error occurs in the transition.
> > >
> > > - Helix will invoke StateModel.drop() if the current state is ERROR and
> > > the target state is DROPPED. If drop() succeeds, ERROR will transit to
> > > the initial state and then to DROPPED; otherwise to the FATAL state.
> > >
> > > - Helix will invoke StateModel.reset() if the current state is FATAL and
> > > we issue a reset command. If reset() succeeds, FATAL will transit to the
> > > initial state; otherwise it remains in the FATAL state. Also, reset()
> > > should be invoked only by admin commands, so in case reset() fails, we
> > > don't call it infinitely.
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Fri, Mar 22, 2013 at 5:36 AM, Santiago Perez <[email protected]> wrote:
> > >
> > > > I personally prefer the FATAL state approach. What do you think, Jason?
> > > >
> > > > On Fri, Mar 22, 2013 at 4:50 AM, kishore g <[email protected]> wrote:
> > > >
> > > > > Hi Terence/Jason/Santi,
> > > > >
> > > > > Did we come to a conclusion on this? Terence's proposal looks good
> > > > > to me. If adding a FATAL state is more invasive, I suggest simply
> > > > > disabling the partition on that node and setting some reason for
> > > > > disabling, for auditing/diagnosis. The advantage of this is that if
> > > > > the underlying error is rectified, then one can enable the partition
> > > > > and the ERROR->DROP transition will be invoked. Disabling ensures
> > > > > that even if the node restarts it will not host that partition again.
> > > > >
> > > > > thanks,
> > > > > Kishore G
> > > > >
> > > > > On Mon, Feb 11, 2013 at 8:58 PM, Terence Yim <[email protected]> wrote:
> > > > >
> > > > > > I proposed the FATAL state to Kishore before.
> > > > > > Let me write it down again for discussion.
> > > > > >
> > > > > > 1. An extra state, "FATAL", is introduced. It is a system state,
> > > > > > just like the existing ERROR state, which doesn't need to be
> > > > > > explicitly defined in the state model.
> > > > > > 2. Just like the current implementation, whenever there is any
> > > > > > error during a participant state transition, transit the
> > > > > > participant into the ERROR state and stay there.
> > > > > > 3. Also just like the current implementation, when a given
> > > > > > resource is deleted, trigger a state transition from
> > > > > > CURRENT_STATE -> DROPPED (going through the necessary state
> > > > > > transitions based on the state model).
> > > > > > 4. For participants that have current state = ERROR, trigger the
> > > > > > ERROR->DROPPED transition (can have a default callback in the
> > > > > > StateModel that does nothing in this transition, but it's up to
> > > > > > further discussion).
> > > > > > 5. If and only if there is an exception thrown during the
> > > > > > ERROR->DROPPED transition, transit the participant to the FATAL
> > > > > > state.
> > > > > > 6. When a participant gets into the FATAL state, there is no way
> > > > > > for it to get out of it without human intervention, meaning a
> > > > > > human needs to inspect and reset it manually (or through some
> > > > > > tools).
> > > > > >
> > > > > > With this, there would be changes in the Controller, but no change
> > > > > > in the participant if there is nothing to be specially handled
> > > > > > during the ERROR->DROPPED transition. Also, all error handling
> > > > > > would be done with state transitions, which gives the participant
> > > > > > a more consistent way of handling different scenarios. This also
> > > > > > guarantees that all calls are sync and thread safe.
> > > > > >
> > > > > > Terence
> > > > > >
> > > > > > On Mon, Feb 11, 2013 at 7:23 PM, Santiago Perez <[email protected]> wrote:
> > > > > >
> > > > > > > In my proposal FATAL would be a final state, manual
> > > > > > > intervention required.
> > > > > > >
> > > > > > > 1) In our use case, the problem arises when a regular transition
> > > > > > > (say offline->online) fails and goes to the error state. If the
> > > > > > > resource then gets removed, the participant remains in "ERROR"
> > > > > > > state, so we can't reuse it, because in order to reuse it we
> > > > > > > need to transit to dropped first.
> > > > > > > 2) The thing is, in our use case the drop comes from an API call
> > > > > > > which is not synchronized with the cluster management code which
> > > > > > > could issue the reset. Also, if we reset it, wouldn't the
> > > > > > > controller push the transitions trying to reach the ideal state
> > > > > > > again (likely triggering the same issue that led to ERROR)?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Santi
> > > > > > >
> > > > > > > On Mon, Feb 11, 2013 at 5:25 PM, Zhen Zhang <[email protected]> wrote:
> > > > > > >
> > > > > > > > If we are going to add a new FATAL state, we might
> > > > > > > > potentially add FATAL to all state models, and all
> > > > > > > > applications might have to implement ERROR->FATAL and
> > > > > > > > FATAL->initial_state transitions.
> > > > > > > >
> > > > > > > > On the other hand, I have a couple of questions:
> > > > > > > > 1) Why, in your use case, is the ERROR state inevitable?
> > > > > > > > 2) If a partition goes to ERROR state, could we reset it, so
> > > > > > > > only error partitions will get an ERROR->initial_state
> > > > > > > > transition, and then drop it?
> > > > > > > > If no error happens during ERROR->initial_state, the error is
> > > > > > > > recoverable, and the resource will be dropped. Otherwise, if
> > > > > > > > something goes wrong with ERROR->initial_state, the
> > > > > > > > participant remains in ERROR state, the drop failed, and the
> > > > > > > > resource is not reusable?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jason
> > > > > > > >
> > > > > > > > On Mon, Feb 11, 2013 at 1:47 PM, Santiago Perez <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > For our use case that's somewhat problematic. It's still
> > > > > > > > > better than the current inability to go from error to
> > > > > > > > > dropped, but the problem now is that if something goes wrong
> > > > > > > > > when dropping, there's no way to know that from the
> > > > > > > > > participant states. And that's actually the only
> > > > > > > > > unrecoverable situation for our use case. Basically it means
> > > > > > > > > that the participant cannot be reused for another purpose.
> > > > > > > > > An alternative solution would be to have a FATAL state that
> > > > > > > > > is reached when a failure occurs when transitioning out of
> > > > > > > > > the ERROR state.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Santi
> > > > > > > > >
> > > > > > > > > On Wed, Feb 6, 2013 at 1:57 PM, Zhen Zhang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I am going to add the support of error->drop transition in
> > > > > > > > > > Helix.
> > > > > > > > > > The basic idea is to remove the DROPPED state from the
> > > > > > > > > > state model; instead we add a drop() (or cleanup())
> > > > > > > > > > abstract method in StateModel. Applications need to
> > > > > > > > > > implement this abstract method to take care of the drop
> > > > > > > > > > logic. This requires no change on the controller side. On
> > > > > > > > > > the participant side, when the participant receives a
> > > > > > > > > > state-transition message with ToState=DROPPED, it will
> > > > > > > > > > invoke the drop() method in the state model. When drop()
> > > > > > > > > > gets executed, the partition will be removed from the
> > > > > > > > > > current state regardless of any errors/exceptions during
> > > > > > > > > > the execution of drop(). This will prevent the infinite
> > > > > > > > > > loop of calling drop() in case of an error/exception in
> > > > > > > > > > the execution of drop(). The advantage of this design is
> > > > > > > > > > that we can remove the DROPPED state totally from all
> > > > > > > > > > state model definitions, which keeps the state model
> > > > > > > > > > simple. The disadvantage is that in drop() the application
> > > > > > > > > > needs to apply different drop logic based on the current
> > > > > > > > > > state (e.g. MASTER, SLAVE, or ERROR, which will be the
> > > > > > > > > > FromState in the message). Any suggestions?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jason
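
For reference, the ERROR / DROPPED / FATAL handling discussed in this thread could be sketched roughly as below. This is only an illustration under assumptions, not Helix code: the class FatalStateSketch, the Callbacks interface, and the methods transitionFailed(), dropFromError() and adminReset() are hypothetical names made up for this sketch, and the callbacks take simplified arguments instead of the Message and NotificationContext mentioned above. Where the thread leaves a question open (what happens if onError() itself fails), the sketch assumes the partition still ends up in ERROR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposal; none of these types are the real Helix API.
class FatalStateSketch {
    static final String ERROR = "ERROR";
    static final String DROPPED = "DROPPED";
    static final String FATAL = "FATAL";

    /** Callbacks named after those proposed in the thread; the defaults do nothing. */
    interface Callbacks {
        default void onError(String fromState, String toState) throws Exception {}
        default void drop() throws Exception {}
        default void reset() throws Exception {}
    }

    private final String initialState;
    private final Callbacks callbacks;
    private final List<String> history = new ArrayList<>();
    private String current;

    FatalStateSketch(String initialState, Callbacks callbacks) {
        this.initialState = initialState;
        this.callbacks = callbacks;
        moveTo(initialState);
    }

    String current() { return current; }

    List<String> history() { return new ArrayList<>(history); }

    private void moveTo(String state) {
        current = state;
        history.add(state);
    }

    /** A transition out of a user-defined state failed: notify, then go to ERROR. */
    void transitionFailed(String toState) {
        try {
            callbacks.onError(current, toState);
        } catch (Exception e) {
            // Assumption: a failing onError() still ends in ERROR (the thread
            // also considers going directly to FATAL).
        }
        moveTo(ERROR);
    }

    /** ERROR -> DROPPED: on success pass through the initial state, on failure go to FATAL. */
    void dropFromError() {
        if (!ERROR.equals(current)) {
            throw new IllegalStateException("expected ERROR, was " + current);
        }
        try {
            callbacks.drop();
        } catch (Exception e) {
            moveTo(FATAL);
            return;
        }
        moveTo(initialState);
        moveTo(DROPPED);
    }

    /** Admin-issued reset: tried once per command, never retried automatically. */
    void adminReset() {
        if (!FATAL.equals(current)) {
            throw new IllegalStateException("expected FATAL, was " + current);
        }
        try {
            callbacks.reset();
        } catch (Exception e) {
            return; // reset() failed: remain in FATAL
        }
        moveTo(initialState);
    }

    public static void main(String[] args) {
        FatalStateSketch p = new FatalStateSketch("OFFLINE", new Callbacks() {});
        p.transitionFailed("ONLINE");
        p.dropFromError();
        System.out.println(p.history());
    }
}
```

Giving the callbacks empty default bodies mirrors the suggestion that applications which don't care about these transitions should not have to implement them.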
