[ https://issues.apache.org/jira/browse/HELIX-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080926#comment-16080926 ]

Jiajun Wang commented on HELIX-659:
-----------------------------------

h2. Design Details

h3. Register Secondary State Models / Factories

Note that if a secondary state model is a dynamic state model, 
defaultTransitionHandler has to be implemented.

*State Model Factory*

import org.apache.helix.NotificationContext;
import org.apache.helix.model.Message;
import org.apache.helix.participant.statemachine.StateModel;
import org.apache.helix.participant.statemachine.StateModelFactory;
import org.apache.helix.participant.statemachine.StateTransitionError;
import org.apache.helix.participant.statemachine.Transition;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public abstract class DynamicStateModelFactory extends
    StateModelFactory<DynamicStateModel> {
  ...
}

public abstract class DynamicStateModel extends StateModel {
  private static final Logger logger = LoggerFactory.getLogger(DynamicStateModel.class);

  // Dynamic states (e.g. versions) have no fixed initial state.
  static final String DEFAULT_INITIAL_STATE = "UNKNOWN";
  protected String _currentState = DEFAULT_INITIAL_STATE;

  public String getCurrentState() {
    return _currentState;
  }

  // !!!!!!!!!!! Changed part !!!!!!!!!!!! //
  // Proposed: invoked when no @Transition-annotated method matches the
  // message's from/to states, e.g. for an arbitrary version change.
  public void defaultTransitionHandler(Message message, NotificationContext context) {
    logger.error("Default transition handler. The idea is to invoke this if no "
        + "transition method is found. To be implemented");
  }

  public boolean updateState(String newState) {
    _currentState = newState;
    return true;
  }

  public void rollbackOnError(Message message, NotificationContext context,
      StateTransitionError error) {
    logger.error("Default rollback method invoked on error. Error Code: " + error.getCode());
  }

  public void reset() {
    logger.warn("Default reset method invoked. Either the process no longer "
        + "owns this resource or the session timed out");
  }

  // !!!!!!!!!! Internal states such as ERROR still exist and are supported !!!!!!!!!! //
  @Transition(to = "DROPPED", from = "ERROR")
  public void onBecomeDroppedFromError(Message message, NotificationContext context)
      throws Exception {
    logger.info("Default ERROR->DROPPED transition invoked.");
  }
}
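
The fallback rule for dynamic states (use defaultTransitionHandler when no transition method matches) could be resolved by reflection roughly as follows. This is a minimal, Helix-independent sketch under stated assumptions: `TransitionSpec`, `TransitionDispatcher` and `VersionModel` are hypothetical names, and the handlers take plain from/to strings instead of Helix's Message and NotificationContext.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for a transition annotation (not Helix's own class).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface TransitionSpec {
  String from();
  String to();
}

final class TransitionDispatcher {
  private TransitionDispatcher() {}

  // Returns the public method whose @TransitionSpec matches from/to, or falls
  // back to defaultTransitionHandler(String, String) if no explicit method exists.
  static Method resolve(Class<?> modelClass, String from, String to) {
    for (Method m : modelClass.getMethods()) {
      TransitionSpec spec = m.getAnnotation(TransitionSpec.class);
      if (spec != null && spec.from().equals(from) && spec.to().equals(to)) {
        return m;
      }
    }
    try {
      return modelClass.getMethod("defaultTransitionHandler", String.class, String.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No transition " + from + "->" + to
          + " and no default handler in " + modelClass.getName(), e);
    }
  }
}

// Example model: ERROR->DROPPED is explicit; any version change uses the default.
class VersionModel {
  String currentState = "UNKNOWN";

  @TransitionSpec(from = "ERROR", to = "DROPPED")
  public void onBecomeDroppedFromError(String from, String to) {
    currentState = to;
  }

  public void defaultTransitionHandler(String from, String to) {
    currentState = to; // a dynamic state (e.g. "1.0.1") is accepted as-is
  }
}
```

A version change such as 1.0.0 -> 1.0.1 has no dedicated method, so it resolves to defaultTransitionHandler, while ERROR -> DROPPED still resolves to its explicit handler.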

h3. Resource Configuration

Secondary states are conceptually map values.
Besides the state itself, each state model may also have a different factory 
name, so there are two maps: <StateModel, Factory> and <StateModel, State>.

We keep the following design: 1. state configurations are at the partition 
level; 2. state factory configurations are at the resource level.

To allow multiple states to be configured, we propose representing them as JSON 
strings. Because the state model name is used as the key, a state model cannot 
appear more than once in a partition.

*Resource config with secondary state VERSION*

{
  "id":"Test_Resource"
  ,"simpleFields":{
    "SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
  }
  ,"mapFields":{
    "partition_1" : "{VERSION : 1.0.1}"
    ,"partition_2" : "{VERSION : 1.0.2}"
  }
}
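
The per-partition values above use a `{MODEL : STATE}` string rather than strict JSON (keys and values are unquoted). A minimal sketch of encoding and decoding such a string follows. `SecondaryStateCodec` is a hypothetical helper, and it assumes model names and states contain no `,`, `:`, `{` or `}`. Rejecting a repeated key mirrors the rule that a state model can appear only once per partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SecondaryStateCodec {
  private SecondaryStateCodec() {}

  // {VERSION=1.0.2, RW=READ} -> "{VERSION : 1.0.2, RW : READ}"
  static String encode(Map<String, String> states) {
    StringJoiner joiner = new StringJoiner(", ", "{", "}");
    for (Map.Entry<String, String> e : states.entrySet()) {
      joiner.add(e.getKey() + " : " + e.getValue());
    }
    return joiner.toString();
  }

  // "{VERSION : 1.0.1}" -> {VERSION=1.0.1}, preserving the original order.
  static Map<String, String> decode(String encoded) {
    String s = encoded.trim();
    if (!s.startsWith("{") || !s.endsWith("}")) {
      throw new IllegalArgumentException("Not a {MODEL : STATE} string: " + encoded);
    }
    Map<String, String> result = new LinkedHashMap<>();
    String body = s.substring(1, s.length() - 1).trim();
    if (body.isEmpty()) {
      return result;
    }
    for (String pair : body.split(",")) {
      int colon = pair.indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Missing ':' in " + pair);
      }
      String model = pair.substring(0, colon).trim();
      String state = pair.substring(colon + 1).trim();
      if (result.put(model, state) != null) {
        throw new IllegalArgumentException("Duplicate state model: " + model);
      }
    }
    return result;
  }
}
```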

*Additional APIs to configure secondary states*

/**
 * Set list-valued configuration values
 * @param scope
 * @param listProperties
 */
void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);
  
/**
 * Get configuration values
 * @param scope
 * @param keys
 * @return configuration values ordered by the provided keys
 */
Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);
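
To illustrate the intended semantics of the two list-valued methods, here is a hypothetical in-memory stand-in. It is not Helix's implementation: the class name `InMemoryListConfig` is made up, and `HelixConfigScope` is replaced by a plain scope string. Keys that are not set are skipped, and the result follows the order of the requested keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class InMemoryListConfig {
  // scope (e.g. "RESOURCE/Test_Resource") -> list-valued properties
  private final Map<String, Map<String, List<String>>> store = new HashMap<>();

  void setConfig(String scope, Map<String, List<String>> listProperties) {
    Map<String, List<String>> props = store.computeIfAbsent(scope, k -> new HashMap<>());
    for (Map.Entry<String, List<String>> e : listProperties.entrySet()) {
      props.put(e.getKey(), new ArrayList<>(e.getValue())); // defensive copy
    }
  }

  Map<String, List<String>> getConfig(String scope, List<String> keys) {
    Map<String, List<String>> props =
        store.getOrDefault(scope, Collections.<String, List<String>>emptyMap());
    Map<String, List<String>> result = new LinkedHashMap<>(); // keeps key order
    for (String key : keys) {
      List<String> values = props.get(key);
      if (values != null) {
        result.put(key, new ArrayList<>(values));
      }
    }
    return result;
  }
}
```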

h3. Partitions with the Secondary States shown in Current State and External View

The current state shows both the secondary state models and the secondary 
states, in the same format as the resource configuration.

*Current States*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF":"MasterSlave"
    ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
    ,"BUCKET_SIZE":"0"
    ,"SESSION_ID":"25b2ce5dfbde0fa"
    ,"SECONDARY_STATE_MODEL_DEF" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "partition_1":{
      "CURRENT_STATE":"MASTER"
      ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
      ,"INFO":""
    }
    ,"partition_2":{
      "CURRENT_STATE":"SLAVE"
      ,"SECONDARY_STATES":"{VERSION : 1.0.1}"
      ,"INFO":""
    }
  }
}

For the external view, there are two options for showing secondary states.
1. Combine the main state and the secondary states into a single string, 
separated by ":".

*Option 1: secondary states in External View*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
    ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "example_resource_0":{
      "lca1-app0004.stg.linkedin.com_11932":"{MasterSlave : MASTER} : {VERSION : 1.0.1}"
      ,"lca1-app0048.stg.linkedin.com_11932":"{MasterSlave : SLAVE} : {VERSION : 1.0.0}"
    }
  }
}
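
Because ":" also separates a model from its state inside each group, splitting the combined option-1 string on ":" alone is ambiguous. A sketch that parses brace groups instead follows; `CombinedStateCodec` is hypothetical, and it assumes no nested braces and treats the first group as the main state model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CombinedStateCodec {
  private CombinedStateCodec() {}

  // {MasterSlave=MASTER, VERSION=1.0.1} -> "{MasterSlave : MASTER} : {VERSION : 1.0.1}"
  // The first entry is expected to be the main state model.
  static String encode(Map<String, String> statesByModel) {
    StringJoiner joiner = new StringJoiner(" : ");
    for (Map.Entry<String, String> e : statesByModel.entrySet()) {
      joiner.add("{" + e.getKey() + " : " + e.getValue() + "}");
    }
    return joiner.toString();
  }

  // Scans for {...} groups; separators between groups are ignored.
  static Map<String, String> decode(String combined) {
    Map<String, String> result = new LinkedHashMap<>();
    int start = -1; // index just after the current '{', or -1 outside a group
    for (int i = 0; i < combined.length(); i++) {
      char c = combined.charAt(i);
      if (c == '{') {
        if (start >= 0) {
          throw new IllegalArgumentException("Nested '{' at " + i);
        }
        start = i + 1;
      } else if (c == '}') {
        if (start < 0) {
          throw new IllegalArgumentException("Unmatched '}' at " + i);
        }
        String group = combined.substring(start, i);
        int colon = group.indexOf(':');
        if (colon < 0) {
          throw new IllegalArgumentException("Missing ':' in {" + group + "}");
        }
        result.put(group.substring(0, colon).trim(), group.substring(colon + 1).trim());
        start = -1;
      }
    }
    if (start >= 0) {
      throw new IllegalArgumentException("Unclosed '{' in " + combined);
    }
    return result;
  }
}
```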

2. Add new fields that show the secondary states separately.

*Option 2: secondary states in External View*

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
    ,"ASSOCIATE_STATE_MODEL_DEF_REFS" : "{VERSION : VersionStateModelFactory}"
  }
  ,"listFields":{
  }
  ,"mapFields":{
    "example_resource_0":{
      "lca1-app0004.stg.linkedin.com_11932":"MASTER"
      ,"lca1-app0048.stg.linkedin.com_11932":"SLAVE"
      ,"lca1-app0004.stg.linkedin.com_11932_SECONDARY_STATE":"{VERSION : 1.0.1}"
      ,"lca1-app0048.stg.linkedin.com_11932_SECONDARY_STATE":"{VERSION : 1.0.0}"
    }
  }
}

Both options have backward-compatibility issues. The first design changes the 
state string, so legacy clients won't be able to interpret it. The second design 
adds items to each partition's map, so applications that read this map will find 
additional entries whose names do not correspond to real instances.
Comparing the two options, the first one fits our long-term goals much better, 
so it is our choice for phase one.
To address backward compatibility, we plan to create an additional external view 
ZK node to hold the new format, while keeping the old external view node 
unchanged.
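
The option-2 compatibility problem can be made concrete with a small sketch. `ExternalViewCompat` and its methods are hypothetical, and the "legacy reader" simply treats every key in a partition's map as an instance name, which is an assumption about how existing clients consume the external view.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExternalViewCompat {
  private ExternalViewCompat() {}

  // Suffix used by option 2 for the extra per-instance keys.
  static final String SECONDARY_SUFFIX = "_SECONDARY_STATE";

  // Builds one partition's option-2 map: instance -> main state, plus
  // instance + SECONDARY_SUFFIX -> encoded secondary states.
  static Map<String, String> option2Partition(Map<String, String> mainStates,
      Map<String, String> secondaryStates) {
    Map<String, String> partition = new LinkedHashMap<>(mainStates);
    for (Map.Entry<String, String> e : secondaryStates.entrySet()) {
      partition.put(e.getKey() + SECONDARY_SUFFIX, e.getValue());
    }
    return partition;
  }

  // A legacy reader that assumes every key is an instance.
  static int legacyInstanceCount(Map<String, String> partition) {
    return partition.size();
  }

  // A reader that knows about option 2 and skips the synthetic keys.
  static int awareInstanceCount(Map<String, String> partition) {
    int count = 0;
    for (String key : partition.keySet()) {
      if (!key.endsWith(SECONDARY_SUFFIX)) {
        count++;
      }
    }
    return count;
  }
}
```

With two real instances, the legacy reader counts four "instances", two of which do not exist.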

h3. State Transition Message

When multiple states change, the messages are sent in order of priority. There 
will be no parallel state transitions on a single partition.
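
A minimal sketch of this ordering rule, assuming one queue per partition and that a lower priority number means higher priority (both are assumptions; `PartitionMessageQueue` is a hypothetical name). A new message is released only after the in-flight transition completes, so no two transitions run in parallel on the partition.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

final class PartitionMessageQueue {
  // One pending secondary-state transition for this partition.
  static final class PendingTransition {
    final String stateModel;
    final String toState;
    final int priority; // assumption: lower value = higher priority

    PendingTransition(String stateModel, String toState, int priority) {
      this.stateModel = stateModel;
      this.toState = toState;
      this.priority = priority;
    }
  }

  private final PriorityQueue<PendingTransition> pending =
      new PriorityQueue<>(Comparator.comparingInt((PendingTransition t) -> t.priority));
  private PendingTransition inFlight; // at most one transition per partition

  void enqueue(PendingTransition t) {
    pending.add(t);
  }

  // Returns the next message to send, or null if a transition is still
  // in flight or nothing is pending.
  PendingTransition poll() {
    if (inFlight != null || pending.isEmpty()) {
      return null;
    }
    inFlight = pending.poll();
    return inFlight;
  }

  // Called when the participant reports the in-flight transition as done.
  void onTransitionCompleted() {
    inFlight = null;
  }
}
```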

h3. Helix Controller Updates

When the resource configuration changes:

* Fill ClusterDataCache with the secondary states and state models/factories.
* Compute the state delta and compose messages accordingly. Order the messages 
by state model priority.
* Send the highest-priority message to the participant.
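
The delta step could look roughly like the sketch below. It compares the target secondary states (from the resource configuration) with the participant's current secondary states and orders the resulting transitions by state model priority, so the first element is the message to send. `SecondaryStateDelta` is hypothetical, "lower number = higher priority" is an assumption, and a model without a current state is treated as `UNKNOWN`, the DEFAULT_INITIAL_STATE of DynamicStateModel.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SecondaryStateDelta {
  private SecondaryStateDelta() {}

  static final String INITIAL_STATE = "UNKNOWN"; // matches DynamicStateModel.DEFAULT_INITIAL_STATE

  static final class Transition {
    final String stateModel;
    final String fromState;
    final String toState;

    Transition(String stateModel, String fromState, String toState) {
      this.stateModel = stateModel;
      this.fromState = fromState;
      this.toState = toState;
    }

    @Override
    public String toString() {
      return stateModel + ":" + fromState + "->" + toState;
    }
  }

  // One transition per state model whose current state differs from the target,
  // sorted so that the highest-priority transition comes first.
  static List<Transition> compute(Map<String, String> target, Map<String, String> current,
      Map<String, Integer> priority) {
    List<Transition> transitions = new ArrayList<>();
    for (Map.Entry<String, String> e : target.entrySet()) {
      String from = current.getOrDefault(e.getKey(), INITIAL_STATE);
      if (!from.equals(e.getValue())) {
        transitions.add(new Transition(e.getKey(), from, e.getValue()));
      }
    }
    // Models without a configured priority go last.
    transitions.sort(Comparator.comparingInt(
        (Transition t) -> priority.getOrDefault(t.stateModel, Integer.MAX_VALUE)));
    return transitions;
  }
}
```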

One optimization opportunity is to allow parallel state transition messages when 
they do not conflict.

When a participant's current state changes:

* Read the secondary states and fill the new external view ZK node with the 
encoded, complete status information.

h3. Helix Participant Updates

On receiving a state transition message:

* Check whether the message targets a registered state model. If so, trigger the 
state transition.
** If any state transition fails, set an error state and stop processing. The 
user should fix the problem and reset the partition to the initial state.
** If the state transition succeeds, update the current state.
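
The participant-side flow could be sketched for a single partition replica as follows. `SecondaryStateParticipant` and `TransitionHandler` are hypothetical names, and a handler signals failure by throwing. After a failure the model is set to `ERROR` and further messages are refused until `reset()` is called, mirroring "fix the problem and reset to the initial state".

```java
import java.util.HashMap;
import java.util.Map;

final class SecondaryStateParticipant {
  // Hypothetical per-model handler; throwing signals a failed transition.
  interface TransitionHandler {
    void apply(String fromState, String toState) throws Exception;
  }

  static final String INITIAL_STATE = "UNKNOWN";
  static final String ERROR_STATE = "ERROR";

  private final Map<String, TransitionHandler> registered = new HashMap<>();
  private final Map<String, String> currentStates = new HashMap<>();
  private boolean halted; // set after a failure, cleared by reset()

  void register(String stateModel, TransitionHandler handler) {
    registered.put(stateModel, handler);
  }

  // Returns true if the transition was applied and the current state updated.
  boolean onMessage(String stateModel, String fromState, String toState) {
    if (halted) {
      return false; // processing stopped after an earlier error
    }
    TransitionHandler handler = registered.get(stateModel);
    if (handler == null) {
      return false; // not a registered state model
    }
    try {
      handler.apply(fromState, toState);
    } catch (Exception e) {
      currentStates.put(stateModel, ERROR_STATE);
      halted = true;
      return false;
    }
    currentStates.put(stateModel, toState);
    return true;
  }

  // User-triggered recovery: every model goes back to the initial state.
  void reset() {
    currentStates.clear();
    halted = false;
  }

  String currentState(String stateModel) {
    return currentStates.getOrDefault(stateModel, INITIAL_STATE);
  }
}
```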

h2. Alternative Options for Supporting Additional States

h3. Introducing a special state for additional status changes

Add a new internal state UPGRADING (or another special state) for status 
changes, so that any additional status change happens when a partition 
transitions "to" or "from" the UPGRADING state.
Note that the application is free to define whether UPGRADING is a special 
online status or not. This decouples the main state from the additional 
"states".
In Pinot's case, an upgrading partition might still be an active partition (even 
before it is back ONLINE).

The problem with this new state is that it only works for a single additional 
state model.
Once there is more than one additional state model, and the models change 
independently, a single UPGRADING state is not enough.

h3. Rely on resetting partitions to load new "states"

Whenever new states are to be set, the application updates the resource 
configuration and then resets all partitions.
During the transition from OFFLINE to ONLINE, participants read the new states 
from the configuration and apply them to the related partitions.

The problem is that a change in the additional states affects the main state: 
the partition is offline for a while.

h3. Application registers an additional message handler for customized transition messages

With this method, the application owns the logic. Helix just dispatches a 
customized state transition message to trigger the operation. In the message 
handler, the application reads and writes the additional state information in 
the property store.

Since additional states are a generic requirement, it does not make sense for 
multiple applications to implement similar logic separately.

> Extend Helix to Support Resource with Multiple States
> -----------------------------------------------------
>
>                 Key: HELIX-659
>                 URL: https://issues.apache.org/jira/browse/HELIX-659
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>
> h1. Problem Statement
> h2. Single State Model v.s. Multiple State Models
> Currently, each Helix resource is associated with a single state model, and 
> each replica of a partition can only be in one of the states defined in 
> the state model at any time. Helix manages state transitions based on this 
> single state model.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2416&x=-11&y=71&w=517&h=198&store=1&accept=image%2F*&auth=LCA%20313ced8fb855e8fc1a7043f7fe91cdfa15fffb6b-ts%3D1498857664!
> However, in many scenarios, resources are too complicated to be modeled 
> by a single state model.
> As an example, partitions of a resource could be described in different 
> dimensions: Master/Slave state, Read/Write state, and version. They 
> represent different dimensions of the overall resource status. States in 
> each dimension are based on different state models. Note that the state 
> machines are simplified in this document.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2416&x=-71&y=66&w=1822&h=308&store=1&accept=image%2F*&auth=LCA%2041fa743ba130f41786dee3527de6206cebdd4534-ts%3D1498857664!
> The basic idea is that states in these 3 dimensions are in parallel and can 
> be changed independently. For instance, R/W state may be changed without 
> updating slave/master state.
> h2. Finite State Machine v.s. Dynamic State Model
> In addition, Helix employs a finite state machine to define a state model. 
> However, some state models cannot easily be modeled by a finite state machine 
> with fixed states, for example, versions. We call such a state model a 
> dynamic state model. It is read, set, and understood by the application. 
> We will need to extend Helix to support such dynamic state models. Note that 
> Helix should not, and will not be able to, calculate the best possible dynamic 
> states.
> The version of a software is one of the best examples to understand dynamic 
> state.
> Consider an application that is deployed on multiple nodes, which work 
> together as a cluster. The green node works as the master, and all dark blue 
> nodes are slaves. When admins upgrade the service from 1.0.0 to 1.1.0, they 
> need to ensure that all nodes are upgraded to the new version before declaring 
> the upgrade done. After the upgrade process, it is important to ensure that all 
> software versions are consistent.
> If the Helix framework is used to upgrade the cluster, it helps simplify 
> application logic and ensure consistency. For instance, the service (cluster) 
> itself is regarded as the resource, and each node is mapped to a partition. 
> Upgrading is then simply a state transition, and admins can check the 
> external view to ensure consistency.
> Note that during this version upgrade, the master node is still master node, 
> and slave nodes are still slave nodes. So the version state is parallel to 
> the other states.
> !https://documents.lucidchart.com/documents/e19ab04e-aa06-4ab3-9e57-cfe273554fa1/pages/0_0?a=2066&x=1466&y=922&w=560&h=455&store=1&accept=image%2F*&auth=LCA%20fa3d8fc0d113a82f4e94b127161cf91818a2fe64-ts%3D1497894598!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
