[ 
https://issues.apache.org/jira/browse/HELIX-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030755#comment-16030755
 ] 

Jiajun Wang edited comment on HELIX-659 at 5/31/17 8:00 AM:
------------------------------------------------------------

h1. Proposal
In this document, we propose adding a second layer of state to Helix.
Consider the Pinot case: what Pinot needs is a transition from "ONLINE:V1" to
"ONLINE:V2". Note that the "V1" to "V2" transition runs in parallel with the
existing state transition. It is special in the following ways:
# The state is not pre-defined. New version numbers may appear after the state
transition model is registered.
# Helix does not understand the internal logic of this additional state, so it
cannot compute the ideal state automatically. It relies on the application's
configuration to update this state.

We take these two points as assumptions.

The expected workflow, again using the Pinot partition version as an example:
# Pinot registers its own logic for version upgrades, which means a new state
model (factory name).
# Helix provides an API to configure resources with an additional state
("VERSION").
# When the resource configuration changes, the controller triggers a state
transition and sends a message to the participants.
# Participants handle the message by calling the corresponding state transition
methods, then update their current state.
# The controller listens for current state changes. On any update, it processes
the change and reflects it in the external view.
 
h1. Design
h2. Register Associate States Model / Factory
Note that since associate states may not be pre-defined,
defaultTransitionHandler has to be implemented.
h3. State Model Factory:

public abstract class AssociateStateModelFactory
    extends StateModelFactory<AssociateStateModel> {
  ...
}

public abstract class AssociateStateModel extends StateModel {
  static final String DEFAULT_INITIAL_STATE = "UNKNOWN";
  protected String _currentState = DEFAULT_INITIAL_STATE;

  public String getCurrentState() {
    return _currentState;
  }

  // !!!!!!!!!!! Changed part !!!!!!!!!!!! //
  @Transition(from = "from", to = "to")
  public void defaultTransitionHandler(Message message, NotificationContext context) {
    logger.error("Default transition handler. The idea is to invoke this if no "
        + "transition method is found. To be implemented");
  }

  public boolean updateState(String newState) {
    _currentState = newState;
    return true;
  }

  public void rollbackOnError(Message message, NotificationContext context,
      StateTransitionError error) {
    logger.error("Default rollback method invoked on error. Error Code: "
        + error.getCode());
  }

  public void reset() {
    logger.warn("Default reset method invoked. Either because the process no longer "
        + "owns this resource or the session timed out");
  }

  @Transition(to = "DROPPED", from = "ERROR")
  public void onBecomeDroppedFromError(Message message, NotificationContext context)
      throws Exception {
    logger.info("Default ERROR->DROPPED transition invoked.");
  }
}
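
To illustrate why a default handler is needed when the set of states is open-ended, here is a minimal, self-contained sketch of the lookup-then-fallback idea. It does not use Helix classes: SketchTransition, VersionModelSketch and TransitionDispatchSketch are hypothetical stand-ins, and the reflection-based lookup is an assumption about how dispatch could work, not a description of Helix's implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for a transition annotation, declared locally so the sketch compiles alone.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchTransition {
  String from();
  String to();
}

// Hypothetical version model: one explicit transition plus a catch-all default handler.
class VersionModelSketch {
  String currentState = "UNKNOWN";
  String lastHandler = "";

  @SketchTransition(from = "UNKNOWN", to = "1.0.0")
  public void onBecomeInitialVersion() {
    lastHandler = "explicit";
  }

  // Called for any (from, to) pair that has no annotated method, e.g. a new version number.
  public void defaultTransitionHandler(String from, String to) {
    lastHandler = "default";
  }
}

class TransitionDispatchSketch {
  // Looks for an annotated method matching (current state, to); falls back to the default handler.
  static void dispatch(VersionModelSketch model, String to) {
    String from = model.currentState;
    Method match = null;
    for (Method m : VersionModelSketch.class.getDeclaredMethods()) {
      SketchTransition t = m.getAnnotation(SketchTransition.class);
      if (t != null && t.from().equals(from) && t.to().equals(to)) {
        match = m;
        break;
      }
    }
    try {
      if (match != null) {
        match.invoke(model);
      } else {
        model.defaultTransitionHandler(from, to);
      }
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Transition " + from + " -> " + to + " failed", e);
    }
    model.currentState = to;
  }
}
```

In this sketch, a transition to "1.0.0" from "UNKNOWN" reaches the explicit method, while a later transition to a version that was not known at registration time reaches defaultTransitionHandler.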

h2. Resource Configuration
h3. Resource config with associate state VERSION:

{
  "id":"Test_Resource"
  ,"simpleFields":{
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ],
    "ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [
        "DEFAULT"
    ],
    "ASSOCIATE_STATES": [
        "1.0.1"
    ]
  }
  ,"mapFields":{
  }
}

h2. Additional APIs to configure associate states

/**
 * Set configuration values
 * @param scope
 * @param listProperties
 */
void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);
  
/**
 * Get configuration values
 * @param scope
 * @param keys
 * @return configuration values ordered by the provided keys
 */
Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);
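
As a rough illustration of the list-valued semantics these two methods imply, here is a minimal in-memory sketch. ListConfigStoreSketch is a hypothetical class, and a plain String stands in for HelixConfigScope; how the real implementation stores values (for example in the listFields of the resource config) is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store; the scope is a plain String instead of HelixConfigScope.
class ListConfigStoreSketch {
  private final Map<String, Map<String, List<String>>> store = new LinkedHashMap<>();

  // Sets (or overwrites) list-valued properties under the given scope.
  void setConfig(String scope, Map<String, List<String>> listProperties) {
    Map<String, List<String>> scoped = store.computeIfAbsent(scope, k -> new LinkedHashMap<>());
    for (Map.Entry<String, List<String>> e : listProperties.entrySet()) {
      scoped.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
  }

  // Returns the values for the requested keys, in the order the keys were given.
  // Keys that were never set are simply absent from the result.
  Map<String, List<String>> getConfig(String scope, List<String> keys) {
    Map<String, List<String>> scoped = store.getOrDefault(scope, new LinkedHashMap<>());
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String key : keys) {
      List<String> value = scoped.get(key);
      if (value != null) {
        result.put(key, new ArrayList<>(value));
      }
    }
    return result;
  }
}
```

With this sketch, upgrading a resource to a new version amounts to calling setConfig with "ASSOCIATE_STATES" mapped to the new version list.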

h2. Partition with Associate States in Current State and External View (EV)
h3. Current States:

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF":"MasterSlave"
    ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
    ,"BUCKET_SIZE":"0"
    ,"SESSION_ID":"25b2ce5dfbde0fa"
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ],
    "ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [
        "DEFAULT"
    ]
  }
  ,"mapFields":{
    "example_resource_0":{
      "CURRENT_STATE":"MASTER"
      // Split by ":" if multiple associate states are set
      ,"ASSOCIATE_STATES":"1.0.1"
      ,"INFO":""
    }
  }
}

h3. Associate state in External View:

{
  "id":"example_resource"
  ,"simpleFields":{
    "STATE_MODEL_DEF_REF":"MasterSlave"
  }
  ,"listFields":{
    "ASSOCIATE_STATE_MODEL_DEF_REFS": [
        "VERSION"
    ]
  }
  ,"mapFields":{
    "example_resource_0":{
      // With more than one associate state, the values are separated by ":".
      // The main state is always the first value.
      "lca1-app0004.stg.linkedin.com_11932":"MASTER:1.0.1"
      ,"lca1-app0048.stg.linkedin.com_11932":"SLAVE:1.0.0"
    }
  }
}
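
The ":"-separated value shown above can be composed and split with a few lines of code. The following is a minimal sketch; CompositeStateSketch and its method names are hypothetical, and it assumes that no individual state value contains ":".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for the "MAIN:ASSOC1:ASSOC2" format used in the external view.
class CompositeStateSketch {
  static final String SEPARATOR = ":";

  // Joins the main state and the associate states; the main state always comes first.
  static String encode(String mainState, List<String> associateStates) {
    List<String> parts = new ArrayList<>();
    parts.add(mainState);
    parts.addAll(associateStates);
    return String.join(SEPARATOR, parts);
  }

  // Returns the first segment, which is the main state.
  static String mainState(String composite) {
    return composite.split(SEPARATOR, -1)[0];
  }

  // Returns every segment after the first, in order; empty if there are none.
  static List<String> associateStates(String composite) {
    String[] parts = composite.split(SEPARATOR, -1);
    return new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
  }
}
```

Because the separator is a plain character, an application whose state names may contain ":" would need an escaping rule or a different separator.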

h2. Helix Controller Updates
On resource configuration changes:
* Fill ClusterDataCache with the associate states and related state models /
factories from the resource configuration.
* Merge the associate states into BestPossibleStateOutput.
* Fill the associate states and related state models / factories into the
message before sending it to participants.

Note that batching all concurrent state changes into one message helps avoid
parallel state transitions. If any error occurs, processing stops immediately
to avoid further issues. This also means participants must handle multiple
state transitions sequentially.
An alternative design is to send a separate message for each state change.
That design implies that the states have no dependencies on each other, and it
gives no guarantee that the main state is handled before the associate states.
It might be helpful in some conditions, but overall it brings more risk than
benefit.
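
The batching described above could be sketched as follows. This is a simplified illustration, not the controller's real pipeline: BatchedTransitionSketch and Step are hypothetical names, and the sketch assumes that index 0 of each list is the main state and that the remaining entries follow the order of ASSOCIATE_STATE_MODEL_DEF_REFS.

```java
import java.util.ArrayList;
import java.util.List;

class BatchedTransitionSketch {
  // One step inside a batched message: the state model it belongs to and the from/to values.
  static class Step {
    final String stateModelRef;
    final String from;
    final String to;

    Step(String stateModelRef, String from, String to) {
      this.stateModelRef = stateModelRef;
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return stateModelRef + ": " + from + " -> " + to;
    }
  }

  // Builds the ordered steps for one partition replica. Unchanged states are skipped;
  // an empty result means no message needs to be sent.
  static List<Step> buildBatch(List<String> modelRefs, List<String> current, List<String> target) {
    if (modelRefs.size() != current.size() || modelRefs.size() != target.size()) {
      throw new IllegalArgumentException("modelRefs, current and target must have the same length");
    }
    List<Step> steps = new ArrayList<>();
    for (int i = 0; i < modelRefs.size(); i++) {
      if (!current.get(i).equals(target.get(i))) {
        steps.add(new Step(modelRefs.get(i), current.get(i), target.get(i)));
      }
    }
    return steps;
  }
}
```

Keeping the steps in one ordered list is what lets the participant process the main state first and stop at the first failure.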

On participant state changes:
* In addition to the existing reads, also read and fill in the associate
states. Then fill the EV with the complete state information.

h2. Helix Participant Updates
On receiving a state transition message:
* Read the main state and associate states, and trigger state transitions in
order.
* Do the main state transition first, then the associate state transitions one
by one.
** If any state transition fails, set an error state that covers all states and
stop processing. The user should fix the problem and reset to the initial states.
** If a state transition succeeds, update the current state.
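
A minimal sketch of this participant-side loop follows. SequentialTransitionSketch and Handler are hypothetical names, the literal "ERROR" is assumed as the error state, and index 0 of each list is assumed to be the main state.

```java
import java.util.List;

class SequentialTransitionSketch {
  // Hypothetical callback for one state transition; throwing signals failure.
  interface Handler {
    void apply(String from, String to) throws Exception;
  }

  // Applies the target states in list order (main state first), mutating 'states' in place.
  // On the first failure, every state is set to "ERROR" and processing stops.
  // Returns true only if all transitions succeeded.
  static boolean process(List<String> states, List<String> targets, List<Handler> handlers) {
    for (int i = 0; i < targets.size(); i++) {
      String from = states.get(i);
      String to = targets.get(i);
      if (from.equals(to)) {
        continue; // nothing to do for this state
      }
      try {
        handlers.get(i).apply(from, to);
        states.set(i, to); // transition succeeded: update the current state
      } catch (Exception e) {
        for (int j = 0; j < states.size(); j++) {
          states.set(j, "ERROR"); // one error state covers all states
        }
        return false;
      }
    }
    return true;
  }
}
```

Note that in this sketch a failure in an associate transition also marks the already-updated main state as "ERROR", which matches the rule that the error state covers all states.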

h1. Alternative options
h2. Introducing UPGRADING State for additional state transitions
Add a new internal state, UPGRADING, for partition upgrades.
An upgrade happens when the partition transitions to or from the UPGRADING
state.
Note that the application is free to define whether UPGRADING counts as a
special online status.
In the Pinot case, an upgrading partition (even before it is back to ONLINE)
might be an active partition.
The problem with this new state is that it only works for a single additional
state.
Once there is more than one additional state to manage, the UPGRADING state is
not enough.
h2. Rely on resetting partition to load new states
Whenever a new version is available, the application updates the version for
the resource and then resets all partitions.
During the state transition from offline to online, participants read the new
version and apply it to the related partitions.
The problem with this method is that a change in the additional state affects
the main state. A partition will be offline for a while, and during this period
even the old version is unavailable.
h2. Application registers message handler to handle upgrading message
In this method, the controller is only responsible for sending upgrade requests
to participants. Participants are responsible for reporting their local
versions.
Since the controller does not know how to control the additional state, the
application has to implement all of the logic.
h1. Validation
Add unit tests / integration tests to validate associate states.
Verify the Pinot version use case.



> Support Additional Associate States
> -----------------------------------
>
>                 Key: HELIX-659
>                 URL: https://issues.apache.org/jira/browse/HELIX-659
>             Project: Apache Helix
>          Issue Type: New Feature
>          Components: helix-core
>    Affects Versions: 0.6.x
>            Reporter: Jiajun Wang
>         Attachments: Associate States Processing Flow.pdf
>
>
> Currently, Helix only supports managing a single state for each
> resource/partition. However, in the real world, cluster management
> requirements may be more complicated than that.
> In Pinot, for example, each partition needs to be assigned a version to
> ensure data consistency.
> When a new version arrives, the system needs to replace the old partition
> with the new one, and the replacement is done one partition at a time. So
> any reads during this period will get inconsistent data.
> The Pinot system cannot put the version information directly into the
> section (partition) state field because it is already occupied by the main
> state (offline-online, for instance) used by the Helix controller.
> So the Pinot team relies on workarounds to implement their application
> logic: creating a new resource with the latest version and replacing the old
> one after the new resource is fully loaded. The Helix controller is unaware
> of the version.
> Another option is for the Pinot team to maintain their own config item or
> property store item for recording versions.
> Both approaches require the Pinot team to implement version control
> themselves.
> Another requirement comes from the Ambry team, where a partition can be
> "ONLINE:READ" or "ONLINE:WRITE".
> In both cases, a single-state mechanism is not sufficient for the
> applications' requirements.
> It would be very helpful to provide a framework-level feature that supports
> more than one state for each partition.
> Benefits:
> # The application doesn't need to write additional code for managing
> additional states.
> # Avoids potential conflicts when multiple state transitions happen
> concurrently.


