[ https://issues.apache.org/jira/browse/HELIX-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030755#comment-16030755 ]
Jiajun Wang edited comment on HELIX-659 at 5/31/17 8:00 AM:
------------------------------------------------------------

h1. Proposal

In this document, we propose introducing an additional layer of state into Helix. In the Pinot case, what is needed is a transition from "ONLINE:V1" to "ONLINE:V2". Note that the "V1" to "V2" transition runs in parallel with the existing state transition. It is special in the following ways:
# The state is not pre-defined. New version numbers may appear after the state transition model is registered.
# Helix does not understand the internal logic of this additional state, so it cannot compute the ideal state automatically. It relies on the application's configuration to update this state.

We take the two points above as assumptions.

As for the expected workflow, again taking the Pinot partition version as an example:
# Pinot registers its own logic for version upgrades, which means a new state model (factory name).
# Helix provides an API to configure resources with the additional state ("VERSION").
# When the resource configuration changes, the controller triggers a state transition and sends a message to the participants.
# Participants handle the message by calling the corresponding state transition methods, then update the current state.
# The controller listens for current state changes. On any update, it processes the change and reflects it in the external view.

h1. Design

h2. Register Associate States Model / Factory

Note that since associate states may not be pre-defined, defaultTransitionHandler has to be implemented.

h3. State Model Factory:

    public abstract class AssociateStateModelFactory extends StateModelFactory<AssociateStateModel> {
      ...
    }

    public abstract class AssociateStateModel extends StateModel {
      static final String DEFAULT_INITIAL_STATE = "UNKNOWN";
      protected String _currentState = DEFAULT_INITIAL_STATE;

      public String getCurrentState() {
        return _currentState;
      }

      // !!!!!!!!!!! Changed part !!!!!!!!!!!!
      // @transition(from='from', to='to')
      public void defaultTransitionHandler(Message message, NotificationContext context) {
        logger.error("Default transition handler. The idea is to invoke this if no transition method is found. To be implemented");
      }

      public boolean updateState(String newState) {
        _currentState = newState;
        return true;
      }

      public void rollbackOnError(Message message, NotificationContext context, StateTransitionError error) {
        logger.error("Default rollback method invoked on error. Error Code: " + error.getCode());
      }

      public void reset() {
        logger.warn("Default reset method invoked. Either because the process no longer owns this resource or the session timed out");
      }

      @Transition(to = "DROPPED", from = "ERROR")
      public void onBecomeDroppedFromError(Message message, NotificationContext context) throws Exception {
        logger.info("Default ERROR->DROPPED transition invoked.");
      }
    }

h2. Resource Configuration

h3. Resource config with associate state VERSION:

    {
      "id":"Test_Resource"
      ,"simpleFields":{
      }
      ,"listFields":{
        "ASSOCIATE_STATE_MODEL_DEF_REFS": [ "VERSION" ]
        ,"ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [ "DEFAULT" ]
        ,"ASSOCIATE_STATES": [ "1.0.1" ]
      }
      ,"mapFields":{
      }
    }

h2. Additional APIs to configure associate states

    /**
     * Set configuration values
     * @param scope
     * @param listProperties
     */
    void setConfig(HelixConfigScope scope, Map<String, List<String>> listProperties);

    /**
     * Get configuration values
     * @param scope
     * @param keys
     * @return configuration values ordered by the provided keys
     */
    Map<String, List<String>> getConfig(HelixConfigScope scope, List<String> keys);

h2. Partition with the Associate States on the Participant State And EV

h3. Current States:

    {
      "id":"example_resource"
      ,"simpleFields":{
        "STATE_MODEL_DEF":"MasterSlave"
        ,"STATE_MODEL_FACTORY_NAME":"DEFAULT"
        ,"BUCKET_SIZE":"0"
        ,"SESSION_ID":"25b2ce5dfbde0fa"
      }
      ,"listFields":{
        "ASSOCIATE_STATE_MODEL_DEF_REFS": [ "VERSION" ]
        ,"ASSOCIATE_STATE_MODEL_FACTORY_NAMES": [ "DEFAULT" ]
      }
      ,"mapFields":{
        "example_resource_0":{
          "CURRENT_STATE":"MASTER"
          // Separated by ":" if multiple associate states are set
          ,"ASSOCIATE_STATES":"1.0.1"
          ,"INFO":""
        }
      }
    }

h3. Associate state in External View:

    {
      "id":"example_resource"
      ,"simpleFields":{
        "STATE_MODEL_DEF_REF":"MasterSlave"
      }
      ,"listFields":{
        "ASSOCIATE_STATE_MODEL_DEF_REFS": [ "VERSION" ]
      }
      ,"mapFields":{
        "example_resource_0":{
          // When there is more than one associate state, they are separated by ":".
          // The main state is always the first state.
          "lca1-app0004.stg.linkedin.com_11932":"MASTER:1.0.1"
          ,"lca1-app0048.stg.linkedin.com_11932":"SLAVE:1.0.0"
        }
      }
    }

h2. Helix Controller Updates

On resource configuration changes:
* Fill ClusterDataCache with associate states and related state models / factories from the resource configuration.
* Merge associate states into BestPossibleStateOutput.
* Fill associate states and related state models / factories into the message before sending it to participants.

Note that batching all concurrent state changes into one message helps avoid parallel state transitions. If any error happens, processing stops immediately to avoid further issues. This also means participants should handle multiple state transitions sequentially.

An alternative design is to send a separate message for each state change. That design implies that the states have no dependencies on each other, and there is no guarantee that the main state is handled before the associate states. It might be helpful in some conditions, but overall it brings more risk than benefit.

On participant state changes:
* Besides the existing read, also read and fill in associate states. Then fill the EV with the complete state information.

h2. Helix Participant Updates

On receiving a state transition message:
* Read the main state and associate states, and trigger state transitions in order.
* Do the main state transition first, then do the associate state transitions one by one.
** If any state transition fails, set an error state covering all states and stop processing. The user should fix the problem and reset to the initial states.
** If a state transition succeeds, update the current state.

h1. Alternative options

h2. Introducing UPGRADING State for additional state transitions

Add a new internal state, UPGRADING, for partition upgrades. The upgrade happens when the partition transitions to or from UPGRADING. Note that the application is free to define whether UPGRADING is a special online status or not. In the Pinot case, an upgrading partition might still be an active partition, even before it is back to ONLINE.

The problem with this new state is that it only works for a single additional state. Once there is more than one additional state to take care of, the UPGRADING state is not enough.

h2. Rely on resetting partition to load new states

Whenever a new version is available, the application updates the version for the resource and then resets all partitions. During the state transition from offline to online, participants read the new version and apply it to the related partitions.

The problem with this method is that a change in the additional state affects the main state. A partition will be offline for a while, and during this period even the old version will not be available.

h2. Application registers message handler to handle upgrading message

In this method, the controller is only responsible for sending upgrade requests to participants. Participants are responsible for reporting their local versions. Since the controller has no knowledge of how to control the additional state, the application needs to handle all of the logic.

h1. Validation

Add unit tests and integration tests to validate associate states. Verify the Pinot version use case.

> Support Additional Associate States
> -----------------------------------
>
> Key: HELIX-659
> URL: https://issues.apache.org/jira/browse/HELIX-659
> Project: Apache Helix
> Issue Type: New Feature
> Components: helix-core
> Affects Versions: 0.6.x
> Reporter: Jiajun Wang
> Attachments: Associate States Processing Flow.pdf
>
>
> Currently, Helix only supports managing a single state for all resources/partitions. However, in the real world, cluster management requirements may be more complicated than that.
> In Pinot, for example, each partition needs to be assigned a version to ensure data consistency.
> When a new version arrives, the system needs to replace the old partition with the new one. The replacement is done one partition at a time, so any reads during this period will get inconsistent data.
> Pinot cannot put the version information directly into the section (partition) state field, because that field is already occupied by the main state (offline-online, for instance) used by the Helix controller.
> So the Pinot team relies on workarounds to implement their application logic: creating a new resource with the latest version and swapping it in after the resource is fully loaded. The Helix controller is unaware of the version.
> Another option is for the Pinot team to maintain their own config item or property store item for recording versions.
> Both approaches require the Pinot team to implement version control themselves.
> Another requirement comes from the Ambry team, where a partition can be "ONLINE:READ" or "ONLINE:WRITE".
> In both cases, the single-state mechanism is not sufficient for the applications' requirements.
> It would be very helpful to provide a framework-level feature that supports more than one state for each partition.
> Benefits:
> # The application doesn't need to write additional code for managing additional states.
> # Potential conflicts are avoided when multiple state transitions happen concurrently.
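
A minimal Java sketch of the participant-side rule from the design comment (main state first, then associate states one by one, with any failure putting every state into ERROR), together with the "MAIN:ASSOCIATE" encoding shown in the External View example. AssociateStateSketch, TransitionHandler, applyInOrder and encode are hypothetical names used for illustration only; they are not Helix APIs, and an actual change would presumably live in Helix's message handling rather than in a standalone class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: none of these names are part of the Helix API. */
final class AssociateStateSketch {
    static final String ERROR = "ERROR";
    static final String UNKNOWN = "UNKNOWN"; // mirrors DEFAULT_INITIAL_STATE in the proposal

    /** Hypothetical callback: returns true when the from -> to transition succeeds. */
    interface TransitionHandler {
        boolean transit(String stateName, String from, String to);
    }

    /**
     * Applies target states in insertion order (the caller puts the main state first).
     * Returns a new map and leaves the input map untouched.
     */
    static Map<String, String> applyInOrder(Map<String, String> current,
                                            LinkedHashMap<String, String> target,
                                            TransitionHandler handler) {
        Map<String, String> result = new LinkedHashMap<>(current);
        for (Map.Entry<String, String> entry : target.entrySet()) {
            String name = entry.getKey();
            String from = result.getOrDefault(name, UNKNOWN);
            String to = entry.getValue();
            if (to.equals(from)) {
                continue; // this state is already at its target
            }
            boolean succeeded;
            try {
                succeeded = handler.transit(name, from, to);
            } catch (RuntimeException e) {
                succeeded = false;
            }
            if (!succeeded) {
                // One failure marks every state as ERROR and stops processing.
                result.put(name, from);
                result.replaceAll((key, value) -> ERROR);
                return result;
            }
            result.put(name, to);
        }
        return result;
    }

    /** Joins the main state and associate states with ":", main state first. */
    static String encode(String mainState, List<String> associateStates) {
        List<String> parts = new ArrayList<>();
        parts.add(mainState);
        parts.addAll(associateStates);
        return String.join(":", parts);
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> current = new LinkedHashMap<>();
        current.put("MAIN", "SLAVE");
        current.put("VERSION", "1.0.0");

        LinkedHashMap<String, String> target = new LinkedHashMap<>();
        target.put("MAIN", "MASTER");
        target.put("VERSION", "1.0.1");

        Map<String, String> updated = applyInOrder(current, target, (name, from, to) -> true);
        // prints "MASTER:1.0.1"
        System.out.println(encode(updated.get("MAIN"),
                Collections.singletonList(updated.get("VERSION"))));
    }
}
```

Ordering is carried by the LinkedHashMap, so the caller decides which state counts as the main one by inserting it first. A handler that returns false or throws for any state leaves the whole partition in ERROR, matching the "stop processing" rule in the comment.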