[jira] [Work logged] (HDDS-1982) Extend SCMNodeManager to support decommission and maintenance states

ASF GitHub Bot (Jira) Thu, 05 Sep 2019 03:50:55 -0700


     [ 
https://issues.apache.org/jira/browse/HDDS-1982?focusedWorklogId=307061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-307061
 ]


ASF GitHub Bot logged work on HDDS-1982:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Sep/19 10:49
            Start Date: 05/Sep/19 10:49
    Worklog Time Spent: 10m 
      Work Description: sodonnel commented on pull request #1344: HDDS-1982 
Extend SCMNodeManager to support decommission and maintenance states
URL: https://github.com/apache/hadoop/pull/1344#discussion_r321192741
 
 

 ##########
 File path: 
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/NodeStateManager.java
 ##########
 @@ -219,47 +221,51 @@ private void initialiseState2EventMap() {
    *  |   |                          |                         |
    *  V   V                          |                         |
    * [HEALTHY]------------------->[STALE]------------------->[DEAD]
-   *    |         (TIMEOUT)          |         (TIMEOUT)       |
-   *    |                            |                         |
-   *    |                            |                         |
-   *    |                            |                         |
-   *    |                            |                         |
-   *    | (DECOMMISSION)             | (DECOMMISSION)          | (DECOMMISSION)
-   *    |                            V                         |
-   *    +------------------->[DECOMMISSIONING]<----------------+
-   *                                 |
-   *                                 | (DECOMMISSIONED)
-   *                                 |
-   *                                 V
-   *                          [DECOMMISSIONED]
    *
    */
 
   /**
    * Initializes the lifecycle of node state machine.
    */
-  private void initializeStateMachine() {
-    stateMachine.addTransition(
+  private void initializeStateMachines() {
+    nodeHealthSM.addTransition(
         NodeState.HEALTHY, NodeState.STALE, NodeLifeCycleEvent.TIMEOUT);
-    stateMachine.addTransition(
+    nodeHealthSM.addTransition(
         NodeState.STALE, NodeState.DEAD, NodeLifeCycleEvent.TIMEOUT);
-    stateMachine.addTransition(
+    nodeHealthSM.addTransition(
         NodeState.STALE, NodeState.HEALTHY, NodeLifeCycleEvent.RESTORE);
-    stateMachine.addTransition(
+    nodeHealthSM.addTransition(
         NodeState.DEAD, NodeState.HEALTHY, NodeLifeCycleEvent.RESURRECT);
-    stateMachine.addTransition(
-        NodeState.HEALTHY, NodeState.DECOMMISSIONING,
-        NodeLifeCycleEvent.DECOMMISSION);
-    stateMachine.addTransition(
-        NodeState.STALE, NodeState.DECOMMISSIONING,
-        NodeLifeCycleEvent.DECOMMISSION);
-    stateMachine.addTransition(
-        NodeState.DEAD, NodeState.DECOMMISSIONING,
-        NodeLifeCycleEvent.DECOMMISSION);
-    stateMachine.addTransition(
-        NodeState.DECOMMISSIONING, NodeState.DECOMMISSIONED,
-        NodeLifeCycleEvent.DECOMMISSIONED);
 
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.IN_SERVICE, NodeOperationalState.DECOMMISSIONING,
+        NodeOperationStateEvent.START_DECOMMISSION);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.DECOMMISSIONING, NodeOperationalState.IN_SERVICE,
+        NodeOperationStateEvent.RETURN_TO_SERVICE);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.DECOMMISSIONING,
+        NodeOperationalState.DECOMMISSIONED,
+        NodeOperationStateEvent.COMPLETE_DECOMMISSION);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.DECOMMISSIONED, NodeOperationalState.IN_SERVICE,
+        NodeOperationStateEvent.RETURN_TO_SERVICE);
+
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.IN_SERVICE,
+        NodeOperationalState.ENTERING_MAINTENANCE,
+        NodeOperationStateEvent.START_MAINTENANCE);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.ENTERING_MAINTENANCE,
+        NodeOperationalState.IN_SERVICE,
+        NodeOperationStateEvent.RETURN_TO_SERVICE);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.ENTERING_MAINTENANCE,
+        NodeOperationalState.IN_MAINTENANCE,
+        NodeOperationStateEvent.ENTER_MAINTENANCE);
+    nodeOpStateSM.addTransition(
+        NodeOperationalState.IN_MAINTENANCE, NodeOperationalState.IN_SERVICE,
+        NodeOperationStateEvent.RETURN_TO_SERVICE);
 
 Review comment:
   I hadn't considered where to store that yet. It will probably live outside 
of the state machine, but I need to consider where it fits in. Perhaps in 
NodeStatus, but that would change that object from being immutable to carrying 
a time. 
   
   We will need some sort of decommission / maintenance mode monitor, probably 
separate from the heartbeat monitor. The decommission monitor will need to 
check when all blocks are replicated etc., so it could also keep track of the 
node maintenance timeout and hence switch the node to 'IN_SERVICE + DEAD' if 
it is dead and the timeout expires.
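
   A rough sketch of how such a monitor might track the maintenance timeout
outside of NodeStatus, so that NodeStatus can stay immutable. Everything here
is hypothetical and not part of the PR: the class name MaintenanceMonitorSketch,
the startMaintenance and evaluate methods, the use of a String node id, and the
choice to handle only IN_MAINTENANCE. The enums are re-declared locally only to
keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a monitor that owns the maintenance deadlines itself,
// so the (immutable) node status object does not need to carry a time.
final class MaintenanceMonitorSketch {

  // Local copies of the two state sets, for a self-contained example.
  enum NodeState { HEALTHY, STALE, DEAD }

  enum NodeOperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  // Assumed storage: node id -> time (ms) at which maintenance expires.
  private final Map<String, Long> maintenanceDeadlines = new HashMap<>();

  // Record when maintenance for the given node should time out.
  void startMaintenance(String nodeId, long deadlineMillis) {
    maintenanceDeadlines.put(nodeId, deadlineMillis);
  }

  // Returns the operational state the node should have at nowMillis.
  // A node that is IN_MAINTENANCE and DEAD after its deadline goes back to
  // IN_SERVICE, which gives the combined 'IN_SERVICE + DEAD' state.
  NodeOperationalState evaluate(String nodeId, NodeState health,
      NodeOperationalState opState, long nowMillis) {
    Long deadline = maintenanceDeadlines.get(nodeId);
    boolean expired = deadline != null && nowMillis >= deadline;
    if (opState == NodeOperationalState.IN_MAINTENANCE
        && health == NodeState.DEAD && expired) {
      maintenanceDeadlines.remove(nodeId);
      return NodeOperationalState.IN_SERVICE;
    }
    return opState;
  }
}
```

   In a real implementation the result would presumably be applied by firing
RETURN_TO_SERVICE on the operational state machine rather than by returning a
value, and the check would run periodically alongside the replication checks.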
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 307061)
    Time Spent: 1h 40m  (was: 1.5h)

> Extend SCMNodeManager to support decommission and maintenance states
> --------------------------------------------------------------------
>
>                 Key: HDDS-1982
>                 URL: https://issues.apache.org/jira/browse/HDDS-1982
>             Project: Hadoop Distributed Data Store
>          Issue Type: Sub-task
>          Components: SCM
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, within SCM a node can have the following states:
> HEALTHY
> STALE
> DEAD
> DECOMMISSIONING
> DECOMMISSIONED
> The last 2 are not currently used.
> In order to support decommissioning and maintenance mode, we need to extend 
> the set of states a node can have to include decommission and maintenance 
> states.
> It is also important to note that a node decommissioning or entering 
> maintenance can also be HEALTHY, STALE or go DEAD.
> Therefore, in this Jira I propose we should model a node state with two 
> different sets of values. The first is effectively the liveness of the 
> node, with the following states. This is largely what is in place now:
> HEALTHY
> STALE
> DEAD
> The second is the node operational state:
> IN_SERVICE
> DECOMMISSIONING
> DECOMMISSIONED
> ENTERING_MAINTENANCE
> IN_MAINTENANCE
> That means the overall total number of states for a node is the cross-product 
> of the two above lists. However, it probably makes sense to keep the two 
> states separate internally.
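
The two-part state described above could be modelled roughly as follows. This
is only a sketch under stated assumptions: the enum values come from the issue
description, while the wrapper class NodeStatusSketch, the field names, the
accessors and the combinedStateCount helper are hypothetical and do not
reflect the actual NodeStatus class in the PR.

```java
import java.util.Objects;

// Hypothetical sketch of a node status made of two independent values.
final class NodeStatusSketch {

  // Liveness of the node, as listed in the issue description.
  enum NodeState { HEALTHY, STALE, DEAD }

  // Operational state of the node, as listed in the issue description.
  enum NodeOperationalState {
    IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED,
    ENTERING_MAINTENANCE, IN_MAINTENANCE
  }

  // Immutable pair: the two values are stored separately, never merged
  // into one combined enum.
  static final class NodeStatus {
    private final NodeOperationalState operationalState;
    private final NodeState health;

    NodeStatus(NodeOperationalState operationalState, NodeState health) {
      this.operationalState = Objects.requireNonNull(operationalState);
      this.health = Objects.requireNonNull(health);
    }

    NodeOperationalState getOperationalState() {
      return operationalState;
    }

    NodeState getHealth() {
      return health;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof NodeStatus)) {
        return false;
      }
      NodeStatus other = (NodeStatus) o;
      return operationalState == other.operationalState
          && health == other.health;
    }

    @Override
    public int hashCode() {
      return Objects.hash(operationalState, health);
    }

    @Override
    public String toString() {
      return operationalState + " + " + health;
    }
  }

  // Size of the cross-product of the two lists: 5 operational states
  // times 3 health states.
  static int combinedStateCount() {
    return NodeOperationalState.values().length * NodeState.values().length;
  }
}
```

Keeping the pair as two fields means each value can be driven by its own state
machine, while equality and hashing still treat the combination as one status.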



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
