[GitHub] helix pull request #119: Giant PR for Helix 0.8

dasahcc Tue, 07 Nov 2017 18:20:20 -0800

GitHub user dasahcc opened a pull request:

    https://github.com/apache/helix/pull/119


    Giant PR for Helix 0.8

    Giant PR for Helix 0.8. It includes multiple features as well as the new Helix UI and Helix REST.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dasahcc/helix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/119.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #119
    
----
commit 310457c2462345127f6a1e5e133bd8d19c4e5482
Author: Junkai Xue <[email protected]>
Date:   2017-01-06T02:14:02Z

    State Transition Cancellation Client Implementation
    
    State transitions play a vital part in how Helix manages clusters. Various conditions can make a state transition unnecessary; for example, the node running the partition's state transition goes down. State transition cancellation is therefore a useful feature for Helix: it not only cancels state transitions to avoid invalid states but also reduces redundant state transitions.
    
    For details, please refer to:
    https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
    
    This implementation includes the following:
    1. Add a new StateTransitionCancellationHandler.
    2. Implement cancellation for three cases: message received but not yet handled, message handled but task not yet started, and message handled with the task already started.
    3. Add a default implementation of the cancel method in StateModel.
    4. Add a new STATE_TRANSITION_CANCELLATION message type.
    5. Add unit tests for the cancellation logic.
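A minimal sketch of the third case (task already started), assuming the default cancel method only records the request and a long-running transition polls for it between units of work. `SketchStateModel`, `BootstrapStateModel`, and `onBecomeSlaveFromOffline` are hypothetical names, not the Helix StateModel API. The first two cases can be handled before any transition code runs, by dropping the message or cancelling the queued task.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: SketchStateModel is NOT the Helix StateModel class.
abstract class SketchStateModel {
  private final AtomicBoolean cancelled = new AtomicBoolean(false);

  // Default cancel(): only records the request; a running transition is
  // expected to poll isCancelled() and stop at a safe point.
  public void cancel() {
    cancelled.set(true);
  }

  public boolean isCancelled() {
    return cancelled.get();
  }
}

// Hypothetical model with a long, chunked OFFLINE -> SLAVE transition.
final class BootstrapStateModel extends SketchStateModel {
  int chunksCopied = 0;

  // Returns true if the transition completed, false if it was abandoned
  // because cancellation was requested before all chunks were copied.
  boolean onBecomeSlaveFromOffline(int totalChunks) {
    for (int i = 0; i < totalChunks; i++) {
      if (isCancelled()) {
        return false;
      }
      chunksCopied++;
    }
    return true;
  }
}
```

Polling keeps cancellation cooperative: the transition decides where it is safe to stop, rather than being interrupted mid-write.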

commit 4fee01cdeffbc3fcd07a6aa06c4fe4bf61c0c811
Author: Lei Xia <[email protected]>
Date:   2017-01-31T01:52:24Z

    Add missing files for helix-rest module.

commit 29817cfc26571474088656f7491d9a8069785a75
Author: Junkai Xue <[email protected]>
Date:   2017-01-26T01:15:04Z

    Support cancelling tasks with a synchronous task status check

    Currently in Helix, cancelling or stopping a job does not check the status of its subtasks. In this rb:
    1. Add a new API to support synchronously stopping a workflow/queue.
    2. On the controller side, check that subtasks are stopped before marking the job status.

commit 0bb7998037908a31e287c3a8a0a10eb08514ca61
Author: Junkai Xue <[email protected]>
Date:   2017-02-02T04:24:04Z

    Revert "Enable maxRecipient in Criteria"
    
    This reverts commit 0a0c7dcf7a95e7c1050fa69bcb09619405470cfc.
    Reverted because this change is not robust enough for Helix.

commit 87e5a912604c4f7170468bd0b966c3377230784f
Author: Weihan Kong <[email protected]>
Date:   2017-02-01T00:36:28Z

    In recovery, we prefer transition from second level state to top state, over transition from lower level state to top state(and overriding the instance preference in preference list)
    
    When recovering a replica from, e.g., Offline to Master, we prefer to promote it to Slave while promoting another Slave to Master at the same time. Once this is finished, the states of the two nodes are switched.
    This gives higher availability for the Master state, because an Offline-to-Slave transition usually takes longer than a Slave-to-Master transition.
    Therefore, we avoid transitioning a lower-level state directly to the top state and always try to transition only a second-level state to the top state.
    
    Because SemiAutoRebalancer and AutoRebalancer share the same logic, the method is removed from SemiAutoRebalancer and added to AbstractRebalancer so that it is used by both rebalancers.
    
    After the best possible state is computed, the fix compares the current state with the best possible state and determines whether any pair of instances should switch their best possible states. See the comments in the code for details.
    
    Add unit tests for computeBestPossibleStateForPartition that target this feature specifically (4 cases in total).
    Modify TestRebalancePipeline to match the new rebalancer behavior while preserving the intent of the original test, and add a new test there that specifically checks that no duplicate Masters are produced.
    
    A couple of code readability changes:
    1. Pass in the liveInstance Set instead of ClusterDataCache, since that is the only part of the cache that is used; change the corresponding method calls in DelayedAutoRebalancer and TestAutoRebalancerStrategy.
    2. Remove unused setup code and comments in BaseStageTest, and regroup the setupStateModel() statements.
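A minimal sketch of the pairwise swap described in this commit, for a single partition with one top-state replica. `TopStateSwapSketch` and `preferSecondLevelForTop` are hypothetical names and this is not the AbstractRebalancer code; it only illustrates demoting a top-state assignment that would need a multi-hop transition and promoting an instance that is already in the second-level state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the swap idea; not the Helix AbstractRebalancer code.
final class TopStateSwapSketch {
  // current and bestPossible map instance name -> state for one partition.
  // If the top state is assigned to an instance that is neither already in the
  // top state nor in the second-level state, that instance gets the second-level
  // state instead, and an instance already in the second-level state is promoted.
  static Map<String, String> preferSecondLevelForTop(
      Map<String, String> current, Map<String, String> bestPossible,
      String topState, String secondState) {
    Map<String, String> result = new LinkedHashMap<>(bestPossible);
    for (Map.Entry<String, String> assigned : bestPossible.entrySet()) {
      String candidate = assigned.getKey();
      if (!topState.equals(assigned.getValue())) {
        continue;
      }
      String candidateCurrent = current.get(candidate);
      if (topState.equals(candidateCurrent) || secondState.equals(candidateCurrent)) {
        return result; // already top, or only one hop away: nothing to swap
      }
      for (Map.Entry<String, String> other : bestPossible.entrySet()) {
        String peer = other.getKey();
        if (secondState.equals(other.getValue()) && secondState.equals(current.get(peer))) {
          result.put(candidate, secondState);
          result.put(peer, topState);
          return result; // this sketch assumes a single top-state replica
        }
      }
    }
    return result;
  }
}
```

Once the recovering instance reaches the second-level state, a later pass with the normal preference list can switch the two nodes back, as the commit message describes.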

commit 3a4f9e8f6bcfb56b32725fd485bc9e97079d94df
Author: Lei Xia <[email protected]>
Date:   2017-01-30T23:02:24Z

    Automatically clean up jobs in a job queue once they complete and their expiry time has passed.
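A minimal sketch of the expiry check, assuming each completed job records a finish timestamp and the queue uses a single expiry duration. `JobPurgeSketch` and `jobsToPurge` are hypothetical names, not Helix task-framework APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not a Helix TaskDriver or TaskUtil API.
final class JobPurgeSketch {
  // finishTimeMs maps job name -> completion time in ms, or null if not completed.
  // A job is purged only if it has completed AND its expiry window has elapsed.
  static List<String> jobsToPurge(Map<String, Long> finishTimeMs, long expiryMs, long nowMs) {
    List<String> purge = new ArrayList<>();
    // Copy into a TreeMap so the result is in a deterministic (sorted) order.
    for (Map.Entry<String, Long> e : new TreeMap<>(finishTimeMs).entrySet()) {
      Long finishedAt = e.getValue();
      if (finishedAt != null && nowMs - finishedAt >= expiryMs) {
        purge.add(e.getKey());
      }
    }
    return purge;
  }
}
```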

commit f32f6d1147d2d385e609e5e1754b33e768babdce
Author: Junkai Xue <[email protected]>
Date:   2017-02-02T19:13:44Z

    State Transition Cancellation Client-Side Change, Part II
    
    In Helix, many different scenarios can make some pending state transitions no longer valid: for example, a resource is deleted while it still has pending transitions, or Helix calculates a new ideal mapping while some pending transitions do not match the new mapping. In such cases, the Helix controller should proactively cancel these pending transitions instead of waiting for them to finish.
    
    For details, please refer to:
    https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
    
    In this rb:
    1. Support registering a MessageHandlerFactory for different message types.
    2. Refactor the related APIs.
    3. Add a locking mechanism to avoid a race condition between a task starting and its cancellation starting.
    4. Add a unit test for multi-message-type registration.
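A minimal sketch of the kind of locking item 3 describes: one lock guards both the "task started" and "cancel requested" flags, so exactly one of the two threads wins the race. `TransitionStartCancelGuard` is a hypothetical name; when the task has already started, the sketch assumes cancellation falls back to the state model's cancel method.

```java
// Hypothetical sketch; not the Helix HelixTask or MessageHandler code.
final class TransitionStartCancelGuard {
  private final Object lock = new Object();
  private boolean started = false;
  private boolean cancelled = false;

  // Executor thread: call right before running the transition.
  // Returns false if cancellation won the race, so the task must not run.
  boolean markStarted() {
    synchronized (lock) {
      if (cancelled) {
        return false;
      }
      started = true;
      return true;
    }
  }

  // Cancellation-message thread: returns true if the task was stopped before it
  // started; false means it is already running and must be cancelled cooperatively.
  boolean cancelIfNotStarted() {
    synchronized (lock) {
      if (started) {
        return false;
      }
      cancelled = true;
      return true;
    }
  }
}
```

Checking and setting both flags under the same lock removes the window where a task could start right after a cancel check passed.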

commit bab2f5a5d28fffb31e738b87de5bd8f99dad88a9
Author: Junkai Xue <[email protected]>
Date:   2017-02-06T21:40:15Z

    Import Helix Open Source change
    
    [HELIX-651] Add a method in HelixAdmin to set the InstanceConfig of an existing instance
    
    - Add a setInstanceConfig() method in HelixAdmin interface
    - Add an implementation for the same in ZkHelixAdmin
    - Add a test in TestZkHelixAdmin

commit 003f639054b03e05817b98d7f379b22a3fcefb51
Author: Junkai Xue <[email protected]>
Date:   2017-02-06T22:27:24Z

    Import open source master change
    
    Added new DataSource values LIVEINSTANCES and INSTANCES and made CriteriaEvaluator support them

commit 237e47247e98c1a8a2bf63179738cc939c1bdd6d
Author: Weihan Kong <[email protected]>
Date:   2017-02-07T20:03:17Z

    Modify code style for TestAbstractRebalancerComputeBestPossibleStateForPartition
    
    Use underscores for class variables.
    Use enum name as state value.
    One declaration per line.

commit 64e24f5d7db923be878fd4577df7df41e5118a29
Author: Lei Xia <[email protected]>
Date:   2017-02-07T01:29:02Z

    Update the pom version to align with the open source release version.

commit 975fdc0e1dd3d1b8caeefe57eb1026e646533c2e
Author: Lei Xia <[email protected]>
Date:   2017-02-07T22:59:10Z

    Support configurable job purge interval for a queue.

commit 42a95d4e928ae85c26a75f3c6165fe92bc590e32
Author: Junkai Xue <[email protected]>
Date:   2017-02-09T01:12:18Z

    Fix TestBatchMessage test failure

    The test failed because a new NO_OP message is sent when a new MessageHandlerFactory is registered.

commit 6a46da7962d0c831246390b6ec3c510136481214
Author: Weihan Kong <[email protected]>
Date:   2017-02-09T02:14:57Z

    Revert "In recovery, we prefer transition from second level state to top state, over transition from lower level state to top state(and overriding the instance preference in preference list)"
    
    This reverts commit 85356ccb3a2f45217d8f6d0753ccd2e37b06ffce.
    
    Revert "Modify code style for TestAbstractRebalancerComputeBestPossibleStateForPartition"
    
    This reverts commit 53546a67cde56624d4cce5a62be828dd4c0c13ff.

commit 5dad46989809fba871ec1b12528c596d1d92dddf
Author: Junkai Xue <[email protected]>
Date:   2017-02-09T02:43:18Z

    Import EvaluateCriteria change from open source
    
    This change is needed by the Gobblin team.

commit 916d6731310efe314205a593e9c84715fefa5d06
Author: Lei Xia <[email protected]>
Date:   2017-02-09T19:46:28Z

    Update ivy files with new version.

commit d8054e2252389208fc74c654ec5a209cd9498cea
Author: Weihan Kong <[email protected]>
Date:   2017-02-09T07:38:49Z

    Prevent ClusterControllerManager from starting multiple times
    
    ClusterControllerManager is a runnable wrapper for a Helix Controller that can run on a separate thread for testing purposes. Since HelixManager.connect() should not be called more than once, this Controller should not be started more than once, either.
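A minimal sketch of a start-once guard, assuming an AtomicBoolean flag and an IllegalStateException on the second call; the real ClusterControllerManager may use a different mechanism. `StartOnceController` and `syncStart` are hypothetical names, and the counter merely stands in for HelixManager.connect().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the actual ClusterControllerManager test utility.
final class StartOnceController implements Runnable {
  private final AtomicBoolean started = new AtomicBoolean(false);
  // Stands in for HelixManager.connect(); counts how often "connect" ran.
  final AtomicInteger connectCalls = new AtomicInteger();

  @Override
  public void run() {
    connectCalls.incrementAndGet();
  }

  // compareAndSet lets exactly one caller win, even if syncStart() races across threads.
  Thread syncStart() {
    if (!started.compareAndSet(false, true)) {
      throw new IllegalStateException("controller has already been started");
    }
    Thread t = new Thread(this, "controller");
    t.start();
    return t;
  }
}
```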

commit 87d4668db8c80ac2991efe601789be6bd38c5e9c
Author: Junkai Xue <[email protected]>
Date:   2017-02-07T22:13:43Z

    State Transition Cancellation Server change
    
    In Helix, many different scenarios can make some pending state transitions no longer valid: for example, a resource is deleted while it still has pending transitions, or Helix calculates a new ideal mapping while some pending transitions do not match the new mapping. In such cases, the Helix controller should proactively cancel these pending transitions instead of waiting for them to finish.
    
    For details, please refer to:
    https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
    
    In this rb:
    1. Send a state transition cancellation message when the pending state does not match the target state.
    2. Add a complete integration test for state transition cancellation.

commit 3d77f47d9775340aa1618c5b476e08412a7bc44c
Author: Weihan Kong <[email protected]>
Date:   2017-02-13T21:52:16Z

    Be able to stop a workflow when no job is running.

    Currently, to stop a workflow, its target state is set to STOP. Then, when each job (a resource in the ideal state) is processed in the job rebalancer, the rebalancer checks whether all jobs in the workflow are done (not IN_PROGRESS or STOPPING) and, if so, sets the workflow state to STOP.
    However, if all jobs are already done, there is no job in the ideal state to process, so the workflow state never gets a chance to be set to STOP.

    This commit adds a check in the workflow rebalancer that sets the state when all jobs are already done.

    A test is added specifically for this case.
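A minimal sketch of the added check, assuming the workflow rebalancer can see the states of all jobs in the workflow. `SketchJobState` and `WorkflowStopCheckSketch` are hypothetical names that only loosely mirror the Helix task-framework states.

```java
import java.util.Collection;

// Hypothetical job states; they only loosely mirror the Helix task framework.
enum SketchJobState { NOT_STARTED, IN_PROGRESS, STOPPING, STOPPED, COMPLETED, FAILED }

final class WorkflowStopCheckSketch {
  // "Done" means not IN_PROGRESS and not STOPPING, as in the commit message.
  static boolean allJobsDone(Collection<SketchJobState> jobStates) {
    for (SketchJobState state : jobStates) {
      if (state == SketchJobState.IN_PROGRESS || state == SketchJobState.STOPPING) {
        return false;
      }
    }
    return true;
  }

  // Evaluated in the workflow rebalancer, so it still runs when no job is
  // left in the ideal state for the job rebalancer to process.
  static boolean shouldMarkWorkflowStopped(boolean targetStateIsStop,
      Collection<SketchJobState> jobStates) {
    return targetStateIsStop && allJobsDone(jobStates);
  }
}
```

An empty job collection counts as "all done", which is exactly the case the job rebalancer alone could never reach.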

commit d7d68a1ff3f9da138c546b92409bad4026a98804
Author: Lei Xia <[email protected]>
Date:   2017-02-11T01:28:40Z

    Persist the preference list into the IdealState in full-auto mode, and allow users to choose whether the best possible or the intermediate state mapping is persisted into the map field of the IS.

commit 5884c93b8a07b47ee3289e5f8188a6d7fa2f5a22
Author: Lei Xia <[email protected]>
Date:   2017-02-15T16:17:58Z

    Add PropertyPathConfig back to the code base for API backward compatibility; the class will be removed in the next major release.

commit 6b228f89b32dac3a13827f205f27b3cbe97c8ab6
Author: Weihan Kong <[email protected]>
Date:   2017-02-14T00:50:05Z

    Add timeout in JobConfig
    
    To support a job-level timeout in the task framework, add the configuration field. Associated changes are made in the builder and JobBean.
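A minimal sketch of a builder carrying a job-level timeout field, assuming a -1 sentinel for "no timeout" and milliseconds as the unit. `SketchJobConfig`, `NO_TIMEOUT`, and `setTimeout` are hypothetical names and should not be read as the Helix JobConfig API.

```java
// Hypothetical job config with a job-level timeout; not the Helix JobConfig API.
final class SketchJobConfig {
  static final long NO_TIMEOUT = -1L; // assumed sentinel meaning "never time out"

  private final String jobId;
  private final long timeoutMs;

  private SketchJobConfig(String jobId, long timeoutMs) {
    this.jobId = jobId;
    this.timeoutMs = timeoutMs;
  }

  String getJobId() { return jobId; }

  long getTimeoutMs() { return timeoutMs; }

  // True once the job has been running for at least timeoutMs.
  boolean isTimedOut(long startMs, long nowMs) {
    return timeoutMs != NO_TIMEOUT && nowMs - startMs >= timeoutMs;
  }

  static final class Builder {
    private String jobId;
    private long timeoutMs = NO_TIMEOUT; // default: no job-level timeout

    Builder setJobId(String jobId) {
      this.jobId = jobId;
      return this;
    }

    Builder setTimeout(long timeoutMs) {
      if (timeoutMs <= 0 && timeoutMs != NO_TIMEOUT) {
        throw new IllegalArgumentException("timeout must be positive or NO_TIMEOUT");
      }
      this.timeoutMs = timeoutMs;
      return this;
    }

    SketchJobConfig build() {
      if (jobId == null) {
        throw new IllegalStateException("jobId is required");
      }
      return new SketchJobConfig(jobId, timeoutMs);
    }
  }
}
```

Defaulting to the sentinel keeps existing jobs unaffected unless a timeout is explicitly configured.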

commit 93d5ceb5b25ea3077c3ab456f36030229e7ffa07
Author: Lei Xia <[email protected]>
Date:   2017-02-10T23:19:52Z

    Refactor the partition movement throttling logic to make it clearer.

commit 02e2d507c61443896ad7e78f5874e6d7309546a6
Author: Lei Xia <[email protected]>
Date:   2017-02-16T18:17:00Z

    Add a unit test for the delayed rebalancer with a cluster-level delay time set.

commit aca67e956da69fe7e9a47289c5dcc5b232b60fa8
Author: Junkai Xue <[email protected]>
Date:   2017-02-17T02:53:35Z

    Refactor the cancellation exception handling logic

commit e6bb213e6cc0e92a775c946b8dd4b4b5c71b1a6f
Author: Junkai Xue <[email protected]>
Date:   2017-02-17T00:27:59Z

    Helix CurrentState change for monitoring
    
    To monitor application availability, Helix should provide metrics that track missing top states at the resource level.
    Design wiki:
    https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+Monitoring+Metrics#HelixMonitoringMetrics-MetricsforMissingTopState

    Based on the latest design update, it is better to have three new fields that report state transitions: START_TIME, END_TIME and PREVIOUS STATE.

commit 39f307abd7e9c16e3377dcc808ff5c97eb0e7236
Author: Lei Xia <[email protected]>
Date:   2017-02-18T00:07:22Z

    Reorganize the integration tests into subpackages.

commit dbc0bd2443b2e15717133e9f64ab93fe4e981e68
Author: Lei Xia <[email protected]>
Date:   2017-02-19T20:14:03Z

    Move all Mock classes from one giant class into individual classes in unit tests.

commit 71f72da46781e47207ca2273caeb8519d5ad8fee
Author: Lei Xia <[email protected]>
Date:   2017-02-21T21:53:21Z

    Avoid zero replicas during recovery and load rebalance.

commit 7a9375c1e30baba9e01bf7dd484cc8560ab8d670
Author: Junkai Xue <[email protected]>
Date:   2017-02-22T01:22:50Z

    Implementation for missing top state metrics
    
    Implement the MBean updates, satisfying the following conditions:
    1. The missing top state metrics are aggregated at the resource level.
    2. They report the duration of the missing top state and the success/failure counts of top state switches.
    3. If a top state switch fails, the reported duration is the threshold set by the user.
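A minimal sketch of the per-resource aggregation in items 1 to 3, assuming durations are summed in milliseconds and a failed switch contributes the user-configured threshold instead of a measured gap. `MissingTopStateMetricsSketch` and its methods are hypothetical names; this is not the Helix MBean implementation and it is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-resource aggregation; not the Helix MBean implementation.
final class MissingTopStateMetricsSketch {
  static final class ResourceStats {
    long successCount;
    long failureCount;
    long missingTopStateDurationMs;
  }

  private final long failureThresholdMs; // user-set threshold, reported when a switch fails
  private final Map<String, ResourceStats> statsByResource = new HashMap<>();

  MissingTopStateMetricsSketch(long failureThresholdMs) {
    this.failureThresholdMs = failureThresholdMs;
  }

  ResourceStats stats(String resource) {
    return statsByResource.computeIfAbsent(resource, r -> new ResourceStats());
  }

  // A partition of the resource regained its top state: record the measured gap.
  void recordTopStateRecovered(String resource, long lostAtMs, long recoveredAtMs) {
    ResourceStats s = stats(resource);
    s.successCount++;
    s.missingTopStateDurationMs += recoveredAtMs - lostAtMs;
  }

  // The top state switch failed: report the threshold instead of a measured gap.
  void recordTopStateSwitchFailed(String resource) {
    ResourceStats s = stats(resource);
    s.failureCount++;
    s.missingTopStateDurationMs += failureThresholdMs;
  }
}
```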

----

