GitHub user dasahcc opened a pull request:
https://github.com/apache/helix/pull/119
Giant PR for Helix 0.8
Giant PR for Helix 0.8. It includes multiple features and new Helix UI /
Helix REST.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dasahcc/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/119.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #119
----
commit 310457c2462345127f6a1e5e133bd8d19c4e5482
Author: Junkai Xue <[email protected]>
Date: 2017-01-06T02:14:02Z
State Transition Cancellation Client Implementation
State transitions play a vital part in how Helix manages clusters. A pending
state transition can become unnecessary for various reasons, for example when
the node hosting the partition under transition goes down. State transition
cancellation is therefore a useful feature to have in Helix. It not only helps
cancel a state transition to avoid an invalid state, but also reduces
redundant state transitions.
For details, please refer to:
https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
This implementation includes the following:
1. Add new StateTransitionCancellationHandler
2. Implement cancel logic for three cases: message received but not handled,
message handled but task not started, and message handled and task started.
3. Add default implementation of cancel method in StateModel
4. Add new STATE_TRANSITION_CANCELLATION message type
5. Unit test for cancel logic
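The three cancel cases listed above can be pictured with a small state machine. This is only a sketch under stated assumptions: `TransitionPhase`, `CancellableTransition`, and the returned labels are hypothetical names for illustration and are not the Helix API or the code in this commit.

```java
// Hypothetical phases of a state transition message on the participant side.
enum TransitionPhase { RECEIVED, SCHEDULED, RUNNING, DONE, CANCELLED }

// Illustrative only: shows how a cancel request could be treated differently
// depending on how far the original transition has progressed.
class CancellableTransition {
  private TransitionPhase phase = TransitionPhase.RECEIVED;
  private boolean cancelRequested = false;

  // Message handled: a task is created but has not started yet.
  synchronized boolean schedule() {
    if (phase != TransitionPhase.RECEIVED) {
      return false;
    }
    phase = TransitionPhase.SCHEDULED;
    return true;
  }

  // The task begins executing the transition.
  synchronized boolean start() {
    if (phase != TransitionPhase.SCHEDULED) {
      return false;
    }
    phase = TransitionPhase.RUNNING;
    return true;
  }

  synchronized void finish() {
    if (phase == TransitionPhase.RUNNING) {
      phase = TransitionPhase.DONE;
    }
  }

  // Cancel behaves differently in each of the three cases.
  synchronized String cancel() {
    switch (phase) {
      case RECEIVED:
        phase = TransitionPhase.CANCELLED;
        return "dropped-before-handling";
      case SCHEDULED:
        phase = TransitionPhase.CANCELLED;
        return "removed-before-start";
      case RUNNING:
        // A running task cannot simply be dropped; it is asked to cancel.
        cancelRequested = true;
        return "cancel-hook-invoked";
      default:
        return "no-op";
    }
  }

  synchronized boolean cancelRequested() {
    return cancelRequested;
  }
}
```

Because every method is synchronized on the same object, a cancel and a start cannot interleave: whichever acquires the lock first decides the outcome.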
commit 4fee01cdeffbc3fcd07a6aa06c4fe4bf61c0c811
Author: Lei Xia <[email protected]>
Date: 2017-01-31T01:52:24Z
Add missing files for helix-rest module.
commit 29817cfc26571474088656f7491d9a8069785a75
Author: Junkai Xue <[email protected]>
Date: 2017-01-26T01:15:04Z
Support cancel tasks with synchronized check task status
Currently, in Helix, canceling or stopping a job does not check the status
of its subtasks.
In this rb:
1. Add a new API to support synchronously stopping a workflow/queue.
2. On the controller side, check that subtasks are stopped before marking the
job status.
commit 0bb7998037908a31e287c3a8a0a10eb08514ca61
Author: Junkai Xue <[email protected]>
Date: 2017-02-02T04:24:04Z
Revert "Enable maxRecipient in Criteria"
This reverts commit 0a0c7dcf7a95e7c1050fa69bcb09619405470cfc.
Revert this change since this is not a robust change for Helix.
commit 87e5a912604c4f7170468bd0b966c3377230784f
Author: Weihan Kong <[email protected]>
Date: 2017-02-01T00:36:28Z
In recovery, we prefer transition from second level state to top state,
over transition from lower level state to top state(and overriding the instance
preference in preference list)
In recovery from e.g. Offline to Master, we'd prefer promoting it to Slave,
and at the same time promote another Slave to Master. After this is finished,
switch the state of the two nodes.
This way, the system has higher availability (for Master) because it
usually takes longer to transition from Offline to Slave than from Slave to
Master. Therefore, we don't want to transition a lower level state directly to
the top state; we always try to transition only a second level state to the
top state.
For implementation, SemiAutoRebalancer and AutoRebalancer share the same
logic, so remove the method in SemiAutoRebalancer, add it to AbstractRebalancer
so that it's used by both rebalancers.
After best possible state is computed, the fix compares the current state
and best possible state and determines whether any pairs of instances should
switch their best possible state. See more comments in the code.
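The comparison described above can be sketched roughly as follows. This is not the actual computeBestPossibleStateForPartition code: `TopStateSwapSketch` and `preferSecondLevel` are hypothetical names, and the sketch assumes that a swap is only considered when no instance currently holds the top state. It covers only the first step (promoting an existing second level replica); the later switch back to the preferred instance is left out.

```java
import java.util.Map;
import java.util.TreeMap;

class TopStateSwapSketch {
  // current and bestPossible map instance name -> state for one partition.
  static Map<String, String> preferSecondLevel(Map<String, String> current,
      Map<String, String> bestPossible, String topState, String secondState) {
    Map<String, String> result = new TreeMap<>(bestPossible);
    // Assumption: if a top state replica is already alive, nothing to recover.
    if (current.containsValue(topState)) {
      return result;
    }
    for (Map.Entry<String, String> target : bestPossible.entrySet()) {
      // Only look at instances that are assigned the top state but are not
      // currently in the second level state.
      if (!topState.equals(target.getValue())
          || secondState.equals(current.get(target.getKey()))) {
        continue;
      }
      // Find an instance that is in the second level state both now and in
      // the computed mapping, and swap assignments with it.
      for (String candidate : bestPossible.keySet()) {
        if (secondState.equals(result.get(candidate))
            && secondState.equals(current.get(candidate))) {
          result.put(candidate, topState);
          result.put(target.getKey(), secondState);
          break;
        }
      }
    }
    return result;
  }
}
```

For example, with current states n1=Offline, n2=Slave, n3=Slave and a computed mapping of n1=Master, n2=Slave, n3=Slave, the sketch assigns Master to n2 and Slave to n1, so the short Slave to Master transition restores the top state first.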
Add unit test for computeBestPossibleStateForPartition to test this feature
specifically. 4 cases in total.
Modified TestRebalancePipeline so that it matches the new behavior of the
rebalancer while keeping the point of the original test. Also added a new test
in it specifically testing that there are no duplicate Masters.
A couple of code readability changes:
1. Simply pass in the liveInstance Set instead of ClusterDataCache, since
that's the only thing in the cache that is used; change the corresponding
method calls in DelayedAutoRebalancer and TestAutoRebalancerStrategy.
2. Remove useless setups, comments in BaseStageTest, re-group
setupStateModel() statements.
commit 3a4f9e8f6bcfb56b32725fd485bc9e97079d94df
Author: Lei Xia <[email protected]>
Date: 2017-01-30T23:02:24Z
Clean up jobs in a jobqueue automatically after the job completes and
passes its expiry time.
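One way to picture the cleanup rule is a filter over finished jobs. The sketch below is an illustration only: `FinishedJob`, `JobPurgeSketch`, and the convention that a negative finish time means "not completed yet" are assumptions, not the Helix task framework API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of a job in a queue.
class FinishedJob {
  final String name;
  final long finishTimeMs; // assumption: negative means not completed yet
  final long expiryMs;     // how long to keep the job after completion

  FinishedJob(String name, long finishTimeMs, long expiryMs) {
    this.name = name;
    this.finishTimeMs = finishTimeMs;
    this.expiryMs = expiryMs;
  }
}

class JobPurgeSketch {
  // Returns the names of jobs that completed and have passed their expiry.
  static List<String> expiredJobs(List<FinishedJob> jobs, long nowMs) {
    List<String> expired = new ArrayList<>();
    for (FinishedJob job : jobs) {
      boolean completed = job.finishTimeMs >= 0;
      if (completed && nowMs - job.finishTimeMs >= job.expiryMs) {
        expired.add(job.name);
      }
    }
    return expired;
  }
}
```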
commit f32f6d1147d2d385e609e5e1754b33e768babdce
Author: Junkai Xue <[email protected]>
Date: 2017-02-02T19:13:44Z
State Transition Cancellation Client side change Part II
In Helix, there are many different scenarios which could make some pending
state transitions no longer valid, for example, a resource is deleted while
it still has some pending transitions, or Helix calculates a new ideal mapping
while there are still some pending transitions not matching the new mapping.
In such cases, the Helix controller should proactively cancel these pending
transitions instead of waiting for them to finish.
For details, please refer to:
https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
In this rb:
1. Support registering a MessageHandlerFactory for different message
types.
2. Refactor related API
3. Add a locking mechanism to avoid a race condition between task start and
cancel start.
4. Add unit test for multi message type registration.
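Registering one factory under several message types could look roughly like the sketch below. `SketchHandlerFactory` and `SketchHandlerRegistry` are hypothetical stand-ins and do not reproduce the signatures of Helix's MessageHandlerFactory or messaging service.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory that declares every message type it can handle.
interface SketchHandlerFactory {
  List<String> messageTypes();

  String handle(String messageType, String payload);
}

class SketchHandlerRegistry {
  private final Map<String, SketchHandlerFactory> byType = new ConcurrentHashMap<>();

  // Register the same factory instance under each type it declares.
  void register(SketchHandlerFactory factory) {
    for (String type : factory.messageTypes()) {
      byType.put(type, factory);
    }
  }

  // Route a message to the factory registered for its type, if any.
  Optional<String> dispatch(String messageType, String payload) {
    SketchHandlerFactory factory = byType.get(messageType);
    if (factory == null) {
      return Optional.empty();
    }
    return Optional.of(factory.handle(messageType, payload));
  }
}
```

With this shape, a single factory can serve both a state transition message and its STATE_TRANSITION_CANCELLATION counterpart, which lets the two share state such as a lock.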
commit bab2f5a5d28fffb31e738b87de5bd8f99dad88a9
Author: Junkai Xue <[email protected]>
Date: 2017-02-06T21:40:15Z
Import Helix Open Source change
[HELIX-651] Add a method in HelixAdmin to set the InstanceConfig of an
existing instance
- Add a setInstanceConfig() method in HelixAdmin interface
- Add an implementation for the same in ZkHelixAdmin
- Add a test in TestZkHelixAdmin
commit 003f639054b03e05817b98d7f379b22a3fcefb51
Author: Junkai Xue <[email protected]>
Date: 2017-02-06T22:27:24Z
Import open source master change
Added new DataSource values LIVEINSTANCES and INSTANCES and made
CriteriaEvaluator support them
commit 237e47247e98c1a8a2bf63179738cc939c1bdd6d
Author: Weihan Kong <[email protected]>
Date: 2017-02-07T20:03:17Z
Modify code style for
TestAbstractRebalancerComputeBestPossibleStateForPartition
Use underscores for class variables.
Use enum name as state value.
One declaration per line.
commit 64e24f5d7db923be878fd4577df7df41e5118a29
Author: Lei Xia <[email protected]>
Date: 2017-02-07T01:29:02Z
update pom version to align to open source release version.
commit 975fdc0e1dd3d1b8caeefe57eb1026e646533c2e
Author: Lei Xia <[email protected]>
Date: 2017-02-07T22:59:10Z
Support configurable job purge interval for a queue.
commit 42a95d4e928ae85c26a75f3c6165fe92bc590e32
Author: Junkai Xue <[email protected]>
Date: 2017-02-09T01:12:18Z
Fix TestBatchMessage test fail
The test fails because a new NO_OP message is sent as a new
MessageHandlerFactory is registered.
commit 6a46da7962d0c831246390b6ec3c510136481214
Author: Weihan Kong <[email protected]>
Date: 2017-02-09T02:14:57Z
Revert "In recovery, we prefer transition from second level state to top
state, over transition from lower level state to top state(and overriding the
instance preference in preference list)"
This reverts commit 85356ccb3a2f45217d8f6d0753ccd2e37b06ffce.
Revert "Modify code style for
TestAbstractRebalancerComputeBestPossibleStateForPartition"
This reverts commit 53546a67cde56624d4cce5a62be828dd4c0c13ff.
commit 5dad46989809fba871ec1b12528c596d1d92dddf
Author: Junkai Xue <[email protected]>
Date: 2017-02-09T02:43:18Z
Import EvaluateCriteria change from open source
This change is needed by the Gobblin team.
commit 916d6731310efe314205a593e9c84715fefa5d06
Author: Lei Xia <[email protected]>
Date: 2017-02-09T19:46:28Z
Update ivy files with new version.
commit d8054e2252389208fc74c654ec5a209cd9498cea
Author: Weihan Kong <[email protected]>
Date: 2017-02-09T07:38:49Z
Prevent ClusterControllerManager from starting multiple times
ClusterControllerManager is a runnable wrapper for a Helix Controller that
could run on a separate thread for testing purposes. Since
HelixManager.connect() should not be called more than once, this Controller
should not be started more than once, either.
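A start-once guard of this kind is commonly built on an atomic flag. The sketch below is an assumption about the general technique, not the code in ClusterControllerManager; `StartOnceController` is a hypothetical class, and whether a second start should throw or be silently ignored is not stated in the commit message (the sketch throws).

```java
import java.util.concurrent.atomic.AtomicBoolean;

class StartOnceController {
  private final AtomicBoolean started = new AtomicBoolean(false);
  private int connectCalls = 0;

  // Only the first caller flips the flag and is allowed to connect.
  void start() {
    if (!started.compareAndSet(false, true)) {
      throw new IllegalStateException("controller already started");
    }
    connect();
  }

  // Stand-in for the one-time connect call.
  private void connect() {
    connectCalls++;
  }

  int connectCalls() {
    return connectCalls;
  }
}
```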
commit 87d4668db8c80ac2991efe601789be6bd38c5e9c
Author: Junkai Xue <[email protected]>
Date: 2017-02-07T22:13:43Z
State Transition Cancellation Server change
In Helix, there are many different scenarios which could make some pending
state transitions no longer valid, for example, a resource is deleted while
it still has some pending transitions, or Helix calculates a new ideal mapping
while there are still some pending transitions not matching the new mapping.
In such cases, the Helix controller should proactively cancel these pending
transitions instead of waiting for them to finish.
For details, please refer to:
https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+State+Transition+Cancellation+Design
In this rb:
1. Send a state transition cancellation message when the pending state does
not match the target state.
2. Complete the integration test for state transition cancellation.
commit 3d77f47d9775340aa1618c5b476e08412a7bc44c
Author: Weihan Kong <[email protected]>
Date: 2017-02-13T21:52:16Z
Be able to stop workflow when no job is running.
Currently, to stop a workflow, the target state of the workflow is set to
STOP. Then, when each job (as a resource in ideal state) is processed in the
job rebalancer, it checks whether all the jobs in the workflow are done (not
in IN_PROGRESS or STOPPING) and sets the workflow state to STOP.
However, if all the jobs are already done, there's no job in ideal state
to process, so the workflow state never gets a chance to be set to STOP.
This commit adds a check in workflow rebalancer to set the state when all
jobs are already done.
A test is added to test specifically this case.
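The added check can be sketched as a predicate over the job states of a workflow. `SketchJobState` and `WorkflowStopSketch` are hypothetical names used for illustration; only IN_PROGRESS, STOPPING, and STOP come from the description above.

```java
import java.util.Collection;

// Hypothetical subset of job states, for illustration only.
enum SketchJobState { NOT_STARTED, IN_PROGRESS, STOPPING, STOPPED, COMPLETED }

class WorkflowStopSketch {
  // True when the workflow was asked to stop and no job is still active.
  // An empty collection also returns true, which is the case where no job
  // is left to be processed.
  static boolean canMarkStopped(String targetState, Collection<SketchJobState> jobStates) {
    if (!"STOP".equals(targetState)) {
      return false;
    }
    for (SketchJobState state : jobStates) {
      if (state == SketchJobState.IN_PROGRESS || state == SketchJobState.STOPPING) {
        return false;
      }
    }
    return true;
  }
}
```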
commit d7d68a1ff3f9da138c546b92409bad4026a98804
Author: Lei Xia <[email protected]>
Date: 2017-02-11T01:28:40Z
Persist preference list into IdealState in full-auto mode and allow user to
choose persisting either bestpossible or intermediate state mapping into the
mapfield of IS.
commit 5884c93b8a07b47ee3289e5f8188a6d7fa2f5a22
Author: Lei Xia <[email protected]>
Date: 2017-02-15T16:17:58Z
Add PropertyPathConfig back to the code base for API backward compatibility;
the class will be removed in the next major release.
commit 6b228f89b32dac3a13827f205f27b3cbe97c8ab6
Author: Weihan Kong <[email protected]>
Date: 2017-02-14T00:50:05Z
Add timeout in JobConfig
To support job-level timeout for the task framework, add the configuration
field. Associated changes are made in the builder and JobBean.
commit 93d5ceb5b25ea3077c3ab456f36030229e7ffa07
Author: Lei Xia <[email protected]>
Date: 2017-02-10T23:19:52Z
Refactor the partition movement throttling logic to make it clearer.
commit 02e2d507c61443896ad7e78f5874e6d7309546a6
Author: Lei Xia <[email protected]>
Date: 2017-02-16T18:17:00Z
Add one unit test to test delay rebalancer with cluster level delayed time
set.
commit aca67e956da69fe7e9a47289c5dcc5b232b60fa8
Author: Junkai Xue <[email protected]>
Date: 2017-02-17T02:53:35Z
Refactor the cancellation exception handling logic
commit e6bb213e6cc0e92a775c946b8dd4b4b5c71b1a6f
Author: Junkai Xue <[email protected]>
Date: 2017-02-17T00:27:59Z
Helix CurrentState change for monitoring
To monitor application availability, Helix should provide metrics that
track missing top states at the resource level.
Design wiki :
https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Helix+Monitoring+Metrics#HelixMonitoringMetrics-MetricsforMissingTopState
Per the latest update of the design, it is better to have three new fields
for reporting state transitions: START_TIME, END_TIME and PREVIOUS STATE.
commit 39f307abd7e9c16e3377dcc808ff5c97eb0e7236
Author: Lei Xia <[email protected]>
Date: 2017-02-18T00:07:22Z
Reorganize the integration tests into subpackages.
commit dbc0bd2443b2e15717133e9f64ab93fe4e981e68
Author: Lei Xia <[email protected]>
Date: 2017-02-19T20:14:03Z
Moving all Mock classes from a giant class to individual classes in unit
tests.
commit 71f72da46781e47207ca2273caeb8519d5ad8fee
Author: Lei Xia <[email protected]>
Date: 2017-02-21T21:53:21Z
Avoid Zero replica during recovery and load rebalance.
commit 7a9375c1e30baba9e01bf7dd484cc8560ab8d670
Author: Junkai Xue <[email protected]>
Date: 2017-02-22T01:22:50Z
Implementation for missing top state metrics
Implement the MBean update so that it satisfies the following conditions:
1. The missing top state metrics are aggregated at the resource level.
2. It reports the duration of the missing top state and the success / failure
counts of top state switches.
3. If a top state switch fails, the reported duration is the threshold set by
the user.
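The three conditions above can be illustrated with a small per-resource accumulator. This is a sketch, not the MBean added by the commit: `MissingTopStateMetricSketch` and its method names are hypothetical, and treating a switch as failed when it takes longer than the threshold is an assumption, since the commit message does not define failure.

```java
// One instance per resource, so values are aggregated at the resource level.
class MissingTopStateMetricSketch {
  private final long thresholdMs;
  private long succeededCount;
  private long failedCount;
  private long totalDurationMs;

  MissingTopStateMetricSketch(long thresholdMs) {
    this.thresholdMs = thresholdMs;
  }

  // Record one top state switch from the time the top state went missing
  // until the time it was observed again.
  synchronized void recordSwitch(long missingStartMs, long topStateBackMs) {
    long durationMs = topStateBackMs - missingStartMs;
    if (durationMs <= thresholdMs) {
      succeededCount++;
      totalDurationMs += durationMs;
    } else {
      // Assumption: exceeding the threshold counts as a failed switch, and
      // the reported duration is capped at the user-set threshold.
      failedCount++;
      totalDurationMs += thresholdMs;
    }
  }

  synchronized long succeededCount() {
    return succeededCount;
  }

  synchronized long failedCount() {
    return failedCount;
  }

  synchronized long totalDurationMs() {
    return totalDurationMs;
  }
}
```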
----
---