GitHub user dasahcc opened a pull request:
https://github.com/apache/helix/pull/130
Another PR for the 0.8.0 release
This PR contains a couple of improvements, which will be described in detail
in the release notes.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dasahcc/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/130.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #130
----
commit 205b46419409f00fa7842bdac5c6d70aefecb032
Author: Junkai Xue <jxue@...>
Date: 2017-10-04T21:35:51Z
Trigger the rebalance pipeline to ensure that a rebalance happens
RB=1119190
G=helix-reviewers
A=lxia
commit f5f2cdb9db95280f51aa1ff290291463ef75df62
Author: Jiajun Wang <jjwang@...>
Date: 2017-09-29T00:37:38Z
Alert the application when rebalancing cannot be done.
Generate an error message on the controller node when the best possible state
cannot be calculated in unexpected situations.
Also emit a Gauge metric in ClusterStatusMonitor to indicate
rebalancing errors.
Additional change: fix 2 tests that were broken by previous changes:
TestClusterStatusMonitorLifecycle and TestSkipBestPossibleCalculation.
RB=1115467
BUG=HELIX-567
G=helix-reviewers
A=lxia
commit 93d11c34a5494f2d60e2f28e9f53be73a7a178a5
Author: Vivo Xu <vxu@...>
Date: 2017-10-09T21:59:34Z
[helix-front] forward responses from helix-rest instead of piping them
RB=1122755
BUG=PWN-12932
G=HelixUI-Reviewers
A=jxue
commit 70af346ef64da283e9b5ba95aa850250f5367b59
Author: Jiajun Wang <jjwang@...>
Date: 2017-10-11T22:34:40Z
Add io.dropwizard.metrics to ivy file
The ivy file config is necessary for other build methods.
Also change metrics lib version from 3.2.3 to 3.1.2, which is the highest
available version in our internal environment.
RB=1125563
G=helix-reviewers
A=lxia
commit 69c6cb9d060723bbae547b77c1cc7945b6444fe0
Author: Jiajun Wang <jjwang@...>
Date: 2017-10-12T00:00:32Z
Add cluster config to tolerate ERROR partitions when scheduling load balance transitions
With this change, the controller will only perform load balance transitions if:
1. no recovery operation is to be scheduled, and
2. the error partition count does not exceed the configured limit.
The limit is 0 by default (no ERROR partition is tolerated). If the
setting is a negative number, load balance will be triggered regardless of the error count.
Note that the controller will only throttle transitions when
StateTransitionThrottleConfig is configured.
Transitions that remove "additional" ERROR replicas are regarded as
LOAD_BALANCE transitions when all required states are already available.
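The gating rule above can be sketched as a small predicate. This is a hypothetical helper, not Helix's actual code: the class, method, and parameter names are assumptions, and it assumes that the default limit of 0 means no ERROR partition is tolerated, i.e. load balance proceeds only while the error count does not exceed the limit.

```java
import java.util.Set;

/**
 * Hypothetical sketch of the load-balance gating rule from this commit.
 * Names are illustrative only; they are not Helix's API.
 */
final class LoadBalanceGate {
    private LoadBalanceGate() {}

    /**
     * @param partitionsNeedingRecovery partitions with a pending recovery transition
     * @param errorPartitionCount       number of partitions currently in ERROR
     * @param errorThreshold            configured limit: 0 (default) tolerates no ERROR
     *                                  partition, a negative value disables the check
     * @return true if load balance transitions may be scheduled
     */
    static boolean shouldScheduleLoadBalance(Set<String> partitionsNeedingRecovery,
                                             int errorPartitionCount,
                                             int errorThreshold) {
        // Condition 1: pending recovery always takes priority over load balance.
        if (!partitionsNeedingRecovery.isEmpty()) {
            return false;
        }
        // A negative threshold means "any error count is acceptable".
        if (errorThreshold < 0) {
            return true;
        }
        // Condition 2: the error count must not exceed the threshold, so with the
        // default of 0 a single ERROR partition blocks load balance.
        return errorPartitionCount <= errorThreshold;
    }
}
```

Checking recovery first mirrors the order of the two conditions in the commit message: a tolerant error threshold never lets load balance jump ahead of recovery.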
RB=1123661
BUG=HELIX-545
G=helix-reviewers
A=lxia
commit 6dc5969203cbd54e1ef13752e1688ceaad913929
Author: Lei Xia <lxia@...>
Date: 2017-10-17T17:08:16Z
Small refactor of HelixTaskExecutor.onMessage() logic.
RB=1130008
BUG=HELIX-537
G=helix-reviewers
A=jjwang,jxue
commit ec90d068199e97e028b2359e77acceb45bf7aca5
Author: Vivo Xu <vxu@...>
Date: 2017-10-17T21:54:27Z
[helix-front] Fix a partition list rendering issue
RB=1130495
BUG=DAAS-5252
G=HelixUI-Reviewers
A=jxue
commit a1af3181d93c15b772688ea54a49f21407e596b7
Author: Jiajun Wang <jjwang@...>
Date: 2017-10-18T21:25:10Z
[maven-release-plugin] prepare for next development iteration
RB=1131766
G=helix-reviewers
A=lxia
commit eb9fff5483fe121f6f52d35b0f32501e6d253a38
Author: Junkai Xue <jxue@...>
Date: 2017-10-12T01:18:20Z
Batch API Implementation
This RB includes:
1. The batch API definition and implementation, keeping the old API
backward compatible.
2. A server-side change to the logic that determines which instances are
disabled, for both the batch API and the old API.
RB=1135522
BUG=HELIX-586,HELIX-601
G=helix-reviewers
A=lxia,hrzhang
commit a5e06c388f7271a2d99a679a1f0a38c7afb62484
Author: Harry Zhang <hrzhang@...>
Date: 2017-10-18T00:25:24Z
Use slf4j for helix
RB=1132707
BUG=HELIX-415
G=helix-reviewers
A=jjwang
commit d544d7e6c0abb624473fc62023c6dfb669936386
Author: Lei Xia <lxia@...>
Date: 2017-10-19T22:16:48Z
Remove duplicated logs that report the time spent on each stage, and do a
small refactor in CurrentStateComputationStage.
RB=1133160
BUG=HELIX-537
G=helix-reviewers
A=jxue
commit 727f3cdf07f41868c99c1145bc8d76a964a40814
Author: Junkai Xue <jxue@...>
Date: 2017-10-23T20:38:40Z
Add a cluster config listener in the generic Helix controller
RB=1134961
G=helix-reviewers
A=lxia,hrzhang
commit b5f909c5a3f2208cbb942ee571b86806efc02bb4
Author: Junkai Xue <jxue@...>
Date: 2017-10-30T23:01:57Z
Fix MultiRound CRUSH being unable to select any node from the second round
RB=1141984
G=helix-reviewers
A=lxia
commit 6ef6a4cd1b3560b590dec45fc1917bcf3badb107
Author: hrzhang <hrzhang@...>
Date: 2017-11-01T00:12:04Z
Add test for GenericHelixController thread leak
RB=1143384
BUG=HELIX-211
G=helix-reviewers
A=jjwang
commit 343ea64be6ba7f52ec6476e70ba16eea2cd0f483
Author: Lei Xia <lxia@...>
Date: 2017-11-02T20:26:38Z
Fix a minor issue when updating a workflowConfig with an empty workflowId.
RB=1145567
G=helix-reviewers
A=jjwang,hrzhang
commit 1866ae88bfdb8911b7873f83d4f57cf66c4f3e2d
Author: Jiajun Wang <jjwang@...>
Date: 2017-11-03T22:54:04Z
Add throttling to prevent too many workflows/jobs from being created.
ZK has an issue where a large number of nodes under one path prevents
getChildNames from returning successfully.
This change is a workaround to minimize the problem until the ZK service side
is ready.
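One simple way to throttle creation is to cap the number of child znodes allowed under a single parent path and reject new workflows/jobs once the cap is reached. The sketch below assumes such a fixed upper bound; the class name, the cap, and the acquire/release methods are hypothetical and are not what this commit implements.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: cap the number of workflow/job znodes under one
 * parent path so that listing the children stays manageable. Not Helix's API.
 */
final class CreationThrottle {
    private final int maxNodes;          // assumed configurable upper bound
    private final AtomicInteger current; // nodes currently under the path

    CreationThrottle(int maxNodes, int existingNodes) {
        if (maxNodes <= 0) {
            throw new IllegalArgumentException("maxNodes must be positive");
        }
        this.maxNodes = maxNodes;
        this.current = new AtomicInteger(existingNodes);
    }

    /** Reserves a slot for a new workflow/job; returns false if the cap is reached. */
    boolean tryAcquire() {
        while (true) {
            int now = current.get();
            if (now >= maxNodes) {
                return false; // reject creation instead of growing the path further
            }
            if (current.compareAndSet(now, now + 1)) {
                return true;
            }
            // Another thread changed the count concurrently; re-read and retry.
        }
    }

    /** Frees a slot after a workflow/job node has been deleted. */
    void release() {
        current.updateAndGet(n -> Math.max(0, n - 1));
    }

    int count() {
        return current.get();
    }
}
```

The compare-and-set loop keeps the check and the increment atomic, so concurrent creators cannot overshoot the cap.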
RB=1147382
BUG=HELIX-619
G=helix-reviewers
A=lxia
commit d9669402973ba00b17bf7394a76d17efea04421e
Author: Lei Xia <lxia@...>
Date: 2017-10-09T21:52:19Z
Add P2P (Participant-to-Participant) state-transition message support in
Helix controller.
RB=1135094
BUG=HELIX-537,HELIX-538
G=helix-reviewers
A=hrzhang,jxue
commit d6d1ae05571dbd9b9440b6431a19b48ed8a84874
Author: Junkai Xue <jxue@...>
Date: 2017-11-02T22:00:24Z
Cluster Maintenance Mode feature support
Helix does not have a state that keeps the original partitions active
without rebalancing the partition placement.
The only state the controller can switch to is the paused state. If the controller
has been paused, none of the replicas in this cluster will be active anymore.
It would be better to have another mode that lets the current replicas keep functioning
without bootstrapping new replicas when the cluster is full or instance.
There are several scenarios that may need such a mode, which keeps the original
assignment of partitions without any partition movement. At the same time, no
partitions will be assigned to newly added resources. This mode is
called cluster maintenance mode.
For more details, please refer to:
https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Cluster+Maintenance+Mode+Design
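The placement rule of maintenance mode can be sketched as follows: existing resources keep their current assignment, and newly added resources receive no partitions. This is a hypothetical illustration, not Helix's rebalancer code; the class and method names are assumptions, and assignments are simplified to resource -> (partition -> instance).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the maintenance-mode placement rule.
 * Assignments are modeled as resource -> (partition -> instance).
 */
final class MaintenanceModeSketch {
    private MaintenanceModeSketch() {}

    static Map<String, Map<String, String>> targetAssignment(
            boolean maintenanceMode,
            Map<String, Map<String, String>> currentAssignment,
            Map<String, Map<String, String>> rebalancerProposal) {
        if (!maintenanceMode) {
            // Normal operation: follow whatever the rebalancer computed.
            return rebalancerProposal;
        }
        Map<String, Map<String, String>> target = new HashMap<>();
        for (String resource : rebalancerProposal.keySet()) {
            Map<String, String> current = currentAssignment.get(resource);
            if (current == null) {
                // Newly added resource: nothing is assigned in maintenance mode.
                target.put(resource, new HashMap<>());
            } else {
                // Existing resource: freeze the current placement, no partition movement.
                target.put(resource, new HashMap<>(current));
            }
        }
        return target;
    }
}
```

Unlike pausing, this keeps the existing replicas serving; it only stops the controller from moving or adding partitions.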
RB=1148610
BUG=HELIX-623
G=helix-reviewers
A=lxia,jjwang
commit 0dcc908ba84a6a0906f2eb56d1fd320d1f940a33
Author: hrzhang <hrzhang@...>
Date: 2017-11-07T19:38:36Z
Use SlidingTimeWindowReservoir for histogram stats
RB=1149824
G=helix-reviewers
R=lxia,jjwang,jxue,erkim
A=jjwang
commit 46438e5e746bf7cfff0be0075e28317bb7fa7a0e
Author: Jiajun Wang <jjwang@...>
Date: 2017-11-08T00:12:42Z
Temporarily disable logging rebalance errors until HELIX-631 is resolved.
An issue was found: legacy code assumes that all controllers' instance names
start with "controller", but this assumption is no longer valid.
As a result, logs are written to the wrong path.
Until the problem is resolved cleanly, disable the error log.
RB=1150471
BUG=HELIX-631
G=helix-reviewers
A=lxia
commit c208844fc181be73883a83dfca0104664f68d080
Author: Junkai Xue <jxue@...>
Date: 2017-11-08T19:24:54Z
Switch the cluster to maintenance mode instead of pausing it when the maximum
offline instance limit is hit
RB=1148644
BUG=HELIX-623
G=helix-reviewers
A=lxia,jjwang
commit 373cfcba2d8a09fe896bda92b2e110955f704f54
Author: Jiajun Wang <jjwang@...>
Date: 2017-11-09T19:22:18Z
Clean up ZkHelixManager and ZkClient config items to remove ambiguous
ones.
We noticed that several items use default values from different
classes, and some defaults are not set properly. Fix the problem to
prevent future config issues.
Also add comments clarifying the usage of each setting.
RB=1153004
G=helix-reviewers
A=lxia,hrzhang
commit dfdbdc3b159f7f220688b3be749be078a3e5c6b3
Author: Junkai Xue <jxue@...>
Date: 2017-11-09T23:05:04Z
Support a new getChildren API with retry logic.
The current getChildren removes a znode from the list if it could not be
read, so it may return a partial result to the application. This is a
problem for applications that need complete data.
The new API supports retry logic: if it fails to read all the data from
ZK within the retry count, Helix throws an exception.
TODO: Helix will change the old API's behavior when Helix starts migrating
APIs.
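The retry behavior described above can be sketched generically: re-read every child until one attempt returns all of them, and throw once the retry count is exhausted instead of returning a partial list. The class and method names are hypothetical, and a reader returning null stands in for "this znode could not be read"; a real ZK-backed implementation would likely also re-list the children between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of an all-or-nothing getChildren with retries.
 * Not Helix's or ZooKeeper's API.
 */
final class RetryingChildReader {
    private RetryingChildReader() {}

    static <T> List<T> readAllChildren(List<String> childPaths,
                                       Function<String, T> reader,
                                       int retryCount) {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        int attempts = retryCount + 1; // the first try plus retryCount retries
        for (int attempt = 1; attempt <= attempts; attempt++) {
            List<T> result = new ArrayList<>(childPaths.size());
            boolean complete = true;
            for (String path : childPaths) {
                T data = reader.apply(path);
                if (data == null) { // this child could not be read on this attempt
                    complete = false;
                    break;
                }
                result.add(data);
            }
            if (complete) {
                return result; // every child was read: return the full list
            }
        }
        // Never hand back a partial result; surface the failure to the caller.
        throw new IllegalStateException("Could not read all " + childPaths.size()
                + " children after " + attempts + " attempts");
    }
}
```

Failing loudly rather than silently dropping unreadable znodes is the key difference from the old getChildren behavior the commit describes.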
RB=1153442
BUG=HELIX-633
G=helix-reviewers
A=lxia,hrzhang
commit 3f7efa0f091c215bdd938ff4824806031089e1b4
Author: hrzhang <hrzhang@...>
Date: 2017-11-13T21:39:32Z
HELIX-614: add gauge for failed workflow
RB=1155532
BUG=HELIX-614
R=lxia,jjwang,jxue,erkim
A=jjwang,jxue
commit 3687de780179996cf2875a6e369ced4301c56a48
Author: hrzhang <hrzhang@...>
Date: 2017-11-14T22:49:47Z
Use SlidingTimeWindowArrayReservoir to reduce memory consumption
RB=1157213
R=lxia,jjwang,jxue,erkim
A=jjwang
commit ed80be29668d51c1b44b3787a0bf757270698cdc
Author: hrzhang <hrzhang@...>
Date: 2017-11-14T22:56:54Z
Update ivy file in helix-core to use metrics-core 3.2.3
RB=1157232
R=lxia,jjwang,jxue,erkim
A=jjwang
commit 1bea191bc946c89f25e275ab4526bf1d0c741eed
Author: Junkai Xue <jxue@...>
Date: 2017-11-15T21:09:39Z
REST API support for enabling/disabling cluster maintenance mode
RB=1158600
BUG=HELIX-646
G=helix-reviewers
A=lxia
commit afd4c60a7f15e12c7aaf1967590fb49f74f0de52
Author: Junkai Xue <jxue@...>
Date: 2017-11-16T01:31:00Z
REST support for the batch enable/disable API
RB=1159201
BUG=HELIX-646
G=helix-reviewers
A=lxia
commit 65ffe8405998d5e931c4f5dcff4ac3fff71b4596
Author: Junkai Xue <jxue@...>
Date: 2017-11-16T01:54:28Z
Support updating the complete instance config in REST
RB=1159244
BUG=HELIX-647
G=helix-reviewers
A=lxia
commit 26c0de804267cb375b318444a45f312aa543f753
Author: Junkai Xue <jxue@...>
Date: 2017-11-16T20:18:39Z
Temporarily disable the batch enable/disable API feature
RB=1159869
G=helix-reviewers
A=lxia
----