GitHub user dasahcc opened a pull request:
https://github.com/apache/helix/pull/210
Change migration strategy to N -> N+1 -> N model
Currently Helix uses an N -> 2N -> N strategy when migrating a partition,
where N is the DB's replica count. When Helix decides to move a partition to
N new instances, it brings up all replicas on the new instances before dropping
all replicas on the old instances (so 2N replicas exist for some period of
time). This approach guarantees availability during migration but
may require a bigger disk footprint. It may also leave a partition with more
than 6 replicas if the cluster topology keeps changing during migration.
What we propose here is an N -> N+1 -> N strategy, where Helix
bootstraps a new replica on one of the new instances, then drops one from the
old instances. It repeats the process until all replicas have moved to the new
instances. This reduces disk usage while still maintaining at least N
active replicas during the process. The new strategy also keeps a partition
from accumulating excessive replicas even if the topology changes during the migration.
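The add-one-then-drop-one sequence described above can be illustrated with a small simulation. This is only a sketch of the idea, not Helix code: the function name `migrate_n_plus_one` and the instance names are hypothetical, and it assumes the old and new replica sets have the same size N.

```python
def migrate_n_plus_one(old, new):
    """Yield the replica placement after each step of an N -> N+1 -> N move.

    Hypothetical illustration only; assumes len(old) == len(new).
    """
    current = list(old)
    to_add = [inst for inst in new if inst not in current]
    to_drop = [inst for inst in current if inst not in new]
    yield tuple(current)
    for add, drop in zip(to_add, to_drop):
        current.append(add)    # bootstrap one replica on a new instance (N+1)
        yield tuple(current)
        current.remove(drop)   # then drop one replica from an old instance (N)
        yield tuple(current)


# Example with N = 3 and made-up instance names.
steps = list(migrate_n_plus_one(["a", "b", "c"], ["d", "e", "f"]))
peak = max(len(s) for s in steps)     # never exceeds N + 1
floor = min(len(s) for s in steps)    # never drops below N
```

With N = 3, the replica count in this sketch alternates between 3 and 4 until the placement matches the new instances, whereas bringing up all new replicas first would hold 6 at once.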
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/dasahcc/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/210.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #210
----
commit 4b08de5f7ab4e39a1599daa651bf49a96518f6a7
Author: Junkai Xue <jxue@...>
Date: 2018-06-28T21:27:32Z
Change migration strategy to N -> N+1 -> N model
Currently Helix uses an N -> 2N -> N strategy when migrating a partition,
where N is the DB's replica count. When Helix decides to move a partition to
N new instances, it brings up all replicas on the new instances before dropping
all replicas on the old instances (so 2N replicas exist for some period of
time). This approach guarantees availability during migration but
may require a bigger disk footprint. It may also leave a partition with more
than 6 replicas if the cluster topology keeps changing during migration.
What we propose here is an N -> N+1 -> N strategy, where Helix
bootstraps a new replica on one of the new instances, then drops one from the
old instances. It repeats the process until all replicas have moved to the new
instances. This reduces disk usage while still maintaining at least N
active replicas during the process. The new strategy also keeps a partition
from accumulating excessive replicas even if the topology changes during the migration.
----
---