[
https://issues.apache.org/jira/browse/KAFKA-19048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jialun Peng updated KAFKA-19048:
--------------------------------
External issue URL: (was:
https://issues.apache.org/jira/browse/KAFKA-1792)
> Minimal Movement Replica Balancing algorithm
> --------------------------------------------
>
> Key: KAFKA-19048
> URL: https://issues.apache.org/jira/browse/KAFKA-19048
> Project: Kafka
> Issue Type: Improvement
> Components: generator
> Reporter: Jialun Peng
> Assignee: Jialun Peng
> Priority: Major
>
> h2. Motivation
> Kafka clusters typically require rebalancing of topic replicas after
> horizontal scaling to evenly distribute the load across new and existing
> brokers. The current rebalancing approach does not consider the existing
> replica distribution, often resulting in excessive and unnecessary replica
> movements. These unnecessary movements increase rebalance duration, consume
> significant bandwidth and CPU resources, and potentially disrupt ongoing
> production and consumption operations. Thus, a replica rebalancing strategy
> that minimizes movements while achieving an even distribution of replicas is
> necessary.
> h2. Goals
> The proposed approach prioritizes the following objectives:
> # {*}Minimal Movement{*}: Minimize the number of replica relocations during
> rebalancing.
> # {*}Replica Balancing{*}: Ensure that replicas are evenly distributed
> across brokers.
> # {*}Anti-Affinity Support{*}: Support rack-aware allocation when enabled.
> # {*}Leader Balancing{*}: Distribute leader replicas evenly across brokers.
> # {*}ISR Order Optimization{*}: Optimize adjacency relationships to prevent
> failover traffic concentration in case of broker failures.
> h2. Proposed Changes
> h3. Rack-Level Replica Distribution
> The following rules ensure balanced replica allocation at the rack level:
> # {*}When{*} {{rackCount = replicationFactor}}:
> ** Each rack receives exactly {{partitionCount}} replicas.
> # {*}When{*} {{rackCount > replicationFactor}}:
> ** If weighted allocation {{(rackBrokers / totalBrokers × totalReplicas) ≥
> partitionCount}}: each rack receives exactly {{partitionCount}} replicas.
> ** If weighted allocation {{< partitionCount}}: distribute remaining
> replicas using a weighted remainder allocation.
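
The rack-level rules above can be sketched as follows. This is only one possible reading, not the implementation attached to the issue: it assumes {{totalReplicas = partitionCount × replicationFactor}}, re-evaluates the weighted allocation after each capped rack is removed, and interprets "weighted remainder allocation" as a largest-remainder split over broker counts. The function name `rack_replica_counts` and its parameters are hypothetical.

```python
def rack_replica_counts(rack_brokers, partition_count, replication_factor):
    """Return {rack: replica_count}; rack_brokers maps rack -> broker count (>= 1)."""
    if len(rack_brokers) < replication_factor:
        raise ValueError("rackCount must be >= replicationFactor")

    # Assumption: totalReplicas = partitionCount * replicationFactor.
    remaining = partition_count * replication_factor
    open_racks = dict(rack_brokers)
    counts = {}

    # Cap step: a rack whose weighted share reaches partitionCount gets exactly
    # partitionCount. Integer cross-multiplication avoids floating-point error.
    while open_racks:
        open_brokers = sum(open_racks.values())
        capped = [rack for rack, brokers in open_racks.items()
                  if brokers * remaining >= partition_count * open_brokers]
        if not capped:
            break
        for rack in capped:
            counts[rack] = partition_count
            remaining -= partition_count
            del open_racks[rack]

    # Remainder step (assumed largest-remainder method): floor each weighted
    # share, then hand leftover replicas to the largest fractional parts,
    # breaking ties by rack name so the result is deterministic.
    if open_racks:
        open_brokers = sum(open_racks.values())
        for rack, brokers in open_racks.items():
            counts[rack] = brokers * remaining // open_brokers
        leftover = remaining - sum(counts[rack] for rack in open_racks)
        by_remainder = sorted(
            open_racks,
            key=lambda rack: (-(open_racks[rack] * remaining % open_brokers), rack))
        for rack in by_remainder[:leftover]:
            counts[rack] += 1
    return counts
```

Under these assumptions, `rack_replica_counts({"a": 2, "b": 1, "c": 1}, 4, 2)` caps rack `a` at 4 replicas and splits the other 4 evenly between `b` and `c`. When {{rackCount = replicationFactor}}, the cap step alone assigns {{partitionCount}} to every rack, so the first rule needs no special case.
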
> h3. Node-Level Replica Distribution
> # If the number of replicas assigned to a rack is not a multiple of the
> number of nodes in that rack, some nodes will host one additional replica
> compared to others.
> # {*}When{*} {{rackCount = replicationFactor}}:
> ** If all racks have an equal number of nodes, each node will host an equal
> number of replicas.
> ** If rack sizes vary, nodes in larger racks will host fewer replicas on
> average.
> # {*}When{*} {{rackCount > replicationFactor}}:
> ** If no rack has a significantly higher node weight, replicas will be
> evenly distributed.
> ** If a rack has disproportionately high node weight, those nodes will
> receive fewer replicas.
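
The first node-level rule amounts to an integer division with a remainder, which might look like the sketch below. The issue text does not say which nodes receive the extra replica; giving it to the brokers that already hold the most replicas (to limit movement), with ties broken by broker ID, is an assumption made here. The name `node_replica_counts` and the `current` parameter are hypothetical.

```python
def node_replica_counts(rack_replicas, brokers, current=None):
    """Split a rack's replica count across its brokers.

    rack_replicas: replicas assigned to the rack.
    brokers: broker IDs in the rack (non-empty).
    current: optional {broker_id: replicas held today}; an assumed tie-break input.
    """
    current = current or {}
    base, extra = divmod(rack_replicas, len(brokers))
    # Assumption: brokers that already hold more replicas get the extra one first.
    order = sorted(brokers, key=lambda broker: (-current.get(broker, 0), broker))
    return {broker: base + (1 if index < extra else 0)
            for index, broker in enumerate(order)}
```

For example, 7 replicas over brokers 1, 2, and 3 gives one broker 3 replicas and the other two 2 each. The same division also illustrates the rack-size effect: a rack assigned 4 replicas yields 2 per node with two nodes but only 1 per node with four.
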
> h3. Anti-Affinity Support
> When anti-affinity is enabled, the rebalance algorithm ensures that replicas
> of the same partition do not colocate on the same rack. Brokers without rack
> configuration are excluded from anti-affinity checks.
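
The anti-affinity condition can be expressed as a check over a candidate assignment. The sketch below is illustrative only: the function name `anti_affinity_violations`, the shape of `assignment` (partition to ordered broker list), and the use of `None` or a missing entry for a broker without rack configuration are all assumptions, not part of the proposal.

```python
def anti_affinity_violations(assignment, broker_rack):
    """Return (partition, rack, first_broker, second_broker) for each rack collision.

    assignment: {partition: [broker_id, ...]}
    broker_rack: {broker_id: rack or None}; brokers without a rack are skipped.
    """
    violations = []
    for partition, replicas in sorted(assignment.items()):
        first_broker_on_rack = {}
        for broker in replicas:
            rack = broker_rack.get(broker)
            if rack is None:
                continue  # no rack configured: excluded from the check
            if rack in first_broker_on_rack:
                violations.append(
                    (partition, rack, first_broker_on_rack[rack], broker))
            else:
                first_broker_on_rack[rack] = broker
    return violations
```

An empty result means no two rack-configured replicas of any partition share a rack.
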
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)