[ https://issues.apache.org/jira/browse/KAFKA-19507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jialun Peng reassigned KAFKA-19507:
-----------------------------------

    Assignee: Jialun Peng

> Optimize Replica Assignment for Broker Load Balance in Uneven Rack Configurations
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-19507
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19507
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jialun Peng
>            Assignee: Jialun Peng
>            Priority: Major
>
> h3. Issue Description
> Kafka's current replica assignment strategy prioritizes _balancing replica counts across racks_ (availability zones in cloud environments) over _balancing replicas across individual brokers_. While this ensures rack diversity, it creates significant broker-level load imbalance when racks contain unequal numbers of brokers.
> h3. Problem Illustration
> Consider a 3-replica topic with 3 racks:
> * *Rack A*: Brokers 1, 4
> * *Rack B*: Brokers 2, 5
> * *Rack C*: Broker 3 (single broker)
> Under the current strategy:
> * Brokers 1, 2, 4, 5 each receive 1/6 of all replicas
> * Broker 3 receives 1/3 of all replicas (twice the load of others)
> This forces Broker 3 into a bottleneck ("bucket effect"), as it handles double the traffic and storage load.
>
> To mitigate this, deployments today must maintain broker counts as _multiples of rack counts_ (e.g., 3, 6, 9 brokers for 3 racks). While this ensures balance, it:
> # *Restricts deployment flexibility*: Scaling clusters horizontally requires adding/removing nodes in rack-sized increments.
> # *Increases costs unnecessarily*: For example, a 4-broker cluster could suffice for a 3-rack setup, but users must deploy 6 brokers to maintain balance, increasing infrastructure costs by 50%.
> h3. Proposed Solution
> Modify the assignment strategy to:
> # *Prioritize broker-level balance* as the primary objective.
> # *Weight rack-level distribution* by broker count per rack (e.g., a rack with 2 brokers receives twice the replicas of a rack with 1 broker).
> h4. Benefits
> * *Balanced load*: All brokers receive near-equal replicas regardless of rack imbalance.
> * *Deployment flexibility*: Clusters can scale to _any size_ as long as {{rack_count ≥ replica_factor}}.
> * *Cost efficiency*: Users deploy only necessary brokers.
> h4. Example Scenario
> _3 replicas, 4 racks with 5 brokers:_
> * *Rack A*: Brokers 1, 5 → Receives 2/5 of replicas (distributed evenly between Brokers 1 & 5)
> * *Racks B, C, D*: 1 broker each → Each receives 1/5 of replicas
> _Result_: Every broker handles exactly 1/5 of total replicas, eliminating bottlenecks.
> h3. Request
> We propose modifying the replica assignment algorithm to prioritize broker-level replica balance, while using rack-node-count-weighted distribution. This allows enterprises to deploy Kafka clusters with more flexible node counts, significantly improving cost efficiency while maintaining rack awareness.
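
The share arithmetic in the two examples of the issue description can be checked with a small sketch. This is illustrative Python, not Kafka's assignment code. The function names are hypothetical, and the broker IDs 2, 3 and 4 for Racks B, C and D in the second layout are assumed, since the description gives no IDs for them. It only computes aggregate per-broker shares and does not model per-partition placement.

```python
from fractions import Fraction

def rack_balanced_shares(racks):
    """Every rack gets an equal share; brokers in a rack split it evenly."""
    per_rack = Fraction(1, len(racks))
    return {b: per_rack / len(brokers)
            for brokers in racks.values() for b in brokers}

def broker_weighted_shares(racks):
    """Each rack's share is proportional to its broker count."""
    total = sum(len(brokers) for brokers in racks.values())
    shares = {}
    for brokers in racks.values():
        rack_share = Fraction(len(brokers), total)
        for b in brokers:
            shares[b] = rack_share / len(brokers)
    return shares

# Layout from "Problem Illustration": 3 racks, 5 brokers.
uneven = {"A": [1, 4], "B": [2, 5], "C": [3]}
# Layout from "Example Scenario": 4 racks, 5 brokers (IDs 2, 3, 4 assumed).
five = {"A": [1, 5], "B": [2], "C": [3], "D": [4]}

for broker, share in sorted(rack_balanced_shares(uneven).items()):
    print(f"rack-balanced   broker {broker}: {share}")   # broker 3 -> 1/3, others -> 1/6
for broker, share in sorted(broker_weighted_shares(five).items()):
    print(f"broker-weighted broker {broker}: {share}")   # every broker -> 1/5
```

Exact fractions are used instead of floats so the 1/6, 1/3 and 1/5 figures from the description can be compared directly.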