[ https://issues.apache.org/jira/browse/KAFKA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Rao resolved KAFKA-7537.
----------------------------
    Resolution: Fixed
Fix Version/s: 2.2.0

Merged the PR to trunk.

> Only include live brokers in the UpdateMetadataRequest sent to existing
> brokers if there is no change in the partition states
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-7537
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7537
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>            Reporter: Zhanxiang (Patrick) Huang
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>             Fix For: 2.2.0
>
> Currently, when brokers join or leave the cluster without any change in
> partition states, the controller sends out UpdateMetadataRequests containing
> the states of all partitions to all brokers. For existing brokers in the
> cluster, however, the metadata diff between the controller and the broker is
> only the "live_brokers" info; only brokers with an empty metadata cache need
> the full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all
> brokers can place non-negligible memory pressure on the controller side.
>
> Say we have N brokers and M partitions in total in the cluster, and we want
> to add one brand-new broker. With RF=2, the memory footprint per partition
> in the UpdateMetadataRequest is ~200 bytes. In the current controller
> implementation, if each of the N RequestSendThreads serializes and sends out
> the UpdateMetadataRequest at roughly the same time (which is very likely the
> case), we end up using *(N+1)*M*200B*. In a large Kafka cluster, we can
> have:
> {noformat}
> N = 99
> M = 100K
> Memory usage to send out UpdateMetadataRequest to all brokers:
> 100 * 100K * 200B = 2G
>
> However, we only need to send the full UpdateMetadataRequest to the newly
> added broker. For the existing 99 brokers, we only need to include the live
> broker ids (4B * 100 brokers) in the UpdateMetadataRequest. So the amount of
> data that is actually needed will be:
> 1 * 100K * 200B + 99 * (100 * 4B) = ~21M
>
> We can potentially reduce the memory footprint, as well as the data
> transferred over the network, by 2G / 21M = ~95x.{noformat}
>
> This issue hurts the scalability of a Kafka cluster. KIP-380 and KAFKA-7186
> also help to further reduce the controller memory footprint.
>
> In terms of implementation, we can keep some in-memory state on the
> controller side to differentiate existing brokers from uninitialized brokers
> (e.g. brand-new brokers), so that if there is no change in partition states,
> we only send live broker info to existing brokers.
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
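The back-of-envelope estimate in the description can be sketched as follows. This is only an illustration of the arithmetic; N, M, the ~200-byte per-partition figure, and the 4-byte broker id are the assumptions stated in the issue (the description rounds the optimized figure to ~21M, giving its ~95x estimate):

```python
# Reproduce the memory estimate from the issue description.
N = 99                 # existing brokers in the cluster
M = 100_000            # total partitions
PER_PARTITION = 200    # ~bytes per partition state in an UpdateMetadataRequest (assumed)
PER_BROKER_ID = 4      # bytes per live broker id (assumed)

# Current behavior: all N+1 brokers receive the full partition-state payload,
# serialized concurrently by the N+1 RequestSendThreads.
full_payload = (N + 1) * M * PER_PARTITION  # ~2 GB

# Proposed behavior: only the newly added broker gets the full payload;
# the N existing brokers get just the live-broker id list.
optimized_payload = 1 * M * PER_PARTITION + N * ((N + 1) * PER_BROKER_ID)  # ~20 MB

print(f"full:      {full_payload / 1e9:.1f} GB")
print(f"optimized: {optimized_payload / 1e6:.1f} MB")
print(f"reduction: ~{full_payload / optimized_payload:.0f}x")
```

The dominant term on both sides is the M * 200B partition-state payload; once only one broker receives it, the live-broker lists for the other 99 brokers are negligible by comparison.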