[
https://issues.apache.org/jira/browse/KAFKA-6753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lucas Wang reopened KAFKA-6753:
-------------------------------
> Speed up event processing on the controller
> --------------------------------------------
>
> Key: KAFKA-6753
> URL: https://issues.apache.org/jira/browse/KAFKA-6753
> Project: Kafka
> Issue Type: Improvement
> Reporter: Lucas Wang
> Assignee: Lucas Wang
> Priority: Minor
> Fix For: 2.1.0
>
> Attachments: Screen Shot 2018-04-04 at 7.08.55 PM.png
>
>
> The existing controller code updates metrics after processing every event.
> This can slow down event processing on the controller tremendously. In one
> profiling we see that updating metrics takes nearly 100% of the CPU for the
> controller event processing thread. Specifically the slowness can be
> attributed to two factors:
> 1. Each invocation to update the metrics is expensive. Specifically trying to
> calculate the offline partitions count requires iterating through all the
> partitions in the cluster to check if the partition is offline; and
> calculating the preferred replica imbalance count requires iterating through
> all the partitions in the cluster to check if a partition has a leader other
> than the preferred leader. In a large cluster, the number of partitions can
> be quite large, all seen by the controller. Even if the time spent to check a
> single partition is small, the accumulation effect of so many partitions in
> the cluster can make the invocation to update metrics quite expensive. One
> might argue that maybe the logic for processing each single partition is not
> optimized, we checked the CPU percentage of leaf nodes in the profiling
> result, and found that inside the loops of collection objects, e.g. the set
> of all partitions, no single function dominates the processing. Hence the
> large number of the partitions in a cluster is the main contributor to the
> slowness of one invocation to update the metrics.
> 2. The invocation to update metrics is called many times when the is a high
> number of events to be processed by the controller, one invocation after
> processing any event.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)