[
https://issues.apache.org/jira/browse/KAFKA-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
hudeqi resolved KAFKA-15139.
----------------------------
Resolution: Fixed
> Optimize the performance of `Set.removeAll(List)` in
> `MirrorCheckpointConnector`
> --------------------------------------------------------------------------------
>
> Key: KAFKA-15139
> URL: https://issues.apache.org/jira/browse/KAFKA-15139
> Project: Kafka
> Issue Type: Improvement
> Components: mirrormaker
> Affects Versions: 3.5.0
> Reporter: hudeqi
> Assignee: hudeqi
> Priority: Major
>
> This is the documentation note for the `removeAll` method in `Set`:
> _This implementation determines which is the smaller of this set and the
> specified collection, by invoking the size method on each. If this set has
> fewer elements, then the implementation iterates over this set, checking each
> element returned by the iterator in turn to see if it is contained in the
> specified collection. If it is so contained, it is removed from this set with
> the iterator's remove method. If the specified collection has fewer elements,
> then the implementation iterates over the specified collection, removing from
> this set each element returned by the iterator, using this set's remove
> method._
> That said, let _M_ be the number of elements in the set and _N_ the number
> of elements in the list. If the specified collection is a `List` and
> _M<=N_, the time complexity of `removeAll` is _O(MN)_ (because searching in
> a `List` is _O(N)_). Conversely, if _N<M_, the lookup happens in the `Set`,
> and the time complexity is _O(N)_.
> In `MirrorCheckpointConnector`, the `refreshConsumerGroups` method is called
> repeatedly in a daemon thread, and it contains two `removeAll` calls.
> Logically, when the number of groups in the source cluster simply increases
> or decreases between one round and the next, the two `removeAll` calls will
> always run into the _O(MN)_ case described above. It is therefore better to
> change all the variables involved to `Set` types to avoid this performance
> penalty.
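
For illustration only, here is a minimal sketch of the idea described above. It is not the actual patch; the class `GroupDiff` and the helper `difference` are hypothetical names. It assumes the group names arrive as plain collections of strings and copies both sides into `HashSet`s, so that whichever branch `removeAll` takes, each lookup is a hash lookup rather than a linear scan of a `List`.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupDiff {
    // Hypothetical helper: returns the elements of `left` that are not in `right`.
    static Set<String> difference(Collection<String> left, Collection<String> right) {
        Set<String> result = new HashSet<>(left);
        // Passing a HashSet (not a List) means contains() is O(1) on average,
        // regardless of which collection removeAll decides to iterate over.
        result.removeAll(new HashSet<>(right));
        return result;
    }

    public static void main(String[] args) {
        // Assumed sample data: groups known from the previous round vs. groups seen now.
        List<String> known = Arrays.asList("g1", "g2", "g3");
        List<String> current = Arrays.asList("g2", "g3", "g4", "g5");

        System.out.println("new groups:  " + difference(current, known));
        System.out.println("dead groups: " + difference(known, current));
    }
}
```

Copying the argument into a `HashSet` costs _O(N)_ once, which is typically far cheaper than up to _M_ linear scans when both collections are large.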
--
This message was sent by Atlassian Jira
(v8.20.10#820010)