[ https://issues.apache.org/jira/browse/KAFKA-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Harris updated KAFKA-15139: -------------------------------- Fix Version/s: 3.6.0 > Optimize the performance of `Set.removeAll(List)` in > `MirrorCheckpointConnector` > -------------------------------------------------------------------------------- > > Key: KAFKA-15139 > URL: https://issues.apache.org/jira/browse/KAFKA-15139 > Project: Kafka > Issue Type: Improvement > Components: mirrormaker > Affects Versions: 3.5.0 > Reporter: hudeqi > Assignee: hudeqi > Priority: Major > Fix For: 3.6.0 > > > This is the hint of `removeAll` method in `Set`: > _This implementation determines which is the smaller of this set and the > specified collection, by invoking the size method on each. If this set has > fewer elements, then the implementation iterates over this set, checking each > element returned by the iterator in turn to see if it is contained in the > specified collection. If it is so contained, it is removed from this set with > the iterator's remove method. If the specified collection has fewer elements, > then the implementation iterates over the specified collection, removing from > this set each element returned by the iterator, using this set's remove > method._ > That's said, assume that _M_ is the number of elements in the set and _N_ is > the number of elements in the List, if the type of the specified collection > is `List`, and {_}M<=N{_}, then the time complexity of `removeAll` is _O(MN)_ > (because the time complexity of searching in List is {_}O(N){_}), on the > contrary, if {_}N<M{_}, it will search in `Set`, the time complexity is > {_}O(N){_}. > In `MirrorCheckpointConnector`, `refreshConsumerGroups` method is repeatedly > called in a daemon thread. There are two `removeAll` in this method. From a > logical point of view, when this method is called in one round, when the > number of groups in the source cluster simply increases or decreases, the two > `removeAll` execution strategies will always hit the _O(MN)_ situation > mentioned above. Therefore, it is better to change all the variables here to > Set type to avoid this "low performance". -- This message was sent by Atlassian Jira (v8.20.10#820010)