[ 
https://issues.apache.org/jira/browse/KAFKA-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hudeqi resolved KAFKA-15139.
----------------------------
    Resolution: Fixed

> Optimize the performance of `Set.removeAll(List)` in 
> `MirrorCheckpointConnector`
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-15139
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15139
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>    Affects Versions: 3.5.0
>            Reporter: hudeqi
>            Assignee: hudeqi
>            Priority: Major
>
> This is the hint of `removeAll` method in `Set`:
> _This implementation determines which is the smaller of this set and the 
> specified collection, by invoking the size method on each. If this set has 
> fewer elements, then the implementation iterates over this set, checking each 
> element returned by the iterator in turn to see if it is contained in the 
> specified collection. If it is so contained, it is removed from this set with 
> the iterator's remove method. If the specified collection has fewer elements, 
> then the implementation iterates over the specified collection, removing from 
> this set each element returned by the iterator, using this set's remove 
> method._
> That's said, assume that _M_ is the number of elements in the set and _N_ is 
> the number of elements in the List, if the type of the specified collection 
> is `List`, and {_}M<=N{_}, then the time complexity of `removeAll` is _O(MN)_ 
> (because the time complexity of searching in List is {_}O(N){_}), on the 
> contrary, if {_}N<M{_}, it will search in `Set`, the time complexity is 
> {_}O(N){_}.
> In `MirrorCheckpointConnector`, `refreshConsumerGroups` method is repeatedly 
> called in a daemon thread. There are two `removeAll` in this method. From a 
> logical point of view, when this method is called in one round, when the 
> number of groups in the source cluster simply increases or decreases, the two 
> `removeAll` execution strategies will always hit the _O(MN)_ situation 
> mentioned above. Therefore, it is better to change all the variables here to 
> Set type to avoid this "low performance".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to