[ 
https://issues.apache.org/jira/browse/KAFKA-15139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hudeqi updated KAFKA-15139:
---------------------------
    Description: 
This is the hint of `removeAll` method in `Set`:

_This implementation determines which is the smaller of this set and the 
specified collection, by invoking the size method on each. If this set has 
fewer elements, then the implementation iterates over this set, checking each 
element returned by the iterator in turn to see if it is contained in the 
specified collection. If it is so contained, it is removed from this set with 
the iterator's remove method. If the specified collection has fewer elements, 
then the implementation iterates over the specified collection, removing from 
this set each element returned by the iterator, using this set's remove method._

That's said, assume that _M_ is the number of elements in the set and _N_ is 
the number of elements in the List, if the type of the specified collection is 
`List`, and {_}M<=N{_}, then the time complexity of `removeAll` is _O(MN)_ 
(because the time complexity of searching in List is {_}O(N){_}), on the 
contrary, if {_}N<M{_}, it will search in `Set`, the time complexity is 
{_}O(N){_}.

In `MirrorCheckpointConnector`, `refreshConsumerGroups` method is repeatedly 
called in a daemon thread. There are two `removeAll` in this method. From a 
logical point of view, when this method is called in one round, when the number 
of groups in the source cluster simply increases or decreases, the two 
`removeAll` execution strategies will always hit the _O(MN)_ situation 
mentioned above. Therefore, it is better to change all the variables here to 
Set type to avoid this "low performance".

  was:
This is the hint of `removeAll` method in `Set`:

_This implementation determines which is the smaller of this set and the 
specified collection, by invoking the size method on each. If this set has 
fewer elements, then the implementation iterates over this set, checking each 
element returned by the iterator in turn to see if it is contained in the 
specified collection. If it is so contained, it is removed from this set with 
the iterator's remove method. If the specified collection has fewer elements, 
then the implementation iterates over the specified collection, removing from 
this set each element returned by the iterator, using this set's remove method._

That's said, assume that M is the number of elements in the set and N is the 
number of elements in the List, if the type of the specified collection is 
`List`, and M<=N, then the time complexity of `removeAll` is O(MN) (because the 
time complexity of searching in List is O(N)), on the contrary, if N<M, it will 
search in `Set`, the time complexity is O(N).

InĀ 


> Optimize the performance of `Set.removeAll(List)` in 
> `MirrorCheckpointConnector`
> --------------------------------------------------------------------------------
>
>                 Key: KAFKA-15139
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15139
>             Project: Kafka
>          Issue Type: Improvement
>          Components: mirrormaker
>    Affects Versions: 3.5.0
>            Reporter: hudeqi
>            Assignee: hudeqi
>            Priority: Major
>
> This is the hint of `removeAll` method in `Set`:
> _This implementation determines which is the smaller of this set and the 
> specified collection, by invoking the size method on each. If this set has 
> fewer elements, then the implementation iterates over this set, checking each 
> element returned by the iterator in turn to see if it is contained in the 
> specified collection. If it is so contained, it is removed from this set with 
> the iterator's remove method. If the specified collection has fewer elements, 
> then the implementation iterates over the specified collection, removing from 
> this set each element returned by the iterator, using this set's remove 
> method._
> That's said, assume that _M_ is the number of elements in the set and _N_ is 
> the number of elements in the List, if the type of the specified collection 
> is `List`, and {_}M<=N{_}, then the time complexity of `removeAll` is _O(MN)_ 
> (because the time complexity of searching in List is {_}O(N){_}), on the 
> contrary, if {_}N<M{_}, it will search in `Set`, the time complexity is 
> {_}O(N){_}.
> In `MirrorCheckpointConnector`, `refreshConsumerGroups` method is repeatedly 
> called in a daemon thread. There are two `removeAll` in this method. From a 
> logical point of view, when this method is called in one round, when the 
> number of groups in the source cluster simply increases or decreases, the two 
> `removeAll` execution strategies will always hit the _O(MN)_ situation 
> mentioned above. Therefore, it is better to change all the variables here to 
> Set type to avoid this "low performance".



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to