hudeqi opened a new pull request, #13946:
URL: https://github.com/apache/kafka/pull/13946

   ### Motivation
   This PR is inspired by 
https://github.com/apache/kafka/pull/13913#discussion_r1248010145
   
   This is the Javadoc for the `removeAll` method of `AbstractSet`, which `HashSet` inherits:
   _This implementation determines which is the smaller of this set and the 
specified collection, by invoking the size method on each. If this set has 
fewer elements, then the implementation iterates over this set, checking each 
element returned by the iterator in turn to see if it is contained in the 
specified collection. If it is so contained, it is removed from this set with 
the iterator's remove method. If the specified collection has fewer elements, 
then the implementation iterates over the specified collection, removing from 
this set each element returned by the iterator, using this set's remove method._
   
   That is, let M be the number of elements in the set and N the number of 
elements in the specified collection. If the specified collection is a `List` 
and M <= N, `removeAll` iterates over the set and calls `contains` on the list, 
so the time complexity is O(MN) (because a search in a `List` is O(N)). 
Conversely, if N < M, it iterates over the list and removes each element from 
the `Set`, so the time complexity is O(N) for a hash-based set.
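
To make the two branches concrete, here is a minimal sketch that re-implements the strategy described in the quoted Javadoc and counts how often the list is scanned. The names `RemoveAllCost`, `CountingList`, and `removeAllSketch` are hypothetical and exist only for illustration; this is not JDK or Kafka code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

final class RemoveAllCost {
    /** ArrayList that counts contains() calls; each call is a linear scan. */
    static final class CountingList<T> extends ArrayList<T> {
        int containsCalls = 0;

        CountingList(Collection<? extends T> c) {
            super(c);
        }

        @Override
        public boolean contains(Object o) {
            containsCalls++;
            return super.contains(o);
        }
    }

    /** Hypothetical re-implementation of the strategy in the quoted Javadoc. */
    static <T> boolean removeAllSketch(Set<T> set, Collection<?> c) {
        boolean modified = false;
        if (set.size() > c.size()) {
            // Collection is smaller: one set.remove per collection element.
            for (Object e : c) {
                modified |= set.remove(e);
            }
        } else {
            // Set is smaller or equal: one c.contains per set element.
            for (Iterator<T> it = set.iterator(); it.hasNext();) {
                if (c.contains(it.next())) {
                    it.remove();
                    modified = true;
                }
            }
        }
        return modified;
    }

    public static void main(String[] args) {
        CountingList<String> list = new CountingList<>(List.of("a", "b", "c", "d", "e"));
        Set<String> set = new HashSet<>(List.of("a", "b", "x"));
        removeAllSketch(set, list); // M = 3 <= N = 5, so the list is scanned per set element
        System.out.println(set + " after " + list.containsCalls + " list scans");
    }
}
```

With M <= N the list's `contains` is called once per set element, which is where the O(MN) cost comes from. With N < M it is never called.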
   
   In `MirrorCheckpointConnector`, the `refreshConsumerGroups` method is called 
repeatedly from a daemon thread, and it contains two `removeAll` calls. 
Logically, whenever the number of groups in the source cluster has simply 
increased or decreased between two rounds, the `removeAll` calls in this method 
always hit the O(MN) case described above.
   
   ### Solution
   Therefore, it is better to change all of the variables involved to `Set`, 
so that `removeAll` never falls back to a linear `contains` scan of a `List`.
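
As a rough illustration of the idea (not the actual diff in this PR), computing new and dead groups with `Set` on both sides could look like the sketch below. The class `GroupDiff`, the helper `minus`, and the group names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class GroupDiff {
    /** Returns the elements of a that are not in b, leaving both inputs unmodified. */
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> result = new HashSet<>(a);
        // Both sides are sets, so whichever branch removeAll takes,
        // each lookup or removal is a hash operation rather than a list scan.
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<String> known = Set.of("g1", "g2", "g3");
        Set<String> current = Set.of("g2", "g3", "g4", "g5");
        System.out.println("new:  " + minus(current, known));
        System.out.println("dead: " + minus(known, current));
    }
}
```

Because both operands are hash-based sets, the cost stays roughly linear in the size of the smaller side, regardless of which one is larger.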
   
   The unit tests pass with this PR.


