[I] [Enhancement] Improving Broker Performance: Optimizing the whichTopicByConsumer Method Query Efficiency [rocketmq]

via GitHub Wed, 12 Mar 2025 19:47:37 -0700


JasirVoriya opened a new issue, #9240:
URL: https://github.com/apache/rocketmq/issues/9240


   ### Before Creating the Enhancement Request
   
   - [x] I have confirmed that this should be classified as an enhancement 
rather than a bug/feature.
   
   
   ### Summary
   
   The current implementation of the method `whichTopicByConsumer` is as 
follows:
   
   ```java
   public Set<String> whichTopicByConsumer(final String group) {
       Set<String> topics = new HashSet<>();
   
       Iterator<Entry<String, ConcurrentMap<Integer, Long>>> it = 
this.offsetTable.entrySet().iterator();
       while (it.hasNext()) {
           Entry<String, ConcurrentMap<Integer, Long>> next = it.next();
           String topicAtGroup = next.getKey();
           String[] arrays = topicAtGroup.split(TOPIC_GROUP_SEPARATOR);
           if (arrays.length == 2) {
               if (group.equals(arrays[1])) {
                   topics.add(arrays[0]);
               }
           }
       }
   
       return topics;
   }
   ```
   This method is used to query all topics under a specified group. However, it 
requires iterating through the entire offsetTable, and the advantages of the 
hash table are not utilized at all. The query efficiency is far too slow! As a 
result, a large portion of my CPU resources is being consumed by String.split.
   
   Below is the CPU flame graph. Although it is from an older version of 
RocketMQ, I have reviewed the code in the latest version, and this part still 
hasn't been optimized.
   
![Image](https://github.com/user-attachments/assets/61024c37-e178-454b-9e6a-50205ae5ddbb)
   
   ### Motivation
   
   I need to periodically collect the offset progress of groups every minute, 
which involves calling the inefficient `whichTopicByConsumer` method. Such an 
inefficient query can be completely avoided. Simply querying the topics 
corresponding to a single group shouldn't require traversing all the data—this 
is unreasonable.
   
   ### Describe the Solution You'd Like
   
   I would like the `whichTopicByConsumer` method to be optimized to avoid the 
need for a full traversal of the `offsetTable` when querying the topics 
corresponding to a specific group. A more efficient approach would involve 
introducing a pre-built mapping (e.g., a hash table or a similar indexed data 
structure) that directly maps consumer groups to their associated topics. This 
mapping can be maintained dynamically and updated whenever changes occur in the 
`offsetTable`, ensuring consistency.
   
   With this solution, the query logic can simply perform a direct lookup in 
the mapping, reducing the complexity from **O(N)** (full traversal) to **O(1)** 
or **O(K)**, where K is the number of topics associated with the group. This 
would significantly improve query efficiency and reduce CPU usage caused by 
repetitive and expensive operations like `String.split`.
   
   By implementing this optimization, the system would handle periodic offset 
progress collection much more efficiently, especially in scenarios where the 
number of groups and topics grows large.
   
   ### Describe Alternatives You've Considered
   
   One approach is to manually pass in the specified topic when calling the 
`getConsumeStats` method. This way, the execution of `whichTopicByConsumer` can 
be avoided. However, the inefficiency of `whichTopicByConsumer` still remains 
as an issue.
   
   ### Additional Context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[I] [Enhancement] Improving Broker Performance: Optimizing the whichTopicByConsumer Method Query Efficiency [rocketmq]

Reply via email to