RongtongJin opened a new issue, #9719:
URL: https://github.com/apache/rocketmq/issues/9719

   ### Before Creating the Enhancement Request
   
   - [x] I have confirmed that this should be classified as an enhancement 
rather than a bug/feature.
   
   
   ### Summary
   
   This issue addresses static variable conflicts in metrics management classes 
that occur when running multiple broker instances in a single process 
(BrokerContainer mode). The current implementation uses static variables in 
`RocksDBStoreMetricsManager`, `RemotingMetricsManager`, and 
`PopMetricsManager`, which causes metrics data to be shared across all broker 
instances, leading to incorrect statistics and potential data corruption.
   
   
   ### Motivation
   
   ### Current Problem
   
   In multi-broker scenarios (BrokerContainer mode), multiple broker instances 
run within the same JVM process. The current metrics management implementation 
has several critical issues:
   
   1. **Static Variable Sharing**: `RocksDBStoreMetricsManager`, 
`RemotingMetricsManager`, and `PopMetricsManager` use static variables to store 
metrics data, causing all broker instances to share the same metrics state.
   
   2. **Data Corruption**: When multiple brokers write to the same static 
metrics variables simultaneously, it leads to:
      - Incorrect RPC latency measurements
      - Corrupted Pop message statistics  
      - Inaccurate RocksDB store metrics
      - Race conditions in metrics collection
   
   3. **Monitoring Inaccuracy**: Metrics collected from static variables don't 
reflect individual broker performance, making it impossible to:
      - Monitor per-broker performance
      - Debug issues specific to individual brokers
      - Scale monitoring effectively
   
   4. **Thread Safety Issues**: Concurrent access to static variables without 
proper synchronization can cause:
      - Data races
      - Inconsistent metrics values
      - Potential JVM crashes
   
   ### Example Scenario
   
   ```java
   // Current problematic implementation
   public class RemotingMetricsManager {
       private static LongHistogram rpcLatency; // Shared across all brokers!
       private static ObservableLongGauge writeAndFlushResult; // Shared across 
all brokers!
       
       public static void recordRpcLatency(long latency, Attributes attributes) 
{
           rpcLatency.record(latency, attributes); // All brokers write to same 
histogram
       }
   }
   ```
   
   When Broker A and Broker B both call `recordRpcLatency()`, they modify the 
same static histogram, making it impossible to distinguish between their 
individual performance metrics.
   
   ### Describe the Solution You'd Like
   
   ### Core Implementation Changes
   
   1. **Convert Static Variables to Instance Variables**
      - Transform all static fields in metrics managers to instance fields
      - Each broker instance will have its own metrics manager instance
      - Remove static method dependencies
   
   2. **Instance-Based Metrics Management**
      - `BrokerMetricsManager` creates and manages individual instances of 
metrics managers
      - Each broker gets its own `RemotingMetricsManager`, `PopMetricsManager`, 
and `RocksDBStoreMetricsManager`
      - Metrics are collected and stored per-broker instance
   
   3. **Dependency Injection Pattern**
      - Pass metrics manager instances to classes that need them
      - Update `NettyRemotingAbstract` to accept `RemotingMetricsManager` 
instance
      - Modify processor classes to use instance-based metrics calls
   
   4. **Interface Updates**
      - Add instance-based methods to `RemotingServer` interface
      - Update `NettyRemotingServer` to delegate to instance methods
      - Maintain backward compatibility where possible
   
   ### Technical Implementation Details
   
   **Before (Static Implementation):**
   ```java
   public class RemotingMetricsManager {
       private static LongHistogram rpcLatency;
       private static ObservableLongGauge writeAndFlushResult;
       
       public static void recordRpcLatency(long latency, Attributes attributes) 
{
           rpcLatency.record(latency, attributes);
       }
   }
   
   // Usage in processors
   RemotingMetricsManager.recordRpcLatency(latency, attributes);
   ```
   
   **After (Instance-Based Implementation):**
   ```java
   public class RemotingMetricsManager {
       private LongHistogram rpcLatency;
       private ObservableLongGauge writeAndFlushResult;
       
       public void recordRpcLatency(long latency, Attributes attributes) {
           rpcLatency.record(latency, attributes);
       }
   }
   
   // Usage in processors
   brokerController.getBrokerMetricsManager()
       .getRemotingMetricsManager()
       .recordRpcLatency(latency, attributes);
   ```
   
   **BrokerMetricsManager Integration:**
   ```java
   public class BrokerMetricsManager {
       private RemotingMetricsManager remotingMetricsManager;
       private PopMetricsManager popMetricsManager;
       private RocksDBStoreMetricsManager rocksDBStoreMetricsManager;
       
       public BrokerMetricsManager() {
           this.remotingMetricsManager = new RemotingMetricsManager();
           this.popMetricsManager = new PopMetricsManager();
           this.rocksDBStoreMetricsManager = new RocksDBStoreMetricsManager();
       }
       
       public RemotingMetricsManager getRemotingMetricsManager() {
           return remotingMetricsManager;
       }
   }
   ```
   
   ### Expected Benefits
   
   - **Accurate Metrics**: Each broker instance maintains its own metrics data
   - **Thread Safety**: Eliminates race conditions from shared static variables
   - **Better Monitoring**: Enables per-broker performance monitoring and 
debugging
   - **Scalability**: Supports unlimited number of broker instances per process
   - **Data Integrity**: Prevents metrics corruption in multi-broker scenarios
   
   
   ### Describe Alternatives You've Considered
   
   ### Alternative 1: Thread-Local Storage
   - **Pros**: Minimal code changes, maintains static API
   - **Cons**: Doesn't solve the fundamental issue of shared state, complex 
cleanup, memory leaks potential
   
   ### Alternative 2: Synchronized Static Methods
   - **Pros**: Simple implementation, maintains existing API
   - **Cons**: Performance bottleneck, doesn't provide per-broker isolation, 
still shared state
   
   ### Alternative 3: Metrics Namespacing with Static Variables
   - **Pros**: Maintains static API, adds broker identification
   - **Cons**: Complex implementation, doesn't solve thread safety, still 
shared storage
   
   ### Alternative 4: Global Registry Pattern
   - **Pros**: Centralized management, maintains some static access
   - **Cons**: Still has shared state issues, complex lifecycle management, 
potential memory leaks
   
   **Selected Solution**: Instance-based approach provides the cleanest 
separation of concerns, eliminates all shared state issues, and provides the 
best foundation for future enhancements.
   
   ### Additional Context
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to