RongtongJin opened a new issue, #9721:
URL: https://github.com/apache/rocketmq/issues/9721

   ### Before Creating the Enhancement Request
   
   - [x] I have confirmed that this should be classified as an enhancement rather than a bug/feature.
   
   
   ### Summary
   
   When the `timerEnableRetryUntilSuccess` config is `true`, the `TimerDequeueGetService` threads do not exit after shutdown, because the `checkDequeueLatch` method can spin indefinitely in a `while(true)` loop. This leaks threads and prevents a clean shutdown of the timer message store.
   
   
   
   ### Motivation
   
   ## Problem Description
   
   ### Issue
   When `shutdown()` is called on `TimerDequeueGetService`, its threads do not exit gracefully. They remain stuck in the `checkDequeueLatch` method, which prevents proper cleanup and resource release.
   
   ### Root Cause
   The `checkDequeueLatch` method in `TimerMessageStore` contains an infinite `while(true)` loop that continuously waits for the `CountDownLatch` to complete:
   
   ```java
   public void checkDequeueLatch(CountDownLatch latch, long delayedTime) throws Exception {
       if (latch.await(1, TimeUnit.SECONDS)) {
           return;
       }
       int checkNum = 0;
       while (true) {  // <-- This loop never checks if service is stopping
           if (dequeuePutQueue.size() > 0
               || !checkStateForGetMessages(AbstractStateService.WAITING)
               || !checkStateForPutMessages(AbstractStateService.WAITING)) {
               //let it go
           } else {
               checkNum++;
               if (checkNum >= 2) {
                   break;
               }
           }
           if (latch.await(1, TimeUnit.SECONDS)) {
               break;
           }
       }
       // ... rest of method
   }
   ```
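   The following is a small, self-contained model of the loop above that makes its exit conditions explicit. It is a sketch and not RocketMQ code. `DequeueLatchLoopModel` and `Observation` are hypothetical names. Each `Observation` stands for what one iteration would see: the `dequeuePutQueue` size, whether the get and put services report `WAITING`, and whether that iteration's 1-second `latch.await` succeeded. The model replays a fixed list of observations, so unlike the real loop it always terminates.

   ```java
   import java.util.List;

   // Hypothetical, simplified model of the while(true) loop in checkDequeueLatch.
   // It replays a fixed sequence of per-iteration observations instead of
   // polling real services, so it always terminates.
   class DequeueLatchLoopModel {

       /** What a single loop iteration observes (hypothetical model type). */
       static final class Observation {
           final int dequeuePutQueueSize;
           final boolean getServiceWaiting;
           final boolean putServiceWaiting;
           final boolean latchDone; // result of latch.await(1, SECONDS) in this iteration

           Observation(int dequeuePutQueueSize, boolean getServiceWaiting,
                       boolean putServiceWaiting, boolean latchDone) {
               this.dequeuePutQueueSize = dequeuePutQueueSize;
               this.getServiceWaiting = getServiceWaiting;
               this.putServiceWaiting = putServiceWaiting;
               this.latchDone = latchDone;
           }
       }

       /**
        * Returns the 1-based iteration at which the original loop would break,
        * or -1 if none of the observations satisfies an exit condition.
        */
       static int breakIteration(List<Observation> observations) {
           int checkNum = 0;
           for (int i = 0; i < observations.size(); i++) {
               Observation o = observations.get(i);
               boolean busy = o.dequeuePutQueueSize > 0
                       || !o.getServiceWaiting
                       || !o.putServiceWaiting;
               if (!busy) {
                   checkNum++; // idle observations accumulate; they are never reset
                   if (checkNum >= 2) {
                       return i + 1;
                   }
               }
               if (o.latchDone) {
                   return i + 1;
               }
               // No stopped/shutdown check here, mirroring the original loop.
           }
           return -1;
       }
   }
   ```

   The model shows what happens when the services never settle back into `WAITING` and the latch is never counted down. This is a plausible but unconfirmed explanation for the hang reported with `timerEnableRetryUntilSuccess` enabled. In that case every observation is "busy" and `breakIteration` returns -1, so the real loop would keep polling indefinitely.
   
   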
   
   ### Impact
   - **Resource Leaks**: Threads remain active after shutdown, consuming system resources
   - **Clean Shutdown Failure**: Broker shutdown process may hang or time out
   - **Memory Leaks**: Thread-local variables and associated objects are not properly cleaned up
   - **Monitoring Issues**: Thread count remains elevated even after service shutdown
   - **Production Issues**: Can cause problems in containerized environments and orchestration systems
   
   ### Reproduction Steps
   1. Start a RocketMQ broker with timer message store enabled
   2. Send some delayed messages to trigger timer processing
   3. Call `shutdown()` on the broker
   4. Observe that `TimerDequeueGetService` threads do not exit
   5. Check thread dumps to confirm threads are blocked in `checkDequeueLatch`
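   Steps 4 and 5 can also be checked from code, for example in an integration test that runs after the broker's `shutdown()` returns. The sketch below uses only the standard JDK call `Thread.getAllStackTraces()`. The class and method names are hypothetical. Threads are matched by the `TimerDequeueGetService` name shown in the thread dump later in this issue.

   ```java
   import java.util.Map;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Hypothetical helper for detecting threads that survive shutdown.
   class LingeringThreadCheck {

       /** Names of live threads whose name contains the given fragment. */
       static Set<String> liveThreadsNamed(String nameFragment) {
           return Thread.getAllStackTraces().keySet().stream()
                   .filter(Thread::isAlive)
                   .map(Thread::getName)
                   .filter(name -> name.contains(nameFragment))
                   .collect(Collectors.toSet());
       }

       /** True if any matching thread currently has a stack frame in the named method. */
       static boolean isInMethod(String nameFragment, String methodName) {
           for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
               if (!entry.getKey().getName().contains(nameFragment)) {
                   continue;
               }
               for (StackTraceElement frame : entry.getValue()) {
                   if (frame.getMethodName().equals(methodName)) {
                       return true;
                   }
               }
           }
           return false;
       }
   }
   ```

   After a clean shutdown, `liveThreadsNamed("TimerDequeueGetService")` would be expected to return an empty set. If `isInMethod("TimerDequeueGetService", "checkDequeueLatch")` returns `true`, that corresponds to the stuck state described in step 5. A single snapshot can race with a thread that is still winding down, so a test would typically poll for a short grace period before failing.
   
   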
   
   ### Expected Behavior
   - `TimerDequeueGetService` threads should exit gracefully when `shutdown()` is called
   - All resources should be properly cleaned up
   - Thread count should return to baseline after shutdown
   
   ### Actual Behavior
   - Threads remain blocked in `checkDequeueLatch` method
   - Resources are not released
   - Thread count remains elevated
   - Shutdown process may hang
   
   ### Describe the Solution You'd Like
   
   Check the service's stopped state inside the `while` loop and break out of the loop once shutdown has started.
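   The following is a minimal sketch of that idea. It assumes the loop can consult some stopped flag of the owning service, abstracted here as a hypothetical `BooleanSupplier isStopped`. The actual RocketMQ change may use a different flag or accessor. The `dequeuePutQueue` size and `WAITING` checks are likewise abstracted as a hypothetical `BooleanSupplier idle`. The poll interval is a parameter so the sketch can be exercised quickly. The original exit conditions are kept, with one addition: the loop exits once shutdown has begun.

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.TimeUnit;
   import java.util.function.BooleanSupplier;

   // Hypothetical, simplified version of checkDequeueLatch with a shutdown check.
   class StoppableLatchCheck {

       /**
        * Waits for the latch like the original method, but stops polling as soon
        * as isStopped reports true. Returns true only if the latch completed.
        */
       static boolean checkDequeueLatch(CountDownLatch latch, BooleanSupplier idle,
                                        BooleanSupplier isStopped, long pollMillis)
               throws InterruptedException {
           if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
               return true;
           }
           int checkNum = 0;
           // Re-check the stop flag on every iteration instead of looping on while(true).
           while (!isStopped.getAsBoolean()) {
               if (idle.getAsBoolean()) {
                   checkNum++;
                   if (checkNum >= 2) {
                       break; // original exit: services observed idle twice
                   }
               }
               if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
                   return true; // original exit: latch completed
               }
           }
           // Stopped or idle: report whether the latch happened to finish meanwhile.
           return latch.getCount() == 0;
       }
   }
   ```

   Each iteration waits at most `pollMillis`, so a stopped service leaves the loop within roughly one poll interval. With the original `1, TimeUnit.SECONDS` that is about one second. The sketch only reports `false` for an unfinished latch. Deciding what to do with it during shutdown is left to the caller.
   
   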
   
   ### Describe Alternatives You've Considered
   
   None.
   
   ### Additional Context
   
   ### Related Components
   - `TimerMessageStore.checkDequeueLatch()`
   - `TimerDequeueGetService.run()`
   - `TimerMessageStore.dequeue()`
   
   ### Thread Dump Analysis
   When the issue occurs, thread dumps show threads blocked like this:
   ```
   "TimerDequeueGetService" #123 daemon prio=5 os_prio=0 tid=0x... nid=0x... waiting on condition
      java.lang.Thread.State: TIMED_WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)
      at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
      at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
      at org.apache.rocketmq.store.timer.TimerMessageStore.checkDequeueLatch(TimerMessageStore.java:966)
      at org.apache.rocketmq.store.timer.TimerMessageStore.dequeue(TimerMessageStore.java:1058)
      at org.apache.rocketmq.store.timer.TimerMessageStore$TimerDequeueGetService.run(TimerMessageStore.java:1518)
   ```

