RongtongJin opened a new issue, #9721:
URL: https://github.com/apache/rocketmq/issues/9721
### Before Creating the Enhancement Request
- [x] I have confirmed that this should be classified as an enhancement rather than a bug/feature.
### Summary
When `timerEnableRetryUntilSuccess` is set to `true`, the `TimerDequeueGetService` threads do not exit after shutdown because of an infinite `while (true)` loop in the `checkDequeueLatch` method. This causes resource leaks and prevents a clean shutdown of the timer message store.
### Motivation
## Problem Description
### Issue
When `shutdown()` is called on `TimerDequeueGetService`, the service threads do not exit gracefully. They remain blocked in the `checkDequeueLatch` method, which prevents proper cleanup and resource release.
### Root Cause
The `checkDequeueLatch` method in `TimerMessageStore` contains an infinite `while (true)` loop that keeps waiting for the `CountDownLatch` to complete and never checks whether the service is stopping:
```java
public void checkDequeueLatch(CountDownLatch latch, long delayedTime) throws Exception {
    if (latch.await(1, TimeUnit.SECONDS)) {
        return;
    }
    int checkNum = 0;
    while (true) { // <-- This loop never checks if service is stopping
        if (dequeuePutQueue.size() > 0
            || !checkStateForGetMessages(AbstractStateService.WAITING)
            || !checkStateForPutMessages(AbstractStateService.WAITING)) {
            //let it go
        } else {
            checkNum++;
            if (checkNum >= 2) {
                break;
            }
        }
        if (latch.await(1, TimeUnit.SECONDS)) {
            break;
        }
    }
    // ... rest of method
}
```
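To make the failure mode concrete, here is a stripped-down model of the loop above. It is a hypothetical illustration, not RocketMQ code: the `pipelineIdle` supplier stands in for the `else` branch condition (the put queue is empty and both services are `WAITING`), and `pollMillis` replaces the hard-coded one-second wait so the model runs quickly. If the latch is never counted down and the pipeline never reports idle, neither `break` fires, and nothing in the loop consults a stop flag.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class OriginalLoopModel {

    private OriginalLoopModel() {
    }

    /**
     * Same control flow as the while (true) loop in checkDequeueLatch, with the
     * store-specific state checks replaced by a supplier (hypothetical model).
     * There is no stop-flag parameter: only a completed latch or two idle
     * observations can end the loop; interruption escapes via the exception.
     */
    static void waitLikeOriginal(CountDownLatch latch,
                                 BooleanSupplier pipelineIdle,
                                 long pollMillis) throws InterruptedException {
        if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
            return;
        }
        int checkNum = 0;
        while (true) {
            if (pipelineIdle.getAsBoolean()) {
                // Counterpart of the original else branch; checkNum is never reset.
                checkNum++;
                if (checkNum >= 2) {
                    break;
                }
            }
            if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
                break;
            }
        }
    }
}
```

With a pending latch and a pipeline that never goes idle, the model keeps polling indefinitely, which is the state the thread dump in this issue captures: the thread parked in `CountDownLatch.await` inside `checkDequeueLatch`.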
### Impact
- **Resource Leaks**: Threads remain active after shutdown, consuming system resources
- **Clean Shutdown Failure**: Broker shutdown process may hang or time out
- **Memory Leaks**: Thread-local variables and associated objects are not properly cleaned up
- **Monitoring Issues**: Thread count remains elevated even after service shutdown
- **Production Issues**: Can cause problems in containerized environments and orchestration systems
### Reproduction Steps
1. Start a RocketMQ broker with the timer message store enabled and `timerEnableRetryUntilSuccess` set to `true`
2. Send some delayed messages to trigger timer processing
3. Call `shutdown()` on the broker
4. Observe that `TimerDequeueGetService` threads do not exit
5. Check thread dumps to confirm threads are blocked in `checkDequeueLatch`
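For step 4, one in-process check is to list live threads by name once shutdown has returned. This sketch assumes the broker runs in the same JVM (for example, in an embedded integration test) and that the service threads keep a name containing `TimerDequeueGetService`, as in the thread dump in this issue; `liveThreadsNamed` is a hypothetical helper, not a RocketMQ API. For a standalone broker, `jstack <pid>` gives the same information.

```java
import java.util.List;
import java.util.stream.Collectors;

final class LingeringThreadCheck {

    private LingeringThreadCheck() {
    }

    /** Sorted names of live threads whose name contains the given fragment. */
    static List<String> liveThreadsNamed(String fragment) {
        // getAllStackTraces() snapshots the live threads of this JVM.
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .map(Thread::getName)
                .filter(name -> name.contains(fragment))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

Allow a short grace period after `shutdown()` returns before checking, since a thread may still be finishing its current one-second wait. With the expected behavior, `liveThreadsNamed("TimerDequeueGetService")` should then be empty; with this bug, the threads remain listed.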
### Expected Behavior
- `TimerDequeueGetService` threads should exit gracefully when `shutdown()` is called
- All resources should be properly cleaned up
- Thread count should return to baseline after shutdown
### Actual Behavior
- Threads remain blocked in `checkDequeueLatch` method
- Resources are not released
- Thread count remains elevated
- Shutdown process may hang
### Describe the Solution You'd Like
Check the service's stopped state on every iteration of the `while` loop and break out of the loop once the service is shutting down.
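A minimal sketch of the proposed change, under stated assumptions: `isStopping` is a hypothetical supplier for the stop signal (in the broker it would presumably be backed by the service thread's stopped state), `pipelineIdle` is a hypothetical stand-in for the original idle check (put queue empty and both services `WAITING`), and `pollMillis` replaces the one-second wait. Unlike the original `void` method, this version drops `delayedTime`, returns whether the latch completed instead of continuing to the rest of the method, and treats interruption as a stop signal.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class DequeueLatchSketch {

    private DequeueLatchSketch() {
    }

    /**
     * Shutdown-aware variant of the wait loop in checkDequeueLatch (hypothetical sketch).
     *
     * @return true if the latch has completed, false if the wait ended early because
     *         the pipeline was seen idle twice, the service is stopping, or the thread
     *         was interrupted
     */
    static boolean awaitDequeueLatch(CountDownLatch latch,
                                     BooleanSupplier pipelineIdle,
                                     BooleanSupplier isStopping,
                                     long pollMillis) {
        try {
            if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            int checkNum = 0;
            // The loop condition replaces while (true): the stop flag is re-checked on every pass.
            while (!isStopping.getAsBoolean()) {
                if (pipelineIdle.getAsBoolean()) {
                    checkNum++;
                    if (checkNum >= 2) {
                        break; // idle on two checks: same early exit as the original
                    }
                }
                if (latch.await(pollMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            }
            return latch.getCount() == 0;
        } catch (InterruptedException e) {
            // Treat interruption as a shutdown signal and preserve the interrupt flag.
            Thread.currentThread().interrupt();
            return latch.getCount() == 0;
        }
    }
}
```

Checking the flag in the loop condition keeps the existing polling design: a stopping service is noticed within roughly one poll interval (one second in the original code) without adding a separate wake-up mechanism.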
### Describe Alternatives You've Considered
None.
### Additional Context
### Related Components
- `TimerMessageStore.checkDequeueLatch()`
- `TimerDequeueGetService.run()`
- `TimerMessageStore.dequeue()`
### Thread Dump Analysis
When the issue occurs, thread dumps show threads blocked like this:
```
"TimerDequeueGetService" #123 daemon prio=5 os_prio=0 tid=0x... nid=0x... waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
    at org.apache.rocketmq.store.timer.TimerMessageStore.checkDequeueLatch(TimerMessageStore.java:966)
    at org.apache.rocketmq.store.timer.TimerMessageStore.dequeue(TimerMessageStore.java:1058)
    at org.apache.rocketmq.store.timer.TimerMessageStore$TimerDequeueGetService.run(TimerMessageStore.java:1518)
```