Re: [I] [Bug] ServiceThread#wakeup() can be lost when it races with waitForRunning()'s waitPoint.reset() [rocketmq]

via GitHub Tue, 23 Jun 2026 01:03:47 -0700


RockteMQ-AI commented on issue #10543:
URL: https://github.com/apache/rocketmq/issues/10543#issuecomment-4777039190


   **Issue Evaluation**
   
   Category: `type/bug` | Status: **Confirmed**
   
   The reported race condition in `ServiceThread#wakeup()` / `waitForRunning()` 
has been verified against the current codebase (`develop` branch, commit 
`b5bc1ff`).
   
   **Root Cause Analysis:**
   
   The race window exists between the fast-path CAS failure in 
`waitForRunning()` and the subsequent `waitPoint.reset()`:
   
   1. Thread A (`waitForRunning`): fast-path CAS `hasNotified(true→false)` 
fails → proceeds to `reset()`
   2. Thread B (`wakeup`): CAS `hasNotified(false→true)` succeeds → 
`waitPoint.countDown()` → latch state 1→0
   3. Thread A: `waitPoint.reset()` calls `setState(startCount)` — 
**unconditionally resets state to 1**, discarding the `countDown`
   4. Thread A: `waitPoint.await(interval)` blocks for the full interval 
(default 1000ms)
   
   `CountDownLatch2.reset()` uses an unconditional `setState(startCount)` (not 
CAS), so any prior `countDown()` is lost.
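
   The four steps above can be replayed deterministically on a single thread. The sketch below is illustrative only: `ResettableLatch` is a hypothetical, simplified stand-in for `CountDownLatch2` (it assumes only what is described above, namely that `reset()` is a plain `setState(startCount)`), and `LostWakeupDemo` / `replayRace` are names invented for this example, not RocketMQ code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

final class LostWakeupDemo {

    /** Hypothetical stand-in for CountDownLatch2: a latch whose reset() is a plain write. */
    static final class ResettableLatch {
        private static final class Sync extends AbstractQueuedSynchronizer {
            private final int startCount;

            Sync(int count) {
                this.startCount = count;
                setState(count);
            }

            @Override
            protected int tryAcquireShared(int acquires) {
                return getState() == 0 ? 1 : -1; // acquirable only once the count is 0
            }

            @Override
            protected boolean tryReleaseShared(int releases) {
                for (;;) {
                    int c = getState();
                    if (c == 0) {
                        return false;
                    }
                    if (compareAndSetState(c, c - 1)) {
                        return c - 1 == 0;
                    }
                }
            }

            void reset() {
                setState(startCount); // unconditional: overwrites an earlier countDown()
            }
        }

        private final Sync sync;

        ResettableLatch(int count) {
            this.sync = new Sync(count);
        }

        void countDown() {
            sync.releaseShared(1);
        }

        void reset() {
            sync.reset();
        }

        boolean await(long timeout, TimeUnit unit) throws InterruptedException {
            return sync.tryAcquireSharedNanos(1, unit.toNanos(timeout));
        }
    }

    /** Replays steps 1-4 in order; returns true only if the waiter was woken before the timeout. */
    static boolean replayRace(long intervalMillis) {
        AtomicBoolean hasNotified = new AtomicBoolean(false);
        ResettableLatch waitPoint = new ResettableLatch(1);

        // Step 1 (Thread A): fast-path CAS true -> false fails, so A heads for reset().
        if (hasNotified.compareAndSet(true, false)) {
            return true;
        }
        // Step 2 (Thread B): wakeup() wins its CAS and counts the latch down, 1 -> 0.
        if (hasNotified.compareAndSet(false, true)) {
            waitPoint.countDown();
        }
        // Step 3 (Thread A): reset() puts the state back to 1, discarding the countDown.
        waitPoint.reset();
        // Step 4 (Thread A): await() now has nothing to observe and runs into the timeout.
        try {
            return waitPoint.await(intervalMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException("unexpected interrupt in demo", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("woken before timeout: " + replayRace(100)); // prints false
    }
}
```

   In this replay `hasNotified` is already `true` when `await()` starts, yet the waiter still sleeps for the whole interval, which is the lost wakeup described in the report.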
   
   **Impact:** `ServiceThread`-based components (for example 
`FlushRealTimeService` in `CommitLog`) may stall for up to `interval` ms when a 
`wakeup()` is lost under concurrent wakeup pressure. The system self-heals on 
the next cycle, but the lost wakeups can show up as periodic tail latency.
   
   **Severity:** Medium — self-healing but causes unnecessary latency spikes.
   
   The suggested fix direction (replacing `CountDownLatch2` with 
`LockSupport.park/unpark`) is sound, as `LockSupport` does not have this 
reset-vs-countDown race.
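
   One possible shape of the `LockSupport` direction is sketched below. It is not the proposed patch: `ParkingWaiter`, its `waiter` field, and the boolean return value are assumptions made for this example. The idea is that `unpark` leaves a permit behind when the target thread has not parked yet, so there is no separate "reset" step that could erase a notification.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of a park/unpark based wait; not RocketMQ's ServiceThread. */
final class ParkingWaiter {
    private final AtomicBoolean hasNotified = new AtomicBoolean(false);
    private volatile Thread waiter; // the single thread that calls waitForRunning()

    /** Records the notification, then unparks the waiter (unpark(null) is a no-op). */
    void wakeup() {
        if (hasNotified.compareAndSet(false, true)) {
            LockSupport.unpark(waiter);
        }
    }

    /** Returns true if a notification was consumed, false on timeout or interrupt. */
    boolean waitForRunning(long intervalMillis) {
        // Publish the waiter before checking the flag, so a wakeup() that misses
        // the thread reference is guaranteed to be seen through the flag instead.
        waiter = Thread.currentThread();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        while (!hasNotified.compareAndSet(true, false)) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false; // timed out without a notification
            }
            // Returns immediately if unpark() already granted a permit; spurious
            // returns are handled by re-checking the flag in the loop condition.
            LockSupport.parkNanos(this, remaining);
            if (Thread.currentThread().isInterrupted()) {
                return false; // leave the interrupt status set for the caller
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ParkingWaiter w = new ParkingWaiter();
        new Thread(w::wakeup).start();
        System.out.println("woken before timeout: " + w.waitForRunning(1000));
    }
}
```

   Unlike the current `waitForRunning()`, this sketch does not clear `hasNotified` on timeout, so a notification that arrives right at the deadline is kept for the next call. Whether that behaviour is wanted is a design choice for the actual fix.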
   
   An automated fix proposal will be generated. Reply `/approve` to proceed 
with PR generation.
   
   ---
   *Automated evaluation by github-manager-bot*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
