Aias00 opened a new pull request, #3439:
URL: https://github.com/apache/dubbo-go/pull/3439

   ## Problem
   
   `StickyInvoker` in `BaseClusterInvoker` was a plain `base.Invoker` field 
read and written by multiple goroutines without any synchronization, causing a 
**data race** on the RPC hot path.
   
   ### Race details
   
   - **Write**: `DoSelect()` sets `StickyInvoker = nil` (line 104) and 
`StickyInvoker = selectedInvoker` (line 115)
   - **Read**: `DoSelect()` reads `StickyInvoker` multiple times (lines 103, 
108-110), and `IsAvailable()` reads it (lines 65-66)
   - **Concurrent scenario**: `failbackClusterInvoker` calls `DoSelect()` from 
both the main `Invoke()` goroutine (line 143) and the retry goroutine spawned 
via `go invoker.tryTimerTaskProc()` (line 118, calling `DoSelect` at line 85). 
Even without failback, concurrent `Invoke()` calls on the same invoker would 
also race.
   
   ## Fix
   
   Replace the unexported `StickyInvoker` field with `sync/atomic.Value`, 
accessed via `getStickyInvoker()`/`setStickyInvoker()` helpers:
   
   | Change | Description |
   |--------|-------------|
   | `StickyInvoker base.Invoker` → `stickyInvoker atomic.Value` | Field uses 
atomic.Value, unexported to prevent direct access |
   | Add `getStickyInvoker()` / `setStickyInvoker()` | Lock-free read/write 
wrapper methods |
   | `IsAvailable()` uses `getStickyInvoker()` | Eliminates read race |
   | `DoSelect()` uses local variable + atomic accessors | Eliminates 
read/write race; local variable ensures consistent logic within one call |
   | Add `TestStickyConcurrentDoSelect` | 100 goroutines concurrently call 
`DoSelect` with sticky=true |
   | Add `TestStickyConcurrentIsAvailableAndDoSelect` | 100 goroutines 
concurrently read sticky invoker while 100 goroutines call `DoSelect` |
   
   Both new tests pass with `go test -race`.
   
   ## Semantic note
   
   The sticky invoker mechanism is inherently "best-effort" — concurrent 
goroutines may select different invokers in the same call, but this is 
acceptable since sticky is merely a preference, not a correctness guarantee. 
The `atomic.Value` approach eliminates the **undefined behavior** (Go data 
race) while preserving the sticky semantics.
   
   ## Test results
   
   ```
   $ go test ./cluster/cluster/base/... -race -count=1 -v
   === RUN   TestStickyNormal
   --- PASS
   === RUN   TestStickyNormalWhenError
   --- PASS
   === RUN   TestStickyConcurrentDoSelect
   --- PASS
   === RUN   TestStickyConcurrentIsAvailableAndDoSelect
   --- PASS
   PASS
   ```
   
   All other cluster package tests also pass with `-race` (the pre-existing 
`TestFailbackOutOfLimit` race on the `ticker` field in failback is unrelated).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to