Aias00 opened a new pull request, #3439: URL: https://github.com/apache/dubbo-go/pull/3439
## Problem `StickyInvoker` in `BaseClusterInvoker` was a plain `base.Invoker` field read and written by multiple goroutines without any synchronization, causing a **data race** on the RPC hot path. ### Race details - **Write**: `DoSelect()` sets `StickyInvoker = nil` (line 104) and `StickyInvoker = selectedInvoker` (line 115) - **Read**: `DoSelect()` reads `StickyInvoker` multiple times (lines 103, 108-110), and `IsAvailable()` reads it (lines 65-66) - **Concurrent scenario**: `failbackClusterInvoker` calls `DoSelect()` from both the main `Invoke()` goroutine (line 143) and the retry goroutine spawned via `go invoker.tryTimerTaskProc()` (line 118, calling `DoSelect` at line 85). Even without failback, concurrent `Invoke()` calls on the same invoker would also race. ## Fix Replace the unexported `StickyInvoker` field with `sync/atomic.Value`, accessed via `getStickyInvoker()`/`setStickyInvoker()` helpers: | Change | Description | |--------|-------------| | `StickyInvoker base.Invoker` → `stickyInvoker atomic.Value` | Field uses atomic.Value, unexported to prevent direct access | | Add `getStickyInvoker()` / `setStickyInvoker()` | Lock-free read/write wrapper methods | | `IsAvailable()` uses `getStickyInvoker()` | Eliminates read race | | `DoSelect()` uses local variable + atomic accessors | Eliminates read/write race; local variable ensures consistent logic within one call | | Add `TestStickyConcurrentDoSelect` | 100 goroutines concurrently call `DoSelect` with sticky=true | | Add `TestStickyConcurrentIsAvailableAndDoSelect` | 100 goroutines concurrently read sticky invoker while 100 goroutines call `DoSelect` | Both new tests pass with `go test -race`. ## Semantic note The sticky invoker mechanism is inherently "best-effort" — concurrent goroutines may select different invokers in the same call, but this is acceptable since sticky is merely a preference, not a correctness guarantee. The `atomic.Value` approach eliminates the **undefined behavior** (Go data race) while preserving the sticky semantics. ## Test results ``` $ go test ./cluster/cluster/base/... -race -count=1 -v === RUN TestStickyNormal --- PASS === RUN TestStickyNormalWhenError --- PASS === RUN TestStickyConcurrentDoSelect --- PASS === RUN TestStickyConcurrentIsAvailableAndDoSelect --- PASS PASS ``` All other cluster package tests also pass with `-race` (the pre-existing `TestFailbackOutOfLimit` race on the `ticker` field in failback is unrelated). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
