mochengqian opened a new issue, #955:
URL: https://github.com/apache/dubbo-go-pixiu/issues/955
## Background
Follow-up to #932 (issue #905 step 5).
The request-path pick verifies that a balancer-returned endpoint is still in
the snapshot's healthy set via `healthyEndpointFromSnapshot` in
`pkg/cluster/loadbalancer/load_balancer.go`:
```go
func healthyEndpointFromSnapshot(endpoint *model.Endpoint, healthyEndpoints
[]*model.Endpoint) *model.Endpoint {
if endpoint == nil {
return nil
}
for _, candidate := range healthyEndpoints { // ① pointer-equality
fast path
if candidate == endpoint {
return model.CloneEndpoint(candidate)
}
}
for _, candidate := range healthyEndpoints { // ②
sameEndpointIdentity slow path
if sameEndpointIdentity(candidate, endpoint) {
return model.CloneEndpoint(candidate)
}
}
return nil
}
```
## Problem
Both loops are O(N) over the healthy set. The intent of #932 was that
zero-copy balancers hit the pointer fast path (①) early. But:
- **Non-zero-copy snapshot balancers** go through
`defensiveSnapshotPickContext`, which clones `HealthyEndpoints`. The balancer
returns an element of the *cloned* slice, so its pointer never equals any
element of `context.HealthyEndpoints`. Loop ① misses every element and the pick
falls through to the full O(N) `sameEndpointIdentity` scan (②), which also runs
`SocketAddress.Equal` + domain checks per element.
- Even for zero-copy balancers, ① is still a linear scan (cheap per-element
pointer compare, but O(N) on average N/2 probes).
For large clusters (hundreds of endpoints) under high QPS this is wasted
hot-path work.
## Goal
Make the recheck O(1) **without weakening the trust-boundary validation**
that #932 intentionally keeps (the `sameEndpointIdentity` rules, including the
ID-keyed placeholder wildcard).
## Suggested approach
- The snapshot already maintains a `healthyEndpointByID` index. Expose an
O(1) lookup so `healthyEndpointFromSnapshot` can resolve by ID and then apply
the existing `sameEndpointIdentity` rule against the single indexed candidate,
instead of scanning.
- Preserve current behavior for the empty-ID / placeholder-address cases
(rules 1 and 2 of `sameEndpointIdentity`).
- Keep returning `nil` when the balancer returns an object not present in
the healthy set (do **not** blindly `CloneEndpoint(endpoint)` and skip
validation — that defense is deliberate).
## Caveat (record honestly)
`model.CloneEndpoint` allocations likely dominate the per-pick cost over the
recheck scan, so the absolute win may be modest. Benchmark before committing to
confirm the recheck is actually material relative to clone cost. If clone
dominates, this may be lower priority than #940 (avoid double-cloning).
## Test plan
- [ ] Benchmark `PickEndpoint` for zero-copy and non-zero-copy snapshot
balancers across endpoints=8/64/512, before/after.
- [ ] Confirm `sameEndpointIdentity` semantics unchanged: ID-match,
blank-domain placeholder wildcard, and reject-on-not-found all still covered by
tests.
- [ ] `go test -race ./pkg/cluster/loadbalancer/... ./pkg/server/...`
## Files
- `pkg/cluster/loadbalancer/load_balancer.go`
- `pkg/cluster/cluster.go` (expose `healthyEndpointByID` lookup)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]