NeverENG opened a new pull request, #3373:
URL: https://github.com/apache/dubbo-go/pull/3373

   ### Description
   Fixes # issue3354
   
   ## What this PR does
   
   Fixes #3354. Hardens application-level **service-app mapping** (`interface 
-> app names`)
   registration so it is correct under concurrent providers, and gives it a 
proper retry policy.
   完善应用级 service-app mapping 的写入一致性、重试与去重。
   
   ## Background
   
   The mapping value is a comma-separated set of application names stored under 
a single
   interface key, shared by all providers of that interface. Registration is 
therefore a
   read-modify-write, and the previous implementations had several reliability 
gaps.
   
   ## Changes
   
   ### 1. Optimistic concurrency across all backends (no more lost updates)
   Concurrent appends no longer clobber each other:
   - **etcd**: `Get`+`Put` → `GetValAndRev` + `UpdateWithRev` (CAS on 
`ModRevision`), `Create` for first write.
   - **zookeeper**: keeps versioned `SetContent`, now surfaces version 
conflicts instead of swallowing them.
   - **nacos**: adds `CasMd5` optimistic lock.
   
   Each backend wraps its native conflict (`ErrCompareFail` / `ErrBadVersion` / 
`ErrNodeExists` /
   nacos publish failure) into a shared `report.ErrMappingCASConflict` sentinel 
via `%w`.
   
   ### 2. Graded retry (was: fixed loop, no backoff)
   `registerWithRetry` retries **only** CAS conflicts (`errors.Is`) with 
exponential backoff + jitter,
   and returns permanent errors (network/auth) immediately instead of burning 
the whole retry budget.
   原来任何错误都空转重试 10 次且无 sleep,现在按错误类型分级重试。
   
   ### 3. Extract shared logic + fix two hidden bugs
   - `report.MergeServiceAppMapping`: whole-element dedup. Fixes the 
`strings.Contains` substring
     false positive (registering `order` was wrongly treated as present when 
`order-service` existed)
     and the leading-comma bug (`"" + "," + app` → `",app"`).
   - `report.DecodeServiceAppNames`: parse into a set, skipping empty elements.
   
   ### 4. Listener cleanup
   - **zookeeper**: implemented removal via `CacheListener.RemoveKeyListeners` 
(was a silent `return nil`
     that leaked listeners).
   - **etcd**: documents the mapping listener as unsupported instead of 
silently succeeding.
   
   ### 5. Tests
   - Unit tests for the merge/decode helpers (incl. the substring and 
empty-value regressions).
   - A concurrency test that **reproduces the lost-update bug** with the naive 
read-modify-write and
     **proves CAS preserves every writer** (200 writers / 20 concurrent 
readers). Passes under `-race`.
   
   ## Known limitation (documented in code)
   
   Nacos `CasMd5` is an optimistic **UPDATE** and cannot guard the first 
**INSERT** (Nacos has no
   create-if-absent primitive), so the initial concurrent registration of a 
brand-new interface can
   still race. **etcd and zookeeper are not affected.** Left as a documented 
limitation; can be revisited
   if Nacos exposes a SETNX-style primitive.
   
   ## Test
   
   go test -race ./metadata/report/... ./metadata/mapping/...
   
   ### Checklist
   - [x] I confirm the target branch is `develop`
   - [x] Code has passed local testing
   - [x] I have added tests that prove my fix is effective or that my feature 
works
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to