XnLemon opened a new pull request, #3369: URL: https://github.com/apache/dubbo-go/pull/3369
### Description Fixes #3352 --- #### Background / 背景 The [#2534 refactor](https://github.com/apache/dubbo-go/pull/2534) centralized metadata logic and settled on an application-level metadata path where `MetadataReport` handles app-level `MetadataInfo` and service-app mapping. That refactor was correct in intent, but it left the **selection semantics of `MetadataReport` under-specified** for multi-registry and multi-instance deployments. Concretely, when a process connects to more than one registry (e.g. Nacos + Zookeeper), each registry gets its own `MetadataReport` instance, keyed by `registryId` in the `instances` map. The pre-existing code had four independent correctness gaps in how it chose which report to use. #2534 重构将 metadata 逻辑集中化,以应用级 `MetadataInfo` 和 service-app mapping 为核心路径。重构方向正确,但在**多注册中心和多实例部署场景下,`MetadataReport` 的选择语义存在空白**。 具体来说,当进程同时连接多个注册中心(如 Nacos + Zookeeper)时,每个注册中心会在 `instances` map 中保有独立的 `MetadataReport` 实例。原有代码在选择使用哪个 report 时存在以下四个独立的正确性问题。 --- #### Problems Fixed / 修复的问题 **1. `GetMetadataReport()` returned a non-deterministic report / 返回结果不确定** Go map iteration order is randomized. `GetMetadataReport()` returned whichever entry happened to come first — a different report on every call in some runs, making any downstream behavior non-reproducible. Fix: prefer the `"default"` key; fall back to the lexicographically first `registryId` so the result is always stable. Go map 迭代顺序随机,`GetMetadataReport()` 每次可能返回不同的 report。修复:优先返回 `"default"` 键;不存在时退回到字典序最小的 `registryId`,确保结果稳定。 --- **2. `GetMetadataReportByRegistry()` silently fell back to the wrong registry's report / 静默返回错误注册中心的 report** When a caller passed a specific `registryId` that was not registered, the function silently fell back to `GetMetadataReport()` and returned an unrelated registry's report. In `GetMetadataFromMetadataReport`, this meant remote metadata could be fetched from the wrong backend with no error surfaced. Fix: return `nil` + `Warnf` when a specific registryId is not found. The existing nil-guard in `GetMetadataFromMetadataReport` already surfaces this as an actionable error. 调用方传入未注册的 `registryId` 时,函数静默退回到其他注册中心的 report,且无任何报错。修复:找不到时返回 `nil` + `Warnf`,由调用方的 nil 检查给出明确错误。 --- **3. `ServiceNameMapping.Remove()` only returned the last error / 只保留最后一个错误** `Remove()` fanned out to all reports but used a `lastErr` accumulator overwritten on each iteration. If the first report failed and the second succeeded, the error was silently discarded and the caller saw `nil`, leaving a stale mapping in the first registry. Fix: collect all errors with `errors.Join` so the caller receives the full failure picture. `Remove()` 向所有 report 扇出,但 `lastErr` 在每次循环中被覆盖。若第一个 report 失败、第二个成功,错误被静默丢弃。修复:使用 `errors.Join` 收集全部错误。 --- **4. Revision calculation and remote metadata fetch used cross-registry data / Revision 计算和远端 metadata 拉取使用了跨注册中心数据** The revision customizers called the global `GetMetadataService()` which merges services across all registries, so two instances on different registries could compute the same revision even though their actual service sets differed — breaking consumer-side change detection. The listener cache in `GetMetadataInfo()` was keyed by `revision` alone. Two registries producing the same revision string would collide, and the second registry's listener would silently receive the first registry's cached metadata. Fix: thread `registryId` through `createInstance()`, both revision customizers, `GetMetadataFromMetadataReport()`, `NewServiceInstancesChangedListener()`, and `GetMetadataInfo()`. Change the cache key to `registryId + ":" + revision`. Revision customizer 通过全局 `GetMetadataService()` 获取服务列表,合并了所有注册中心的数据,导致不同注册中心的实例可能计算出相同的 revision。监听器的 metadata 缓存仅以 `revision` 为键,多注册中心场景下会发生缓存污染。修复:将 `registryId` 贯穿传递到相关链路,缓存键改为 `registryId + ":" + revision`。 --- #### Changes / 改动内容 | File | Change | |---|---| | `metadata/report_instance.go` | `GetMetadataReport()` deterministic; `GetMetadataReportByRegistry()` returns nil for unknown id; add `ClearMetadataReportInstances()` for test isolation | | `metadata/client.go` | `GetMetadataFromMetadataReport()` takes `registryId`, routes to correct per-registry report | | `metadata/mapping/metadata/service_name_mapping.go` | `Remove()` collects all errors with `errors.Join` | | `registry/servicediscovery/service_instances_changed_listener_impl.go` | `GetMetadataInfo()` and `NewServiceInstancesChangedListener()` take `registryId`; cache key scoped to `registryId + ":" + revision` | | `registry/servicediscovery/service_discovery_registry.go` | `createInstance()` injects `registryId` into instance metadata; `SubscribeURL` passes `registryId` to listener | | `registry/servicediscovery/customizer/service_revision_customizer.go` | Read per-registry `MetadataInfo`; upgrade missing-registryId log to `Errorf` with accurate consequence description | | `registry/servicediscovery/customizer/metadata_service_url_params_customizer.go` | Restore TODO with explanation — per-registry MetadataService URL support is tracked separately | --- #### Tests Added / 新增测试 - `TestGetMetadataReportIsDeterministic` — 20 iterations verify stability under Go's randomized map iteration - `TestGetMetadataReportByRegistry` / `TestGetMetadataReportByRegistryFallsBackDeterministically` — hit, miss, and empty-string paths - `TestServiceNameMappingRemoveFansOutToAllReports` — both reports called on success - `TestServiceNameMappingRemoveCollectsAllErrors` — both errors visible via `errors.Is` in joined return - `TestServiceNameMappingRemoveContinuesAfterPartialFailure` — loop does not short-circuit on first failure - `TestExportedRevisionIsRegistryScoped` / `TestSubscribedRevisionIsRegistryScoped` — different registries produce different revisions - `TestExportedRevisionMissingRegistryIdYieldsZero` / `TestSubscribedRevisionMissingRegistryIdYieldsZero` — missing `RegistryIdKey` yields revision `"0"`, does not borrow another registry's data - `TestGetMetadataFromMetadataReport` (rewritten) — specific registryId routes to correct report; unknown registryId returns error, not wrong report - `TestListenerUsesRegistryIdToFetchRemoteMetadata` — end-to-end: registryId flows from `NewServiceInstancesChangedListener` through `OnEvent` → `GetMetadataInfo` → `GetMetadataFromMetadataReport` → correct mock report called --- #### TODO / 本 PR 还未做完 Multi-instance metadata service URL alignment (`metadata_service_url_params_customizer.go`): `GetMetadataService()` still returns a global singleton so all instances across registries receive the same metadata service URL. 多实例 metadata service URL 对齐:`GetMetadataService()` 仍返回全局单例,所有注册中心实例获得同一 metadata service URL。 --- ### Checklist - [X] I confirm the target branch is `develop` - [X] Code has passed local testing - [X] I have added tests that prove my fix is effective or that my feature works -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
