Re: [D] [AI] AI gateway milestones [dubbo-go-pixiu]

via GitHub Wed, 03 Jun 2026 03:51:00 -0700


GitHub user mochengqian edited a comment on the discussion: [AI] AI gateway 
milestones


** @zerui  让我学习pixiu竞品的优秀cluster设计.** 下面的你们看看合理、需要吗 @Alanxtl @AlexStocks 
**我的prompt:**“我的目标是深度研究 Apache dubbo-go-pixiu 的“集群模块”优化方向。请你不要只做泛泛竞品分析，而是以“如何让 
Kong / APISIX / Envoy AI Gateway / Higress / LiteLLM / Portkey / TrueFoundry 
等成熟 AI 智能网关的高频重度用户，看到 Pixiu 的集群模块后愿意称赞并迁移”为最终目标，进行系统性研究。”
**得出优化建议:** 把 Pixiu cluster 从"静态配置的 upstream 容器"升级成"可编程的多协议统一运行时"——让 AI 
provider、Dubbo service、K8s pod、MCP tool  这些异构后端都先编译成统一的 immutable snapshot + 
mutable state，然后用同一套 picker/scorer/balancer 做智能选址（cost-aware / quota-aware / 
failure-classified），最终实现"一次编写负载均衡逻辑，到处复用"。
# [RFC] 统一多协议 Cluster Runtime：从 Upstream 容器到可编程运行时

> 基于《Pixiu 集群模块深度研究报告》与 PR #932 现状，提出将 cluster 从"通用 upstream 
> 容器"演进为"多协议、多形态后端统一运行时"的完整技术路线。本 RFC 先在 Discussion #696 试水，社区有兴趣后提交为正式 issue。

## 执行摘要（30 秒版）

**现状**：cluster 是静态配置的 upstream 容器，AI 失败状态（cooldown）活在 filter 的全局 map 里，picker 
看不到；Dubbo/K8s/MCP 各自独立实现，无法复用选址逻辑。

**目标**：把 cluster 升级成**统一运行时** —— 静态 cluster、Dubbo service cluster、Kubernetes 
service cluster、AI provider cluster、MCP service cluster，全部先编译成 immutable 
runtime snapshot，再由统一 selector/scorer/picker 执行选址。

**路径**：三层渐进演进（每层可独立验收、向后兼容）
1. **State 基底**（Phase 1）— per-endpoint 
`EndpointState`（cooldown/inflight/failure-class），解决 #696 的 key-level cooldown 问题
2. **AI Runtime 填满**（Phase 2）— cost/quota/model-capability 进 
state，scorer/semantic router 进 picker
3. **Multi-Kind 统一**（Phase 3）— `ClusterKind` 接口，Dubbo/K8s/MCP 各实现 
`CompileSnapshot`，picker 只看统一 snapshot

**关键设计点**：immutable snapshot（沿用 #932）+ stable per-endpoint state（跨 snapshot 
存活、atomic 更新）—— 解决"快变状态 vs 零分配读路径"的张力。

---

## 动机：为什么要动 cluster core？

### 当前痛点（Discussion #696 + Issue #905）

1. **AI 失败状态不可见**  
   `sharedCooldownStore` 在 filter 
侧（`pkg/filter/llm/proxy/filter.go:138`），picker 看不到 cooldown。单个 API key 触发 
429，整个 endpoint 被 filter 循环跳过，但对 picker 来说 endpoint 仍"健康" —— 健康与可用脱节。

2. **二元健康无法表达失败语义**  
   `context_length_exceeded` 是请求/endpoint **不匹配**（该请求需要更大 context 模型），不是 
endpoint 故障 —— 正确反应是路由到大 context endpoint，而非 cooldown。Binary healthy/unhealthy 
表达不了这点。

3. **多协议后端各自为政**  
   静态 upstream（现有）、Dubbo service（规划中）、K8s service（规划中）、MCP tool（规划中）各自独立实现，无法共享 
LB/健康检查/选址逻辑。每加一种后端就复制一套代码。

### 为什么要统一？（报告的核心洞察）

**观察**：AI provider、Dubbo service、K8s pod、MCP tool —— 表面看完全不同，但**选址决策的输入是同构的**：
- 都有"成员列表"（endpoints / services / pods / tools）
- 都有"健康状态"（provider 429 / service down / pod terminating / tool unavailable）
- 都有"负载指标"（inflight / qps / cpu / concurrent calls）
- 都有"能力标签"（model supports / service version / pod labels / tool schema）

**结论**：如果把这些异构后端先**编译**成统一的 runtime view（immutable snapshot + mutable 
state），picker 就可以**一次编写、到处复用**。这是 Envoy `Host` 抽象、Istio `ServiceEntry` 统一的同一个思路。

**收益**：
- 新协议后端只需实现"config → snapshot 编译器"，自动继承现有 LB/健康检查/metrics
- AI 的 cost-aware/quota-aware 选址，天然可用于 Dubbo（按 CPU quota）、K8s（按 pod 资源）
- 一套 benchmark/test 覆盖所有 cluster kind

---

## 现状分析（origin/develop as of 2026-06-03）

### Cluster Runtime（pkg/cluster/cluster.go）

```go
type Cluster struct {
    endpoints atomic.Pointer[EndpointSnapshot]  // Line 43
    // ...
}
```

- PR #932 已合并：`EndpointSnapshot` immutable，通过 `atomic.Pointer` 发布
- 成员/健康变化时 CAS 重建（`RefreshEndpointsFrom:95`, `UpdateEndpointHealth:109`）
- 预计算 `all` / `healthy` / `byID` / `byAddress` 索引 → 读路径 ~0 alloc

### AI Failure State（pkg/filter/llm/proxy/filter.go）

```go
var sharedCooldownStore = newCooldownStore()  // Line 138
```

- 进程级全局 map：`map[cooldownKey]cooldownEntry`，`sync.Mutex` 保护
- 容量上限 1024，达到 load factor 后触发 sweep
- Filter fallback 循环里检查 cooldown，picker **看不到**

### 问题

AI 失败状态变化频率（per-request）与 health 变化频率（health-check interval）相差 3 个数量级。如果把 
cooldown 塞进 immutable snapshot，每次失败都触发 O(N) 重建 + map 分配 → 毁掉 #932 的零分配读路径。

---

## 核心设计：两层分离 + 编译时抽象

### 设计张力的解法

**问题重述**：snapshot 不可变（#932 收益），但 AI 状态高频变（per-request）。

**解法**：分两层，不同变化频率用不同策略
1. **Immutable membership snapshot**（慢变）— 成员 + 健康 + 索引，仍用 #932 的 CAS 重建
2. **Stable per-endpoint `EndpointState`**（快变）— 按 endpoint ID 持有、跨 snapshot 
重建存活、字段用 atomic 原地更新

Snapshot 重建时，ID 不变的 endpoint **复用同一个 `EndpointState` 指针** —— 新 snapshot 继承老 
state，零重建。

```go
// Phase 1 sketch — not final API
type EndpointState struct {
    // Atomic fields — per-request update, no allocation
    cooldownUntilNano atomic.Int64
    lastFailureClass  atomic.Int32  // FailureClass enum
    inflight          atomic.Int64
    
    // Phase 2 additions (后续)
    // costPerToken      atomic.Uint64  // fixed-point
    // quotaRemaining    atomic.Int64
    // ewmaLatencyNs     atomic.Int64
}

type Cluster struct {
    endpoints atomic.Pointer[EndpointSnapshot]  // immutable view
    states    sync.Map  // map[endpointID]*EndpointState, stable
}
```

Picker 读 snapshot 拿 healthy 列表（零分配），遍历时查 `states[ep.ID]` 拿 cooldown（一次 
atomic.Load）。

**这是本 RFC 最需要评审的设计**——它保持 #932 不变量、解决快变状态问题，但引入了"snapshot + state 
双查"的复杂度。备选方案见后文。

### 失败分类（Failure Classifier）

报告识别的核心洞察：**不是所有失败都该 cooldown endpoint**。

| FailureClass            | 触发条件                 | 反应                           
   | 示例场景                     |
|-------------------------|-------------------------|-----------------------------------|------------------------------|
| `RateLimited`           | HTTP 429                | cooldown，fallback 其他      
     | OpenAI QPM 限流              |
| `ServerError`           | 5xx                     | 短 cooldown，fallback       
      | Provider 临时故障            |
| `Timeout`               | 连接/响应超时           | cooldown，fallback               
 | 网络抖动                     |
| `ContextLengthExceeded` | 请求超模型 context 上限 | **不 cooldown**，路由到大 context | 
长对话超 gpt-3.5 16k         |
| `QuotaExhausted`        | 预算耗尽                | 长 cooldown 至窗口重置            | 
日配额用完                   |
| `ContentPolicyViolation`| 内容审核拒绝            | **不 cooldown**，返回用户         | 
触发 safety filter           |

`ContextLengthExceeded` 和 `ContentPolicyViolation` 是**请求属性**，不是 endpoint 故障 —— 
这正是二元健康表达不了、也是 #696 提的 key-level 问题的泛化。

**谁来分类？** Filter 看得到 HTTP 语义（status / body / timeout）→ filter 分类 → 发事件给 cluster 
→ cluster 更新 `EndpointState`。

```go
// Filter→Cluster 契约（Phase 1）
func (c *Cluster) RecordEndpointFailure(endpointID string, class FailureClass, 
now time.Time)
func (c *Cluster) RecordEndpointSuccess(endpointID string)
```

Cooldown duration policy 在 cluster 侧（从 `LLMMeta` 编译），不硬编码在 filter。

### Multi-Kind 抽象（ClusterKind 接口）

报告提出的终极形态：把 cluster 从"静态 upstream 容器"变成**可编程运行时**。

```go
// Phase 3 sketch — ClusterKind 接口
type ClusterKind interface {
    // 从用户 config 编译出 immutable runtime snapshot
    CompileSnapshot(config *ClusterConfig, previous *EndpointSnapshot) 
*EndpointSnapshot
    
    // Kind-specific 健康检查（可选，默认用 HTTP probe）
    HealthCheck(endpoint *Endpoint) (healthy bool, err error)
    
    // 获取 endpoint metadata（用于 semantic routing / capability match）
    GetMetadata(endpoint *Endpoint) map[string]string
}
```

**内置 kind 实现**（逐步添加）：
- `StaticKind` — 现有行为，从 `ClusterConfig.Endpoints` 构造 snapshot
- `AIProviderKind` — 从 `LLMMeta.Providers` 编译，注入 model/cost/quota metadata
- `DubboServiceKind` — 连接 Dubbo registry，动态发现 + 版本路由
- `KubernetesServiceKind` — watch K8s Endpoints API，Pod readiness → health
- `MCPServiceKind` — MCP server discovery，tool schema → capability metadata

**Picker 统一性**：所有 kind 的 endpoint 都经过 `ClusterKind.CompileSnapshot` 标准化为同一 
`EndpointSnapshot` 结构 + 统一 `EndpointState`。Picker 不关心 kind —— RoundRobin / 
WeightedRandom / CostAware scorer 全部跨 kind 复用。

**配置示例**（Phase 3，供参考）：
```yaml
clusters:
  - name: ai-cluster
    kind: ai_provider  # 新字段
    llm_meta:
      providers: [...]
    
  - name: dubbo-user-service
    kind: dubbo_service
    dubbo_registry: "nacos://..."
    
  - name: k8s-backend
    kind: kubernetes_service
    namespace: default
    service: my-svc
```

---

## 三层演进路径（Phased Roadmap）

本 RFC 提出**完整愿景**，但执行按**三层渐进**，每层可独立验收、向后兼容。

### Phase 1: State 基底（解决 #696 痛点）

**目标**：AI 失败状态进 cluster，picker 能跳过 cooldown endpoint。

**交付**：
- `EndpointState` 结构 + `Cluster.states sync.Map`
- `FailureClass` 枚举（5 种）+ Filter→Cluster 事件契约
- Picker cooldown-skip 逻辑（候选过滤阶段）
- 移除 `sharedCooldownStore`（#939 先做注入，Phase 1 移除）

**依赖**：#958 (config/runtime decoupling) 的 ownership 模型，或明确假设其目标态。

**验收标准**：
- 多 key 隔离测试通过（同 endpoint 不同 credential，一个 429 不影响另一个）
- `ContextLengthExceeded` 不触发 cooldown，能路由到大 context endpoint
- `-race` 干净（concurrent pick + failure record + snapshot refresh）
- Benchmark：pick 仍 0-alloc，record failure O(1) 不触发 snapshot 重建

**预估工作量**：~3 PR，~800 行新增（含测试）。

**Phase 1 不包含**：scorer、semantic router、cost/quota 字段、多 kind。

### Phase 2: AI Runtime 填满（Cost/Quota-Aware 选址）

**目标**：把 AI 特有的 cost / quota / model-capability 内核化，实现报告里的智能选址。

**交付**：
- `EndpointState` 新增字段：`costPerToken` / `quotaRemaining` / `modelCapabilities` 
/ EWMA latency/TTFT/TPOT
- **Scorer 框架**：多目标加权打分（cost×权重 + latency×权重 + quota剩余）
- **Semantic Router**（可选）：embedding-based 匹配，可能作为独立 selector stage 或 filter
- **Quota 窗口管理**：daily/hourly quota tracking，窗口重置时自动解除 cooldown

**AI Endpoint Fields**（报告第 4 节）：
```go
type EndpointState struct {
    // Phase 1 already has
    cooldownUntilNano atomic.Int64
    lastFailureClass  atomic.Int32
    inflight          atomic.Int64
    
    // Phase 2 additions
    costPerInputToken  atomic.Uint64  // fixed-point, e.g. 0.03 = 30000
    costPerOutputToken atomic.Uint64
    quotaRemaining     atomic.Int64   // -1 = unlimited
    quotaWindowResetAt atomic.Int64   // unix nano
    
    ewmaLatencyNs      atomic.Int64
    ewmaTTFTNs         atomic.Int64   // Time To First Token
    ewmaTPOTNs         atomic.Int64   // Time Per Output Token
    
    // metadata（immutable，实际可能放 Endpoint 而非 State）
    modelName          string
    contextWindow      int
    supportsStreaming  bool
    supportsTools      bool
}
```

**Scorer 示例**（可插拔）：
```go
score = (1.0 - costNormalized) * costWeight
      + (1.0 - latencyNormalized) * latencyWeight
      + quotaRemainingRatio * quotaWeight
      + capabilityMatch * capabilityWeight
```

**依赖**：Phase 1 的 `EndpointState` 基础设施。

**验收标准**：
- Cost-aware picker 选最便宜 available endpoint
- Quota 耗尽的 endpoint 被跳过，窗口重置后自动恢复
- Semantic router 能把"翻译任务"路由到 translation-optimized 模型
- Benchmark：scorer 开销 < 100ns/endpoint

**预估工作量**：~5 PR，~1200 行新增（scorer 框架 + quota 管理 + 测试）。

### Phase 3: Multi-Kind 统一（架构级演进）

**目标**：Dubbo / K8s / MCP / 静态 upstream 共用一套 snapshot + picker 基础设施。

**交付**：
- `ClusterKind` 接口定义（上文已给出）
- 重构现有 cluster 为 `StaticKind` 实现
- 新增 `DubboServiceKind` / `KubernetesServiceKind` / `MCPServiceKind`（逐个添加）
- ClusterConfig 新增 `kind` 字段，向后兼容（未指定时默认 `static`）

**技术细节**：
- **发现 → Snapshot 编译**：每个 kind 实现自己的 watch/poll 逻辑，调用 `kind.CompileSnapshot` 
产出标准 `EndpointSnapshot`
- **Metadata 注入**：Dubbo 的 version/group、K8s 的 pod labels、MCP 的 tool 
schema，全部编译到 `Endpoint.Metadata map[string]string`
- **健康检查复用**：默认用现有 HTTP probe，kind 可 override（如 K8s 直接读 Pod readiness）
- **Picker 零感知**：RoundRobin / WeightedRandom / CostAware 等 balancer 
不改一行代码，自动支持所有 kind

**Dubbo 示例**（报告附录 A）：
```go
type DubboServiceKind struct {
    registryClient dubbo.Registry
}

func (k *DubboServiceKind) CompileSnapshot(config *ClusterConfig, prev 
*EndpointSnapshot) *EndpointSnapshot {
    instances := k.registryClient.Discover(config.DubboService, 
config.DubboVersion)
    endpoints := make([]*Endpoint, len(instances))
    for i, inst := range instances {
        endpoints[i] = &Endpoint{
            ID:      inst.ID,
            Address: inst.Host + ":" + inst.Port,
            Metadata: map[string]string{
                "dubbo.version": inst.Version,
                "dubbo.group":   inst.Group,
            },
        }
    }
    return newEndpointSnapshot(config, prev, endpoints, 
inheritRuntimeHealth=true)
}
```

**依赖**：Phase 1 + Phase 2 的 state/scorer 基础。

**验收标准**：
- 同一 route 规则混用 static cluster + dubbo cluster，透明 fallback
- K8s pod scale-up 自动发现，新 pod 进入 healthy 视图
- MCP tool discovery，按 schema capability 匹配路由
- 性能无回退：多 kind 开销 < 5% vs Phase 2

**预估工作量**：~8 PR，~2000 行新增（接口 + 3 个 kind 实现 + 集成测试）。

**Phase 3 是架构演进**，可能引发争议。建议等 Phase 1 落地、Phase 2 证明收益后再提。

---

## 技术决策点与备选方案

### A. 状态存储位置

**方案 1（推荐）**：独立 `EndpointState`，cluster 持有 `sync.Map`  
✅ 核心结构干净，AI 状态独立演进  
✅ 解决快变状态 vs 零分配读路径的张力  
⚠️ 引入"snapshot + state 双查"复杂度  

**方案 2**：字段直接挂 `Endpoint`，随 snapshot 重建  
✅ 实现最简单，无双查  
❌ 每次失败触发 O(N) 重建 + map 分配，毁掉 #932 收益  
❌ LLM 字段污染 cluster core  

**方案 3**：状态留 filter，只规范接口（#939 延伸）  
✅ 改动最小  
❌ Picker 仍看不到失败状态，无法做选址级 cooldown-skip  
❌ 不支持 Phase 2/3 演进  

**选择**：方案 1。双查复杂度可接受（一次 `sync.Map.Load` + 一次 atomic），换来的是架构清晰 + 演进性。

### B. Filter↔Cluster 事件契约

**方案 1（推荐）**：Typed event with FailureClass enum  
```go
func (c *Cluster) RecordEndpointFailure(endpointID string, class FailureClass, 
now time.Time)
```
✅ 类型安全，易扩展新 class  
✅ Cooldown duration policy 在 cluster 侧，可配置  

**方案 2**：Generic callback with error  
```go
func (c *Cluster) RecordEndpointFailure(endpointID string, err error)
```
❌ Cluster 需 parse error 做分类，耦合 HTTP 语义  
❌ 难以表达"不 cooldown"语义（如 `ContextLengthExceeded`）  

**选择**：方案 1。分类在 filter（看得到 HTTP）、决策在 cluster（拥有状态），职责清晰。

### C. Picker 如何看到 State？

**方案 1**：Pre-filter stage，在 balancer 前过滤  
✅ 现有 balancer（RR/WR）零改动  
✅ 职责清晰：filter = 候选过滤，balancer = 从候选中选一个  
⚠️ 多一层抽象  

**方案 2**：在 balancer 内部检查 state  
✅ 少一层抽象  
❌ 每个 balancer 都得改（RoundRobin / WeightedRandom / LeastRequest...）  
❌ cooldown-skip 逻辑重复 N 份  

**选择**：方案 1。新增 `CandidateSelector` stage（或复用现有 selector if exists），输出 filtered 
candidates → balancer pick。

### D. Multi-Kind 何时引入？

**方案 1（保守）**：Phase 1/2 先落地，Phase 3 等需求明确再提  
✅ 降低初期争议，先证明 AI state 收益  
⚠️ 社区可能质疑"为 AI 改 core 值不值"  

**方案 2（激进）**：Phase 1 就引入 `ClusterKind` 接口，现有逻辑重构为 `StaticKind`  
✅ 一次性说清"为什么要动 core" —— 不只为 AI，是为统一多协议  
⚠️ 初期改动大，review 周期长  

**本 RFC 立场**：完整呈现三层愿景（方案 2 的透明度），但**执行按方案 1 分阶段** —— Phase 1 不引入 
`ClusterKind`，等 Phase 2 落地后再评估是否做 Phase 3。给社区完整 context，但不强求一次批准全部。

---

## 兼容性

**向后兼容（Phase 1）**：
- `LLMMeta` 继续解析，成为 cooldown duration policy 的编译输入
- `PickEndpoint` / `PickNextEndpoint` 签名不变
- Filter 外部行为不变（仍是 fallback loop，只是 cooldown 判断下沉了）
- `LLMUnhealthyKey` / `HealthyCheckTimeKey` 继续 deprecated，无新 writer

**配置演进（Phase 2/3）**：
- Phase 2：`LLMMeta` 新增 `cost_per_token` / `quota` 等字段（可选，未设置时行为同 Phase 1）
- Phase 3：`ClusterConfig` 新增 `kind` 字段，默认 `static`（现有配置无需改）

**移除 `sharedCooldownStore` 的时序**：
- #939 先做注入化（全局变量 → 可注入依赖）
- Phase 1 实现 cluster-side state 后，删除 store（filter 直接调 `cluster.RecordFailure`）

---

## 测试与验证

### Phase 1 测试计划

**功能测试**：
- 多 key 隔离：同 endpoint 不同 credential，一个 429 不影响另一个（#696 回归测试）
- 失败分类路由：`ContextLengthExceeded` 不 cooldown，路由到大 context endpoint
- State 生命周期：cooldown 设置 → snapshot refresh 保持 endpoint ID → state 存活
- State GC：endpoint 从 snapshot 移除 → 对应 state 被清理

**并发测试**：
- Race detector clean：concurrent pick + `RecordFailure` + snapshot refresh
- Stress test：1k endpoints、10k req/s，验证 `sync.Map` 争用不成瓶颈

**性能 Benchmark**（回归基线 vs #932）：
- `BenchmarkPick_NoCooldown` — 健康路径，应保持 0-alloc
- `BenchmarkPick_WithCooldownSkip` — 50% endpoint in cooldown，验证过滤开销 < 50ns
- `BenchmarkRecordFailure` — O(1)，0-alloc（只 atomic store）
- 关键断言：**recording failure 不触发 snapshot rebuild**

### Phase 2 测试计划

**功能测试**：
- Cost-aware picker：3 个 endpoint（$0.01 / $0.03 / $0.05 per 1k token），验证选最便宜 
available
- Quota 窗口管理：daily quota 耗尽 → endpoint 被跳过 → 窗口重置（模拟时间跳变）→ 自动恢复
- Semantic router：embedding 相似度匹配（需 mock embedding 服务或本地模型）

**性能 Benchmark**：
- Scorer 开销：100 endpoints，cost+latency+quota 三目标加权，< 100ns/endpoint
- EWMA 更新：per-request latency report，atomic 开销 < 10ns

### Phase 3 测试计划

**集成测试**（每个新 kind）：
- Dubbo：mock registry，service up/down → snapshot rebuild，version 路由
- K8s：fake clientset，pod scale → endpoints 变化，readiness → health
- MCP：mock server discovery，tool schema → capability metadata

**混合 cluster 测试**：
- 同一 route：primary = dubbo cluster，fallback = static cluster，验证透明 fallback
- Picker 复用：RR/WR balancer 零改动，自动支持所有 kind

**性能回归**：
- Multi-kind dispatch 开销 < 5% vs Phase 2（虚函数调用 + interface boxing）

---

## 可观测性

### Metrics（对应 Issue #944）

**Phase 1 新增指标**（注册到现有 OTel/Prometheus 路径）：
- `cluster_snapshot_publish_total{cluster}` — snapshot 重建次数
- `cluster_snapshot_publish_duration_seconds{cluster}` — 重建耗时
- `cluster_endpoint_count{cluster,state=healthy|unhealthy|cooldown}` — 各状态 
endpoint 数
- `cluster_endpoint_failure_total{cluster,endpoint,class}` — 按失败类型计数
- `cluster_endpoint_cooldown_duration_seconds{cluster,endpoint}` — cooldown 
持续时长分布

**Phase 2 新增**：
- `cluster_endpoint_cost_per_request{cluster,endpoint}` — 实际 cost（从 usage API 
拉取）
- `cluster_endpoint_quota_remaining{cluster,endpoint}` — 剩余配额
- `cluster_picker_score{cluster,endpoint}` — 实时打分值（可选，high cardinality）

**Phase 3 新增**：
- `cluster_kind_compile_duration_seconds{cluster,kind}` — 各 kind 的 compile 耗时
- `cluster_discovery_endpoints_total{cluster,kind}` — 从注册中心发现的 endpoint 数

### Tracing

在 `PickEndpoint` / `RecordFailure` / `CompileSnapshot` 关键路径埋 span，关联 request 
trace。

### Debug Dump（对应 Issue #950）

```bash
curl localhost:8888/debug/cluster/my-cluster | jq .
```

输出：
- 当前 snapshot（all/healthy endpoint 列表 + metadata）
- 每个 endpoint 的 `EndpointState`（cooldown/inflight/cost/quota）
- 最近 100 条失败事件（时间戳 + failure class + endpoint ID）
- Picker 配置（balancer type / scorer weights）

Phase 3 增加：
- `kind` 信息 + kind-specific debug（如 Dubbo 的 registry 连接状态）

---

## 依赖与时序

**前置依赖**：
- **#958 (config/runtime decoupling)** — `EndpointState` 的生命周期依赖 runtime 
cluster ownership 模型。Phase 1 必须等 #958 step 1-3 完成，或明确假设其目标态。
- **#939 (cooldown 注入)** — 把 `sharedCooldownStore` 从全局变量变成可注入依赖。Phase 1 的自然前置步骤。

**并行可做**：
- **#944 (snapshot publish metrics)** — 与 Phase 1 并行或作为 Phase 1 一部分，提供观测基础。
- **#943 (LRU eviction)** — 现有 cooldownStore 的优化，Phase 1 完成后可移除该 store。

**后续 follow-up**：
- Phase 2 可能触发新 issue：quota API 集成（从 OpenAI/Azure 拉取实时配额）、semantic router 的 
embedding 服务对接。
- Phase 3 需要独立 RFC：`ClusterKind` 接口设计 + Dubbo/K8s registry 集成方案。

**时间线估算**（纯技术，不含 review 周期）：
- Phase 1：3 周（设计 1w + 实现 1.5w + 测试/benchmark 0.5w）
- Phase 2：5 周（scorer 框架 2w + quota 管理 2w + 测试 1w）
- Phase 3：8 周（接口设计 1w + StaticKind 重构 2w + 3 个新 kind 各 1.5w + 集成测试 1w）

---

## 风险与缓解

| 风险 | 影响 | 概率 | 缓解措施 |
|------|------|------|----------|
| #932 零分配读路径回退 | 性能下降 | 中 | Benchmark gate：任何回退 > 5% 的 PR 不合并 |
| `sync.Map` 争用成瓶颈 | 高并发性能差 | 低 | Stress test 验证；必要时改 sharded map |
| Phase 3 scope 爆炸 | 开发周期失控 | 高 | 只实现 3 个 kind（Dubbo/K8s/MCP），其他 kind 按需添加 |
| 社区质疑"过度设计" | RFC 被拒 | 中 | Phase 1 先落地证明收益，Phase 3 可选（Non-goal 转 Future Work） |
| Filter↔Cluster 契约演进困难 | 破坏性变更 | 低 | Phase 1 定好接口，Phase 2 只加字段不改签名 |

---

## 开放问题（待社区讨论）

1. **`EndpointState` 生命周期 GC 策略**  
   当 endpoint 从 snapshot 移除时，对应 state 何时清理？  
   - 选项 A：snapshot rebuild 时立即删除不在新 snapshot 的 state  
   - 选项 B：延迟 GC（如 5 分钟后），防止频繁 up/down 导致 state 丢失  
   选 A 简单但可能丢失短期统计（EWMA），选 B 复杂但对抖动友好。

2. **Cooldown duration policy 放哪？**  
   - 选项 A：`ClusterConfig` 新增 `cooldown_policy` 字段  
   - 选项 B：复用 `LLMMeta.retry` 语义，从 retry 推导 cooldown  
   - 选项 C：独立 `RuntimePolicy` 对象（#958 的一部分）  
   倾向 C，但需与 #958 对齐。

3. **Semantic Router 放哪？**  
   - 选项 A：独立 filter（在 LLM proxy filter 之前）  
   - 选项 B：cluster 的 `CandidateSelector` stage  
   - 选项 C：picker 内部逻辑  
   Embedding 计算成本高，倾向 A（filter 可 cache），但 B 架构更统一。

4. **Phase 3 的 `ClusterKind` 是否需要生命周期钩子？**  
   如 `OnStart` / `OnStop`（启动 registry watch / 清理连接）。可能需要，但细节待 Phase 3 RFC。

5. **多 cluster kind 的健康检查如何统一？**  
   K8s 已有 readiness probe，Dubbo 有心跳协议，是否强制统一为 HTTP probe？还是允许 kind 
override？倾向后者，但需定义 override 接口。

---

## 参考资料

- PR #932: Endpoint runtime view 演进（immutable snapshot 基础）
- Issue #905: Cluster 模块问题梳理
- Issue #939: Inject cooldownStore（移除全局变量）
- Issue #943: LRU eviction for cooldownStore
- Issue #944: Snapshot publish metrics
- Issue #958: Config/runtime decoupling（5 步骤规划）
- Discussion #696: AI gateway milestones（key-level cooldown 问题源头）
- 《Pixiu 集群模块深度研究报告》— 本 RFC 的设计来源

---

## 附录：报告关键设计的对应关系

| 报告章节 | RFC Phase | 状态 |
|----------|-----------|------|
| 2. Failure Classifier | Phase 1 | 完整覆盖 |
| 3. Immutable Snapshot + Mutable State 分离 | Phase 1 | 完整覆盖 |
| 4. AI Endpoint Fields (cost/quota/EWMA) | Phase 2 | 完整覆盖 |
| 5. Scorer 框架 | Phase 2 | 完整覆盖 |
| 6. ClusterKind 接口 | Phase 3 | 完整覆盖 |
| 7. Dubbo/K8s/MCP Kind 实现 | Phase 3 | 示例给出，细节待 Phase 3 RFC |
| 附录 A: 性能优化建议 | 贯穿三层 | Benchmark gate + metrics 覆盖 |

报告提的所有核心设计点，本 RFC 都纳入了演进路径。















GitHub link: 
https://github.com/apache/dubbo-go-pixiu/discussions/696#discussioncomment-17164117

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [D] [AI] AI gateway milestones [dubbo-go-pixiu]

Reply via email to