Alanxtl commented on PR #3347:
URL: https://github.com/apache/dubbo-go/pull/3347#issuecomment-4585666050

   > > You could refer to Uber's implementation: https://www.infoq.com/news/2024/02/uber-dynamic-load-shedding/?utm_source=email&utm_medium=editorial&utm_campaign=SpecialNL&utm_content=02292024&forceSponsorshipId=58a6b10a-7b64-4cfd-a08d-c065e2458967
   > 
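   > @Alanxtl hello, could you also take a look at this? As mentioned in the PR description, when I tested the rtt_shrink case I found that as latency rose (from 20ms to 100ms, then to 500ms), the limiter's inflight limit did not decrease noticeably and instead stayed at its original level. I'm not sure whether this matches the original expectation. The test code is under [presee_test/adaptive_service/rtt_shrink](https://github.com/apache/dubbo-go/pull/3347/files#diff-70ddaa2cb5e0278f805d5b804c8788009e281cd10f819e51180021b96c36c161).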
   
   
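   This looks more like it exposes an insensitivity/parameter problem in the current HillClimbing implementation; it should not be treated as fully expected behavior.
   
   The likely causes in the code are roughly here: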
   
   - [`hill_climbing.go`](https://github.com/apache/dubbo-go/blob/b6f035620578f7605fbb49c06f3b36c083e2d5c2/filter/adaptivesvc/limiter/hill_climbing.go#L188-L245): the limiter computes `maxCapacity` and `tps` from `transactionNum/rttAvg` only once per update round; it does not lower concurrency the moment RTT rises.
   - [`hill_climbing.go`](https://github.com/apache/dubbo-go/blob/b6f035620578f7605fbb49c06f3b36c083e2d5c2/filter/adaptivesvc/limiter/hill_climbing.go#L260-L307): the shrink condition requires both `bestMaxCapacity - maxCapacity` and the RTT degradation to meet hard-coded thresholds at the same time.
   - [`hill_climbing.go`](https://github.com/apache/dubbo-go/blob/b6f035620578f7605fbb49c06f3b36c083e2d5c2/filter/adaptivesvc/limiter/hill_climbing.go#L313-L343): even when it does shrink, the limit is not reduced in proportion to RTT; it falls back to around `bestLimitation - log(limitation)`, so the reduction can be very small.
   - [`rtt_shrink/server/main.go`](https://github.com/apache/dubbo-go/blob/b6f035620578f7605fbb49c06f3b36c083e2d5c2/presee_test/adaptive_service/rtt_shrink/server/main.go#L78-L85): the load test does raise the handler RTT artificially, via a server-side `Sleep(currentStage.delay)`.
   - [`rtt_shrink/server/main.go`](https://github.com/apache/dubbo-go/blob/b6f035620578f7605fbb49c06f3b36c083e2d5c2/presee_test/adaptive_service/rtt_shrink/server/main.go#L141-L154): the observed `limiter_limitation` is exposed directly from the provider-side limiter snapshot, not estimated by the client.
   
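   This may be a known limitation. In principle, adaptive concurrency should shrink when RTT clearly degrades and throughput stops improving; but the current algorithm is affected by the historical best metrics, hard-coded thresholds, the update interval, and the shrink magnitude, so the limit did not drop noticeably under the `20ms -> 500ms` stepped load test.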


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

