Copilot commented on code in PR #3229:
URL: https://github.com/apache/brpc/pull/3229#discussion_r2869145376


##########
docs/en/backup_request.md:
##########
@@ -39,6 +39,81 @@ my_func_latency << tm.u_elapsed();  // u represents for 
microsecond, and s_elaps
 // All work is done here. My_func_qps, my_func_latency, my_func_latency_cdf 
and many other counters would be shown in /vars.
 ```
 
+## Rate-limited backup requests
+
+To limit the ratio of backup requests sent, use the built-in factory function 
or implement the `BackupRequestPolicy` interface yourself.
+
+Priority order: `backup_request_policy` > `backup_request_ms`.
+

Review Comment:
   The PR description lists an additional configuration path 
(ChannelOptions.backup_request_max_ratio and related gflags / precedence), but 
this doc section only describes the existing policy-vs-ms priority and manual 
policy wiring. If the max-ratio option is still part of the intended change, 
the documentation should be updated to include it; otherwise the PR description 
should be corrected to match what is actually implemented.



##########
docs/cn/backup_request.md:
##########
@@ -39,6 +39,80 @@ my_func_latency << tm.u_elapsed();  // u代表微秒,还有s_elapsed(), 
m_elap
 // 好了,在/vars中会显示my_func_qps, my_func_latency, my_func_latency_cdf等很多计数器。
 ```
 
+## Backup Request 限流
+
+如需限制 backup request 的发送比例,可使用内置工厂函数创建限流策略,也可自行实现 `BackupRequestPolicy` 接口。
+
+优先级顺序:`backup_request_policy` > `backup_request_ms`。
+

Review Comment:
   The PR description mentions a per-channel backup_request_max_ratio option 
and precedence rules beyond just backup_request_policy vs backup_request_ms, 
but this document section only describes the older priority. Please either 
update this doc to cover the new option (if intended) or adjust the PR 
description to match the implemented feature set.



##########
src/brpc/controller.cpp:
##########
@@ -351,8 +351,14 @@ void Controller::set_backup_request_ms(int64_t timeout_ms) 
{
 }
 
 int64_t Controller::backup_request_ms() const {
-    int timeout_ms = NULL != _backup_request_policy ?
-        _backup_request_policy->GetBackupRequestMs(this) : _backup_request_ms;
+    int timeout_ms = _backup_request_ms;
+    if (NULL != _backup_request_policy) {
+        const int32_t policy_ms = 
_backup_request_policy->GetBackupRequestMs(this);
+        // -1 means the policy defers to the channel-level backup_request_ms.
+        if (policy_ms >= 0) {

Review Comment:
   Controller::backup_request_ms() treats any negative GetBackupRequestMs() 
value as “defer to _backup_request_ms” (because only policy_ms >= 0 overrides). 
The comment says “-1 means defer”, but -2/-100 etc will also defer, which can 
change semantics for existing custom policies that used negative values to 
disable backups. Consider explicitly handling only -1 as the defer sentinel 
(and otherwise honoring the policy return value), or documenting/validating the 
full negative-value contract.
   ```suggestion
           if (policy_ms != -1) {
   ```



##########
src/brpc/backup_request_policy.cpp:
##########
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "brpc/backup_request_policy.h"
+
+#include "butil/logging.h"
+#include "bvar/reducer.h"
+#include "bvar/window.h"
+#include "butil/atomicops.h"
+#include "butil/time.h"
+
+namespace brpc {
+
+// Standalone statistics module for tracking backup/total request ratio
+// within a sliding time window. Each instance schedules two bvar::Window
+// sampler tasks; keep this in mind for high channel-count deployments.
+class BackupRateLimiter {
+public:
+    BackupRateLimiter(double max_backup_ratio,
+                      int window_size_seconds,
+                      int update_interval_seconds)
+        : _max_backup_ratio(max_backup_ratio)
+        , _update_interval_us(update_interval_seconds * 1000000LL)
+        , _total_count()
+        , _backup_count()
+        , _total_window(&_total_count, window_size_seconds)
+        , _backup_window(&_backup_count, window_size_seconds)
+        , _cached_ratio(0.0)
+        , _last_update_us(0) {
+    }
+
+    // All atomic operations use relaxed ordering intentionally.
+    // This is best-effort rate limiting: a slightly stale ratio is
+    // acceptable for approximate throttling.
+    bool ShouldAllow() const {
+        const int64_t now_us = butil::cpuwide_time_us();
+        int64_t last_us = _last_update_us.load(butil::memory_order_relaxed);
+        double ratio = _cached_ratio.load(butil::memory_order_relaxed);
+
+        if (now_us - last_us >= _update_interval_us) {
+            if (_last_update_us.compare_exchange_strong(
+                    last_us, now_us, butil::memory_order_relaxed)) {
+                int64_t total = _total_window.get_value();
+                int64_t backup = _backup_window.get_value();
+                // Fall back to cumulative counts when the window has no
+                // sampled data yet (cold-start within the first few seconds).
+                if (total <= 0) {
+                    total = _total_count.get_value();
+                    backup = _backup_count.get_value();
+                }
+                if (total > 0) {
+                    ratio = static_cast<double>(backup) / total;
+                } else if (backup > 0) {
+                    // Backups issued but no completions in window yet 
(latency spike).
+                    // Be conservative to prevent backup storms.
+                    ratio = 1.0;
+                } else {
+                    // True cold-start: no traffic yet. Allow freely.
+                    ratio = 0.0;
+                }
+                _cached_ratio.store(ratio, butil::memory_order_relaxed);
+            }
+        }
+
+        bool allow = ratio < _max_backup_ratio;
+        if (allow) {
+            // Count backup decisions immediately for faster feedback
+            // during latency spikes (before RPCs complete).
+            _backup_count << 1;
+        }
+        return allow;
+    }
+
+    void OnRPCEnd(const Controller* /*controller*/) {
+        // Count all completed RPC legs (both original and backup RPCs).
+        // Backup decisions are counted in ShouldAllow() at decision time for
+        // faster feedback. As a result, the effective suppression threshold is
+        // (backup_count / total_legs), where total_legs includes both original
+        // and backup completions.
+        _total_count << 1;
+    }
+
+private:
+    double  _max_backup_ratio;
+    int64_t _update_interval_us;
+
+    bvar::Adder<int64_t>            _total_count;
+    mutable bvar::Adder<int64_t>    _backup_count;
+    bvar::Window<bvar::Adder<int64_t>> _total_window;
+    bvar::Window<bvar::Adder<int64_t>> _backup_window;
+
+    mutable butil::atomic<double>  _cached_ratio;
+    mutable butil::atomic<int64_t> _last_update_us;
+};
+
+// Internal BackupRequestPolicy that composes a BackupRateLimiter
+// for ratio-based suppression.
+class RateLimitedBackupPolicy : public BackupRequestPolicy {
+public:
+    RateLimitedBackupPolicy(int32_t backup_request_ms,
+                            double max_backup_ratio,
+                            int window_size_seconds,
+                            int update_interval_seconds)
+        : _backup_request_ms(backup_request_ms)
+        , _rate_limiter(max_backup_ratio, window_size_seconds,
+                        update_interval_seconds) {
+    }
+
+    int32_t GetBackupRequestMs(const Controller* /*controller*/) const 
override {
+        return _backup_request_ms;
+    }
+
+    bool DoBackup(const Controller* /*controller*/) const override {
+        return _rate_limiter.ShouldAllow();
+    }
+
+    void OnRPCEnd(const Controller* controller) override {
+        _rate_limiter.OnRPCEnd(controller);
+    }
+
+private:
+    int32_t _backup_request_ms;
+    BackupRateLimiter _rate_limiter;
+};
+
+BackupRequestPolicy* CreateRateLimitedBackupPolicy(
+    const RateLimitedBackupPolicyOptions& options) {
+    if (options.backup_request_ms < -1) {
+        LOG(ERROR) << "Invalid backup_request_ms=" << options.backup_request_ms
+                   << ", must be >= -1 (-1 means inherit from ChannelOptions)";
+        return NULL;
+    }
+    if (options.max_backup_ratio <= 0 || options.max_backup_ratio > 1.0) {
+        LOG(ERROR) << "Invalid max_backup_ratio=" << options.max_backup_ratio
+                   << ", must be in (0, 1]";
+        return NULL;
+    }

Review Comment:
   PR description says max_ratio > 1.0 should be clamped to 1.0 with a warning, 
but CreateRateLimitedBackupPolicy currently treats >1.0 as invalid and returns 
NULL. Either update the implementation to match the documented behavior (clamp) 
or adjust the PR description/tests/docs to reflect the stricter validation.



##########
src/brpc/backup_request_policy.cpp:
##########
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "brpc/backup_request_policy.h"
+
+#include "butil/logging.h"
+#include "bvar/reducer.h"
+#include "bvar/window.h"
+#include "butil/atomicops.h"
+#include "butil/time.h"
+
+namespace brpc {
+
+// Standalone statistics module for tracking backup/total request ratio
+// within a sliding time window. Each instance schedules two bvar::Window
+// sampler tasks; keep this in mind for high channel-count deployments.
+class BackupRateLimiter {
+public:
+    BackupRateLimiter(double max_backup_ratio,
+                      int window_size_seconds,
+                      int update_interval_seconds)
+        : _max_backup_ratio(max_backup_ratio)
+        , _update_interval_us(update_interval_seconds * 1000000LL)
+        , _total_count()
+        , _backup_count()
+        , _total_window(&_total_count, window_size_seconds)
+        , _backup_window(&_backup_count, window_size_seconds)
+        , _cached_ratio(0.0)
+        , _last_update_us(0) {
+    }
+
+    // All atomic operations use relaxed ordering intentionally.
+    // This is best-effort rate limiting: a slightly stale ratio is
+    // acceptable for approximate throttling.
+    bool ShouldAllow() const {
+        const int64_t now_us = butil::cpuwide_time_us();
+        int64_t last_us = _last_update_us.load(butil::memory_order_relaxed);
+        double ratio = _cached_ratio.load(butil::memory_order_relaxed);
+
+        if (now_us - last_us >= _update_interval_us) {
+            if (_last_update_us.compare_exchange_strong(
+                    last_us, now_us, butil::memory_order_relaxed)) {
+                int64_t total = _total_window.get_value();
+                int64_t backup = _backup_window.get_value();
+                // Fall back to cumulative counts when the window has no
+                // sampled data yet (cold-start within the first few seconds).
+                if (total <= 0) {
+                    total = _total_count.get_value();
+                    backup = _backup_count.get_value();
+                }
+                if (total > 0) {
+                    ratio = static_cast<double>(backup) / total;
+                } else if (backup > 0) {
+                    // Backups issued but no completions in window yet 
(latency spike).
+                    // Be conservative to prevent backup storms.
+                    ratio = 1.0;
+                } else {
+                    // True cold-start: no traffic yet. Allow freely.
+                    ratio = 0.0;
+                }
+                _cached_ratio.store(ratio, butil::memory_order_relaxed);
+            }
+        }
+
+        bool allow = ratio < _max_backup_ratio;
+        if (allow) {
+            // Count backup decisions immediately for faster feedback
+            // during latency spikes (before RPCs complete).
+            _backup_count << 1;
+        }
+        return allow;

Review Comment:
   BackupRateLimiter::ShouldAllow() can allow an unbounded burst of backups 
within update_interval_seconds because the decision uses a cached ratio that is 
only recomputed periodically and is not updated after incrementing 
_backup_count. This can cause the backup/total ratio to exceed 
_max_backup_ratio by a large margin under high QPS, weakening the stated goal 
of preventing backup storms. Consider recomputing (or adjusting) the ratio 
using current counters on each decision, or otherwise ensuring the cached value 
is updated when _backup_count changes (even if approximately).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to