lhotari commented on code in PR #23945:
URL: https://github.com/apache/pulsar/pull/23945#discussion_r1975535331


##########
pip/pip-406.md:
##########
@@ -0,0 +1,152 @@
+# PIP-406: Introduce metrics related to dispatch_throttled_count
+
+# Background knowledge
+
+## Motivation
+
+Currently, users can monitor subscription backlogs using the 
`pulsar_subscription_back_log_no_delayed` metric. 
+However, if [dispatch 
throttling](https://pulsar.apache.org/docs/next/concepts-throttling/) is 
configured at the broker/topic/subscription level,
+this metric may not accurately reflect whether the backlog is due to 
insufficient consumer capacity, as it could be caused by dispatch throttling.
+
+## Goals
+
+Introduce metrics to indicate the count of `messages/bytes throttled` for 
**broker/topic/subscription** level rate limit. 
+This allows users to write PromQL queries to identify subscriptions with high 
backlogs but low or no throttling, pinpointing backlogs caused by insufficient 
consumer capacity.
+
+## In Scope
+
+Broker Level:
+- Introduce the metric `pulsar_broker_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a broker.
+- Introduce the metric `pulsar_broker_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a broker.
+
+Topic Level:
+- Introduce the metric `pulsar_dispatch_throttled_msg_count` to represent the 
total count of messages throttled for a topic.
+- Introduce the metric `pulsar_dispatch_throttled_bytes_count` to represent 
the total count of bytes throttled for a topic.
+ 
+Subscription Level:
+- Introduce the metric `pulsar_subscription_dispatch_throttled_msg_count` to 
represent the total count of messages throttled for a subscription.
+- Introduce the metric `pulsar_subscription_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a subscription.
+ 
+
+## Out of Scope
+- These states are not persistent and will reset upon broker restart/ topic 
re-load / subscription reconnected.
+
+# High Level Design
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`. Increase these values in the `consumeDispatchQuota` 
method when the TokenBucket for messages or bytes is insufficient.
+2. Output these fields when retrieving metrics.
+
+
+# Detailed Design
+
+## Design & Implementation Details
+1. Maintain `dispatchThrottleMsgCount` and `dispatchThrottleBytesCount` in 
`DispatchRateLimiter`:
+```java
+    private final LongAdder dispatchThrottleMsgCount = new LongAdder();
+    private final LongAdder dispatchThrottleBytesCount = new LongAdder();
+```
+
+2. During each 
[consumeDispatchQuota](https://github.com/apache/pulsar/blob/c4cff0ab3dac169c0a1418ef2f63f61604f6278e/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/DispatchRateLimiter.java#L97-L104),
+if token bucket is insufficient, increase these fields accordingly.
+```diff
+     public void consumeDispatchQuota(long numberOfMessages, long byteSize) {
+         AsyncTokenBucket localDispatchRateLimiterOnMessage = 
dispatchRateLimiterOnMessage;
+         if (numberOfMessages > 0 && localDispatchRateLimiterOnMessage != 
null) {
+-            localDispatchRateLimiterOnMessage.consumeTokens(numberOfMessages);
++            if 
(!localDispatchRateLimiterOnMessage.consumeTokensAndCheckIfContainsTokens(numberOfMessages))
 {
++                dispatchThrottleMsgCount.increment();
++            }
+         }
+         AsyncTokenBucket localDispatchRateLimiterOnByte = 
dispatchRateLimiterOnByte;
+         if (byteSize > 0 && localDispatchRateLimiterOnByte != null) {
+-            localDispatchRateLimiterOnByte.consumeTokens(byteSize);
++            if 
(!localDispatchRateLimiterOnByte.consumeTokensAndCheckIfContainsTokens(byteSize))
 {
++                dispatchThrottleBytesCount.increment();
++            }

Review Comment:
   @shibd You found a great solution! I like it. The only remaining issue is 
about naming in the PIP document.
   
   For example, where it says "total count of bytes throttled," it's currently 
misleading:
   > Introduce the metric `pulsar_broker_dispatch_throttled_bytes_count` to 
represent the total count of bytes throttled for a broker.
   
   Instead of "total count of bytes throttled," it's actually "total number of 
times bytes were throttled." 
   
   This is Claude revisited PIP-409 document: 
https://gist.github.com/lhotari/9e971fb7ac89d73a97d4de8724bc4359 . 
   I'm not exactly sure how accurate the changes are. I asked it to address the 
"total count of bytes throttled" issue. 
   
   I brought up one more detail in the mailing list message:
   
   > For better observability, I'd suggest including a property in the
   > metric that indicates the throttling reason (broker, topic, or
   > subscription level rate limiter). This would eliminate the need for
   > separate metrics while still providing all the necessary information
   > in a single, clean metric.
   
   After thinking about it again, the presented metrics themselves in the PIP 
document are fine. There's a  problem is the implementation:
   
   You loose information when the subscription gets throttled due to the rate 
limiting at broker or topic level. The reason why this is the problem is that 
when you go to look at the subscription metrics, you wouldn't find out that it 
was throttled if it got throttled at broker or topic level. Do you agree that 
this is a problem?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to