> Maybe we can use `pulsar_subscription_backlog_quota_percentage`?

That makes sense, thanks.
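The following minimal Java sketch shows roughly what the proposed `pulsar_subscription_backlog_quota_percentage` metric could report. It rests on two assumptions that do not come from the PIP. First, usage is the current backlog divided by the configured limit, times 100. Second, a non-positive limit means that limit is disabled. The class name `BacklogQuotaPercentage` is hypothetical. The sketch computes one value per limit (size and time). For a single "progress bar" value, it takes whichever limit is closer to being hit.

```java
import java.util.OptionalDouble;

/**
 * Hypothetical helper (not Pulsar code): percentage of a backlog quota a subscription
 * has used. Assumes usage = current / limit * 100 and that a non-positive limit means
 * that quota dimension is not configured.
 */
final class BacklogQuotaPercentage {
    private BacklogQuotaPercentage() {}

    /** Used share of a single limit (bytes or seconds), or empty if the limit is disabled. */
    static OptionalDouble of(long current, long limit) {
        if (limit <= 0) {
            return OptionalDouble.empty(); // no quota configured for this dimension
        }
        return OptionalDouble.of(100.0 * current / limit);
    }

    /** The dimension closest to its limit, i.e. the one expected to trigger first. */
    static OptionalDouble closestToLimit(long backlogBytes, long limitBytes,
                                         long backlogAgeSeconds, long limitSeconds) {
        OptionalDouble size = of(backlogBytes, limitBytes);
        OptionalDouble time = of(backlogAgeSeconds, limitSeconds);
        if (size.isPresent() && time.isPresent()) {
            return OptionalDouble.of(Math.max(size.getAsDouble(), time.getAsDouble()));
        }
        return size.isPresent() ? size : time;
    }
}
```

Asaf suggests in the quoted thread that the per-limit values be exported separately. Doing so shows which limit is about to be hit. The combined value is simply the one that will trigger first.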

> Add _subscription here because this metric is only for subscriptions.
> Aggregating it at the topic/namespace level is meaningless. We should also
> mention that the new metrics are only exposed if subscription-level metrics
> are enabled.

Although the default value of `preciseTimeBasedBacklogQuotaCheck` is false, the check can still lead to I/O operations, so exposing the metric at the subscription level may cost too much. WDYT?

> For the metrics of the backlog eviction event, I think we could just add a
> counter indicator? From the backlog size alone, we cannot tell whether a
> backlog eviction happened, because draining the backlog also cleans it up.

Do you mean we also need a metric that indicates when a backlog eviction happens? That makes sense.

> I also didn't understand the "slowest_subscription" label. Could you please
> provide more context about this label and how it will be used?

I've updated the PIP to explain what it means, PTAL.

> BTW, from my understanding, the reason users find it inconvenient to write
> alert queries based on the backlog size is that different topics can have
> different backlog quota policies, which cannot be integrated with the
> metrics service. They need to add many queries to trigger the alert.

Yes. TopicPolicies and NamespacePolicies can also carry the backlog quota configuration. By exposing the `pulsar_subscription_backlog_quota_percentage` metric, users no longer need to query TopicPolicies/NamespacePolicies.

Thanks,
Tao Jiuming

On Tue, Feb 28, 2023 at 12:30 PengHui Li <peng...@apache.org> wrote:

> Maybe we can use `pulsar_subscription_backlog_quota_percentage`?
> Add _subscription here because this metric is only for subscriptions.
> Aggregating it at the topic/namespace level is meaningless. We should also
> mention that the new metrics are only exposed if subscription-level metrics
> are enabled.
>
> And we should not use _eviction here because the action taken is configured
> by users. It can be one of several policies [0].
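This minimal Java sketch illustrates two points. PengHui notes that the configured action varies, and Tao notes that topic and namespace policies can override the broker defaults. The sketch resolves the effective quota for a topic, assuming the usual precedence: topic-level policy first, then namespace-level policy, then the broker default. The `Quota` and `EffectiveQuota` types are hypothetical and are not Pulsar's API. The `RetentionPolicy` constants only mirror the backlog quota policy names Pulsar uses.

```java
import java.util.Optional;

/** Hypothetical backlog quota model; not Pulsar's API. */
final class Quota {
    /** Constants mirror the backlog quota policy names used by Pulsar. */
    enum RetentionPolicy { producer_request_hold, producer_exception, consumer_backlog_eviction }

    final long limitBytes;   // non-positive: no size limit (assumption)
    final long limitSeconds; // non-positive: no time limit (assumption)
    final RetentionPolicy policy;

    Quota(long limitBytes, long limitSeconds, RetentionPolicy policy) {
        this.limitBytes = limitBytes;
        this.limitSeconds = limitSeconds;
        this.policy = policy;
    }
}

/** Resolves the quota that actually applies to a topic. */
final class EffectiveQuota {
    private EffectiveQuota() {}

    /** Topic-level policy wins, then namespace-level, then the broker default. */
    static Quota resolve(Optional<Quota> topicLevel, Optional<Quota> namespaceLevel,
                         Quota brokerDefault) {
        return topicLevel.or(() -> namespaceLevel).orElse(brokerDefault);
    }
}
```

The resolved policy may hold or reject producers instead of evicting. A name like `pulsar_subscription_backlog_quota_percentage` describes quota usage without implying any particular action.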
>
> For the metrics of the backlog eviction event, I think we could just add a
> counter indicator? From the backlog size alone, we cannot tell whether a
> backlog eviction happened, because draining the backlog also cleans it up.
>
> I also didn't understand the "slowest_subscription" label. Could you please
> provide more context about this label and how it will be used?
>
> BTW, from my understanding, the reason users find it inconvenient to write
> alert queries based on the backlog size is that different topics can have
> different backlog quota policies, which cannot be integrated with the
> metrics service. They need to add many queries to trigger the alert.
>
> [0]
> https://pulsar.apache.org/docs/2.11.x/admin-api-namespaces/#set-backlog-quota-policies
>
> Thanks,
> Penghui
>
> On Sun, Feb 26, 2023 at 11:03 PM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
>
> > Hi,
> >
> > > Pulsar has 2 configurations for the backlog eviction:
> > > backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond. If a
> > > topic's backlog reaches either threshold, backlog eviction will be
> > > triggered.
> >
> > These seem to be default values, not the actual values. Can you please
> > provide an explanation in the PIP and a link to read more:
> > 1. Where exactly do you define the backlog quota? What is the granularity
> > (subscription?)
> > 2. Is the backlog quota on by default? If so, what are the default values?
> >
> > *Notes*
> > 1. When the backlog quota limit is defined in bytes, and you wish to know
> > how close a subscription is to its bytes limit, you need to calculate the
> > backlog size in bytes. From my understanding, there is an accurate
> > calculation (which is costly in terms of I/O) and there is an estimate of
> > it. I presume you would want to use the estimated one, is that correct?
> > Does the backlog quota itself use the accurate or the estimated value when
> > it starts evicting entries (i.e. marking them as acknowledged)?
> >
> > 2. For a backlog limit specified in time units, there is no estimate, as
> > it must be calculated every time (the distance of the earliest
> > unacknowledged message from now). How do you plan to calculate the current
> > age of the earliest message without bearing that I/O cost on each metric
> > calculation?
> >
> > 3. In the Goal section, you specify that your goal is to add a "proximity"
> > metric.
> > a) You must define that: what exactly is the proximity metric? What are
> > its units? How are you planning to calculate it?
> > b) Proximity is not a good term IMO. I personally have never seen this
> > term used in software systems outside the aviation/space industry. Once
> > you explain (a), I hope I can help provide alternative names.
> >
> > 4. Maybe we should provide the used quota percentage for each limit
> > separately, instead of one value for both, since it's easier to act upon
> > the alert when you know which one triggered it.
> >
> > 5. I didn't understand the "slowest_subscription" label in the description
> > of the metric labels. Can you please provide an explanation?
> >
> > 6. I suggest writing a "High Level Design" section and adding everything
> > one needs to know about this proposal, so I don't need to read the
> > implementation details (code) below.
> >
> > Thanks,
> >
> > Asaf
> >
> > On Wed, Feb 22, 2023 at 4:52 PM 太上玄元道君 <dao...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I've started a PIP for discussion: PIP-248 Add backlog eviction metric
> > >
> > > ### Motivation:
> > >
> > > Pulsar has 2 configurations for the backlog eviction:
> > > `backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`. If
> > > a topic's backlog reaches either threshold, backlog eviction will be
> > > triggered.
> > >
> > > Before backlog eviction happens, we don't have a metric to monitor how
> > > close the backlog is to reaching the threshold.
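The motivation above describes an "either threshold" rule, and Asaf's second note asks how the time-based age is obtained. This minimal Java sketch implements that rule under three assumptions. A non-positive limit means the limit is disabled. "Reaches" means `>=`. The class and method names are hypothetical. The sketch takes the publish time of the oldest unacknowledged message as an input. It therefore sidesteps Asaf's question of how to read that timestamp without incurring I/O on every metric scrape.

```java
import java.util.OptionalLong;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the "either threshold" rule (not Pulsar code): the quota is
 * reached when the backlog size OR the age of the oldest unacknowledged message hits
 * its limit. A non-positive limit is treated as disabled (assumption).
 */
final class BacklogQuotaCheck {
    private BacklogQuotaCheck() {}

    /** Age in seconds of the oldest unacknowledged message; 0 when there is no backlog. */
    static long backlogAgeSeconds(OptionalLong oldestUnackedPublishMillis, long nowMillis) {
        if (!oldestUnackedPublishMillis.isPresent()) {
            return 0L;
        }
        // Clamp to 0 so a timestamp slightly in the future cannot yield a negative age.
        long ageMillis = Math.max(0L, nowMillis - oldestUnackedPublishMillis.getAsLong());
        return TimeUnit.MILLISECONDS.toSeconds(ageMillis);
    }

    /** True if either the size limit or the time limit has been reached. */
    static boolean isQuotaReached(long backlogBytes, long limitBytes,
                                  OptionalLong oldestUnackedPublishMillis, long nowMillis,
                                  long limitSeconds) {
        boolean sizeReached = limitBytes > 0 && backlogBytes >= limitBytes;
        boolean timeReached = limitSeconds > 0
                && backlogAgeSeconds(oldestUnackedPublishMillis, nowMillis) >= limitSeconds;
        return sizeReached || timeReached;
    }
}
```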
> > >
> > > We can provide a progress-bar metric that tells users which topics are
> > > about to trigger backlog eviction, so that users can subscribe to the
> > > alert and schedule consumers.
> > >
> > > For more details, please read the PIP at
> > > https://github.com/apache/pulsar/issues/19601
> > >
> > > Thanks,
> > > Tao Jiuming
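PengHui suggested a counter for backlog quota enforcement events. It covers a gap the percentage metric cannot: draining the backlog and evicting it both shrink the backlog, so a gauge alone cannot tell them apart. This minimal Java sketch implements such a per-subscription counter. It assumes the broker calls a hook each time it applies the configured policy. The class and method names are hypothetical, and the sketch does not use Pulsar's metrics API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical counter of backlog quota enforcement events, keyed by subscription.
 * Not Pulsar's metrics API; a real exporter would publish these as a counter metric.
 */
final class BacklogQuotaEnforcementCounter {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Assumed hook: called each time the configured policy is applied to a subscription. */
    void recordEnforcement(String subscription) {
        counts.computeIfAbsent(subscription, key -> new LongAdder()).increment();
    }

    /** Monotonically increasing total; 0 if nothing was ever recorded. */
    long total(String subscription) {
        LongAdder adder = counts.get(subscription);
        return adder == null ? 0L : adder.sum();
    }
}
```

The value only ever increases, so an alert can fire on any increase over a time window, independently of the current backlog size.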