>  Maybe we can use `pulsar_subscription_backlog_quota_percentage`?

That makes sense, thanks.

>  Add _subscription here because this metric is only for subscriptions.
>  Aggregated by topic/namespace level is meaningless. We should also mention
>  the new metrics only be exposed if the subscription level metrics is
>  enabled.

Although `preciseTimeBasedBacklogQuotaCheck` defaults to false, the check can
still lead to I/O operations, so exposing the metric at the subscription
level may cost too much. WDYT?


> For the metrics of the backlog eviction event, I think we could only add a
> counter
> indicator? From the backlog size, we are not able to know if the backlog
> eviction
> is happened or not because draining the backlog will also cleanup the
> backlog.

Do you mean we also need a metric to indicate that backlog eviction has
happened? That makes sense.
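
To illustrate why the backlog size alone cannot show that an eviction
happened, here is a minimal sketch in Python. The class and method names are
hypothetical and are not Pulsar APIs; the sketch only models a backlog gauge
next to an eviction counter.

```python
class SubscriptionBacklogStats:
    """Hypothetical per-subscription stats: a backlog gauge plus an eviction counter."""

    def __init__(self) -> None:
        self.backlog_bytes = 0    # gauge: goes up and down
        self.evictions_total = 0  # counter: only ever increases

    def publish(self, size_bytes: int) -> None:
        self.backlog_bytes += size_bytes

    def acknowledge(self, size_bytes: int) -> None:
        # Normal draining by a consumer shrinks the backlog.
        self.backlog_bytes = max(0, self.backlog_bytes - size_bytes)

    def evict(self, size_bytes: int) -> None:
        # Eviction shrinks the backlog too, but also bumps the counter.
        self.backlog_bytes = max(0, self.backlog_bytes - size_bytes)
        self.evictions_total += 1


drained = SubscriptionBacklogStats()
drained.publish(1000)
drained.acknowledge(1000)

evicted = SubscriptionBacklogStats()
evicted.publish(1000)
evicted.evict(1000)
```

Both objects end with the same backlog size, so only `evictions_total` tells
them apart.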

> I also didn't understand the "slowest_subscription" label, could you please
> provide more context about this label? How it will be used.

I've updated the PIP to explain what it means, PTAL.

> BTW, from my understanding, the reason of why are users not convenient with
> the solution of write queries based on the backlog size is different topics
> can have different backlog quota policy, but it not able integrate with the
> metrics services. They need to add many queries to trigger the alert.

Yes. TopicPolicies and NamespacePolicies also carry the backlog eviction
configuration, so we just expose the
`pulsar_subscription_backlog_quota_percentage` metric and users don't
need to query TopicPolicies/NamespacePolicies.
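
As a rough illustration of how such a percentage could be derived, here is a
minimal Python sketch. The function names, the per-limit split ("size" and
"time"), and the rule that a missing or non-positive limit means "no quota"
are assumptions made for this example, not the implementation in the PIP.

```python
from typing import Dict, Optional


def backlog_quota_percentage(used: float, limit: Optional[float]) -> Optional[float]:
    """Return used/limit as a percentage, or None when no quota applies.

    Assumption: a missing or non-positive limit means no quota is configured.
    """
    if limit is None or limit <= 0:
        return None
    return 100.0 * used / limit


def subscription_quota_usage(
    backlog_bytes: int,
    oldest_unacked_age_seconds: float,
    limit_bytes: Optional[int],
    limit_seconds: Optional[int],
) -> Dict[str, Optional[float]]:
    """Hypothetical helper: quota usage for the size limit and the time limit."""
    return {
        "size": backlog_quota_percentage(backlog_bytes, limit_bytes),
        "time": backlog_quota_percentage(oldest_unacked_age_seconds, limit_seconds),
    }


# A subscription holding 5 GiB of a 10 GiB quota, whose oldest
# unacknowledged message is 30 minutes old under a 1 hour quota.
usage = subscription_quota_usage(5 * 1024**3, 1800, 10 * 1024**3, 3600)
```

Because the value is already relative to whichever policy applies, a single
alert threshold (for example 80) could cover topics with different quotas.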


Thanks,
Tao Jiuming

On Tue, Feb 28, 2023 at 12:30, PengHui Li <peng...@apache.org> wrote:

> Maybe we can use `pulsar_subscription_backlog_quota_percentage`?
> Add _subscription here because this metric is only for subscriptions.
> Aggregated by topic/namespace level is meaningless. We should also mention
> the new metrics only be exposed if the subscription level metrics is
> enabled.
>
> And we should not use _eviction here because it was configured by users.
> It can be different policies [0].
>
> For the metrics of the backlog eviction event, I think we could only add a
> counter
> indicator? From the backlog size, we are not able to know if the backlog
> eviction
> is happened or not because draining the backlog will also cleanup the
> backlog.
>
> I also didn't understand the "slowest_subscription" label, could you please
> provide more
> context about this label? How it will be used.
>
> BTW, from my understanding, the reason of why are users not convenient with
> the solution of
> write queries based on the backlog size is different topics can have
> different backlog quota
> policy, but it not able integrate with the metrics services. They need to
> add many queries to
> trigger the alert.
>
> [0]
>
> https://pulsar.apache.org/docs/2.11.x/admin-api-namespaces/#set-backlog-quota-policies
>
>
> Thanks,
> Penghui
>
> On Sun, Feb 26, 2023 at 11:03 PM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
>
> > Hi,
> >
> > > Pulsar has 2 configurations for the backlog eviction:
> > > backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond, if
> > > topic backlog reaches the threshold of any item, backlog eviction will be
> > > triggered.
> >
> > This seems like default values, not the actual values. Can you please
> > provide an explanation in the PIP and link to read more:
> > 1. Where do you define the backlog quota exactly? What is the granularity
> > (subscription?)
> > 2. Is the backlog quota on by default? If so, what are the default values?
> >
> >
> >
> > *Notes*
> > 1. When the backlog quota limit is defined in Bytes, and you wish to know
> > how close a subscription is to its bytes limit, you need to calculate the
> > backlog size in bytes. From my understanding, there is an accurate
> > calculation (which is costly in terms of I/O) and there is an estimate of
> > it. I presume you would want to use the estimated one, is that correct?
> > The backlog quota itself, uses the accurate or the estimated when it starts
> > evicting entries (i.e. marking them as acknowledged)?
> >
> > 2. For the backlog limit specifying in time units, there is no estimate, as
> > it must be calculated all the time (earliest unacknowledged message
> > distance from now). How do you plan to calculate the current age of the
> > earliest message without bearing that I/O cost on each metric calculation?
> >
> > 3. In the Goal section, you specify that your goal is to add a "proximity"
> > metric.
> > a) You must define that - what is proximity metric exactly? What are its
> > units? How are you planning to calculate it?
> > b) Proximity is not a good term IMO. I personally have never seen this term
> > used in software systems, unless it's in the aviation/space industry. Once
> > you explain (a) I hope I can help provide alternative names.
> >
> > 4. Maybe we should provide the used quota percentage for both limits,
> > instead of one per both, since it's easier to act upon the alert when you
> > need which one triggered it.
> >
> > 5. I didn't understand the "slowest_subscription" label used when
> > describing the metric label. Can you please provide an explanation?
> >
> > 6. I suggest writing a "High Level Design" section, and add everything you
> > need to know for this proposal, so I don't need to read the
> > implementation details below (code).
> >
> > Thanks,
> >
> > Asaf
> >
> >
> > On Wed, Feb 22, 2023 at 4:52 PM 太上玄元道君 <dao...@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I've started a PIP to discuss: PIP-248 Add backlog eviction metric
> > >
> > > ### Motivation:
> > >
> > > Pulsar has 2 configurations for the backlog eviction:
> > > `backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`, if
> > > topic backlog reaches the threshold of any item, backlog eviction will be
> > > triggered.
> > >
> > > Before backlog eviction happens, we don't have a metric to monitor how long
> > > that it can reaches the threshold.
> > >
> > > We can provide a progress bar metric to tell users some topics is about to
> > > trigger backlog eviction. And users can subscribe the alert to schedule
> > > consumers.
> > >
> > > For more details, please read the PIP at
> > > https://github.com/apache/pulsar/issues/19601
> > >
> > > Thanks,
> > > Tao Jiuming
> > >
> >
>
