Maybe we can use `pulsar_subscription_backlog_quota_percentage`?
I added `_subscription` to the name because this metric only applies to
subscriptions; aggregating it at the topic or namespace level is
meaningless. We should also mention that the new metric is only exposed if
subscription-level metrics are enabled.
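To make the proposal concrete, here is a minimal sketch of how such a value
could be computed. It assumes the value is simply the subscription's current
backlog divided by the configured limit, reported separately for the size
and time limits. The `quota_type` label and the helper names are
hypothetical and are not existing Pulsar code.

```python
from typing import Dict, Optional


def quota_percentage(current: float, limit: Optional[float]) -> Optional[float]:
    """Percentage of a backlog quota limit that is currently used.

    Returns None when no positive limit is configured, meaning the
    metric would simply not be exported for that quota type.
    """
    if limit is None or limit <= 0:
        return None
    return 100.0 * current / limit


def subscription_samples(subscription: str,
                         backlog_bytes: int, limit_bytes: Optional[int],
                         backlog_age_s: float, limit_s: Optional[float]) -> Dict[str, float]:
    """Build one sample per quota type for a single subscription.

    The metric name is the one proposed in this thread; the `quota_type`
    label is a hypothetical way to report the size and time limits separately.
    """
    name = "pulsar_subscription_backlog_quota_percentage"
    samples: Dict[str, float] = {}
    for quota_type, current, limit in (("size", backlog_bytes, limit_bytes),
                                       ("time", backlog_age_s, limit_s)):
        pct = quota_percentage(current, limit)
        if pct is not None:
            key = f'{name}{{subscription="{subscription}",quota_type="{quota_type}"}}'
            samples[key] = pct
    return samples


if __name__ == "__main__":
    # Hypothetical values: 750 of 1000 bytes used, oldest message 30s of 120s.
    for series, value in subscription_samples("sub-a", 750, 1000, 30.0, 120.0).items():
        print(series, value)
```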

And we should not use `_eviction` in the name, because the action taken
when the quota is exceeded is configured by users. It can be one of several
policies [0], and not all of them evict the backlog.

For the backlog eviction event itself, I think we could just add a counter.
From the backlog size alone, we cannot tell whether a backlog eviction has
happened, because consumers draining the backlog also shrink it.
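A toy model illustrates why a counter is needed. It is hypothetical and not
Pulsar code: two subscriptions end up with the same backlog size, but only
the one whose entries were evicted increments the counter.

```python
from collections import Counter
from typing import Dict


class SubscriptionBacklog:
    """Toy model of per-subscription backlogs (hypothetical, for illustration).

    Both acknowledging and evicting shrink the backlog, so a backlog gauge
    alone cannot tell them apart; only evictions bump the counter.
    """

    def __init__(self) -> None:
        self.backlog: Dict[str, int] = {}
        # Monotonic per-subscription counter (exported name to be decided).
        self.eviction_count: Counter = Counter()

    def publish(self, sub: str, n: int) -> None:
        self.backlog[sub] = self.backlog.get(sub, 0) + n

    def acknowledge(self, sub: str, n: int) -> None:
        # Consumers draining the backlog: size drops, counter untouched.
        self.backlog[sub] = max(0, self.backlog.get(sub, 0) - n)

    def evict(self, sub: str, n: int) -> None:
        # Quota policy evicting entries: size drops AND the counter increments.
        self.backlog[sub] = max(0, self.backlog.get(sub, 0) - n)
        self.eviction_count[sub] += 1


if __name__ == "__main__":
    b = SubscriptionBacklog()
    b.publish("drained", 100)
    b.publish("evicted", 100)
    b.acknowledge("drained", 100)
    b.evict("evicted", 100)
    # Same backlog size for both, different eviction counts.
    print(b.backlog, dict(b.eviction_count))
```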

I also didn't understand the "slowest_subscription" label. Could you please
provide more context about this label and how it will be used?

BTW, my understanding of why writing queries based on the backlog size is
inconvenient for users is that different topics can have different backlog
quota policies, and those policies are not integrated with the metrics
services. Users would need to add many queries to trigger the alerts.
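A small sketch of the difference, using made-up quota values and
hypothetical helper names: a single size-based rule misfires when quotas
differ, per-topic rules must mirror each policy by hand, and a percentage
exported by the broker lets one rule cover every topic.

```python
from typing import Dict, List

# Hypothetical data: two topics with different backlog quota limits (bytes).
QUOTA_BYTES = {"topic-a": 1_000, "topic-b": 10_000}
BACKLOG_BYTES = {"topic-a": 900, "topic-b": 900}


def alerts_single_size_rule(backlog: Dict[str, int], threshold: int) -> List[str]:
    """One size-based rule for all topics: ignores per-topic quotas."""
    return sorted(t for t, size in backlog.items() if size >= threshold)


def alerts_per_topic_rules(backlog: Dict[str, int], quotas: Dict[str, int],
                           ratio: float = 0.8) -> List[str]:
    """Correct with raw sizes, but the alerting side must know every
    topic's quota and keep these rules in sync with the policies."""
    return sorted(t for t, size in backlog.items() if size >= ratio * quotas[t])


def exported_percentage(backlog: Dict[str, int], quotas: Dict[str, int]) -> Dict[str, float]:
    """What the broker could export: it already knows each topic's quota."""
    return {t: 100.0 * backlog[t] / quotas[t] for t in backlog}


def alerts_percentage_rule(pct: Dict[str, float], threshold_pct: float = 80.0) -> List[str]:
    """Alerting side: one rule, no knowledge of individual quotas needed."""
    return sorted(t for t, p in pct.items() if p >= threshold_pct)


if __name__ == "__main__":
    print(alerts_single_size_rule(BACKLOG_BYTES, 800))  # flags topic-b too
    print(alerts_per_topic_rules(BACKLOG_BYTES, QUOTA_BYTES))
    print(alerts_percentage_rule(exported_percentage(BACKLOG_BYTES, QUOTA_BYTES)))
```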

[0]
https://pulsar.apache.org/docs/2.11.x/admin-api-namespaces/#set-backlog-quota-policies


Thanks,
Penghui

On Sun, Feb 26, 2023 at 11:03 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:

> Hi,
>
> > Pulsar has 2 configurations for the backlog eviction:
> > `backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`. If
> > the topic backlog reaches either threshold, backlog eviction will be
> > triggered.
>
> These seem like default values, not the actual values. Can you please
> provide an explanation in the PIP, with a link to read more:
> 1. Where exactly do you define the backlog quota? What is the granularity
> (subscription?)
> 2. Is the backlog quota on by default? If so, what are the default values?
>
>
>
> *Notes*
> 1. When the backlog quota limit is defined in bytes, and you wish to know
> how close a subscription is to its bytes limit, you need to calculate the
> backlog size in bytes. From my understanding, there is an accurate
> calculation (which is costly in terms of I/O) and there is an estimate of
> it. I presume you would want to use the estimated one, is that correct?
> Does the backlog quota itself use the accurate or the estimated size when
> it starts evicting entries (i.e. marking them as acknowledged)?
>
> 2. For the backlog limit specified in time units, there is no estimate, as
> it must be calculated all the time (the distance of the earliest
> unacknowledged message from now). How do you plan to calculate the current
> age of the earliest message without bearing that I/O cost on each metric
> calculation?
>
> 3. In the Goal section, you specify that your goal is to add a "proximity"
> metric.
> a) You need to define it: what exactly is the proximity metric? What are
> its units? How are you planning to calculate it?
> b) Proximity is not a good term IMO. I personally have never seen this term
> used in software systems outside the aviation/space industry. Once you
> explain (a), I hope I can help suggest alternative names.
>
> 4. Maybe we should provide the used quota percentage for each limit,
> instead of one for both, since it's easier to act upon the alert when you
> know which one triggered it.
>
> 5. I didn't understand the "slowest_subscription" label used when
> describing the metric label. Can you please provide an explanation?
>
> 6. I suggest writing a "High Level Design" section that covers everything
> one needs to know about this proposal, so I don't need to read the
> implementation details (code) below.
>
> Thanks,
>
> Asaf
>
>
> On Wed, Feb 22, 2023 at 4:52 PM 太上玄元道君 <dao...@apache.org> wrote:
>
> > Hi all,
> >
> > I've started a PIP to discuss: PIP-248 Add backlog eviction metric
> >
> > ### Motivation:
> >
> > Pulsar has 2 configurations for the backlog eviction:
> > `backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`. If
> > the topic backlog reaches either threshold, backlog eviction will be
> > triggered.
> >
> > Before backlog eviction happens, we don't have a metric to monitor how
> > close the backlog is to reaching the threshold.
> >
> > We can provide a progress bar metric to tell users which topics are about
> > to trigger backlog eviction. Users can then subscribe to the alert and
> > schedule consumers.
> >
> > For more details, please read the PIP at
> > https://github.com/apache/pulsar/issues/19601
> >
> > Thanks,
> > Tao Jiuming
> >
>
