Re: [Discuss] PIP-248: Add backlog eviction metric

Asaf Mesika Sun, 05 Mar 2023 05:01:14 -0800

On Thu, Mar 2, 2023 at 12:57 PM 太上玄元道君 <[email protected]> wrote:


> > I think you should fix this explanation:
>
> Thanks! I would like to copy the context you provide to the PIP motivation,
> your description is more detailed, so developers don't have to go through
> the code.
>

Sure


>
> > Today the quota is checked periodically, right? So that's how the operator
> > knows the cost in terms of I/O is limited.
> > Now you are adding one additional I/O per collection, every 1 min by
> > default. That's a lot perhaps. How long is the check interval today?
>
> Actually, I don't want to introduce additional costs. I thought we could
> cache the check's result, so that exposing it adds no extra I/O.
> I may not have made that clear in the PIP, which caused this
> misunderstanding, sorry.
>

OK, just to verify: you plan to modify the code that periodically runs the
backlog quota check so that it caches its result? That way, when you pull
that information every 1 min to expose it as a metric, it will have zero
I/O cost?
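
To make sure we mean the same thing, here is a minimal sketch of that idea, written in Python for brevity (the broker code itself is Java). All names in it (`BacklogQuotaChecker`, `run_periodic_check`, `backlog_age_seconds`, the `io_reads` counter) are hypothetical and are not Pulsar APIs. The periodic check pays the I/O once and stores the result; metric collection only reads the stored value.

```python
import time
from typing import Callable, Optional


class BacklogQuotaChecker:
    """Hypothetical stand-in for the broker's periodic backlog quota check."""

    def __init__(self, read_oldest_unacked_publish_time: Callable[[], float]):
        # The callable represents the expensive step (it may read an entry
        # from storage). It is an assumption of this sketch.
        self._read = read_oldest_unacked_publish_time
        self._cached_publish_time: Optional[float] = None
        self.io_reads = 0  # counts how often the expensive step ran

    def run_periodic_check(self) -> None:
        """Runs on the existing check interval; the only place that does I/O."""
        self._cached_publish_time = self._read()
        self.io_reads += 1

    def backlog_age_seconds(self, now: Optional[float] = None) -> Optional[float]:
        """Runs on every metrics collection; reads the cached value only."""
        if self._cached_publish_time is None:
            return None  # no check has run yet
        now = time.time() if now is None else now
        return max(0.0, now - self._cached_publish_time)


# One check, then three metric collections one minute apart.
checker = BacklogQuotaChecker(lambda: 1_000.0)  # oldest message published at t=1000s
checker.run_periodic_check()
ages = [checker.backlog_age_seconds(now=1_000.0 + 60 * i) for i in range(1, 4)]
print(ages, checker.io_reads)  # [60.0, 120.0, 180.0] 1
```

The sketch caches the publish time rather than the age, so the reported age keeps growing between checks without extra I/O. That is a design choice of the sketch, not something the PIP states.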



>
> > The user today can calculate quota used for size based limit, since there
> > are two metrics that are exposed today on a topic level:
> > "pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size".
> > You can just divide the two to get a percentage.
> > For the time-based limit, the only metric exposed today is the quota
> > itself, "pulsar_storage_backlog_quota_limit_time".
>
> I only noticed `pulsar_storage_backlog_size` but missed
> `pulsar_storage_backlog_quota_limit` and
> `pulsar_storage_backlog_quota_limit_time`. Many thanks for your reminder.
>
>
> So, in this case, we already have the following topic-level metrics:
> `pulsar_storage_backlog_size`: The total backlog size of this topic owned
> by this broker (in bytes).
> `pulsar_storage_backlog_quota_limit`: The total amount of the data in this
> topic that limits the backlog quota (bytes).
> `pulsar_storage_backlog_quota_limit_time`: The backlog quota limit in
> time (seconds). (This metric does not exist in the docs and needs to be
> added.)
>
>
> We just need to add a new topic-level metric named
> `pulsar_storage_earliest_msg_publish_time_in_backlog` that indicates the
> publish time of the earliest message in the backlog.
> Users could then get `pulsar_backlog_size_quota_used_percentage` by dividing
> `pulsar_storage_backlog_size` by `pulsar_storage_backlog_quota_limit`
> (`pulsar_storage_backlog_size` / `pulsar_storage_backlog_quota_limit`),
> and could get `pulsar_backlog_time_quota_used_percentage` by dividing
> `now - pulsar_storage_earliest_msg_publish_time_in_backlog` by
> `pulsar_storage_backlog_quota_limit_time`
> (`(now - pulsar_storage_earliest_msg_publish_time_in_backlog)` /
> `pulsar_storage_backlog_quota_limit_time`).
>

I think there are two problems with the topic-level name
`pulsar_storage_earliest_msg_publish_time_in_backlog`:
* First, I prefer exposing the age rather than the publish time.
* Second, it's a bit hard to figure out what "the earliest msg in the
backlog" means.

Maybe `pulsar_storage_backlog_age_seconds`? In the explanation you can
write: "The age (time passed since it was published) of the earliest
unacknowledged message based on the topic's existing subscriptions".
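
With an age metric, both "quota used" percentages become the same plain division. A rough sketch in Python of what a user (or an alert rule) would compute from the topic-level metrics: the sample values are made up, the helper name `quota_used_percentage` is hypothetical, and treating a non-positive limit as "no quota configured" is an assumption of the sketch.

```python
from typing import Optional


def quota_used_percentage(used: float, limit: float) -> Optional[float]:
    """Return used/limit as a percentage, or None when no quota is configured.

    Assumption: a non-positive limit means the quota is not set.
    """
    if limit <= 0:
        return None
    return 100.0 * used / limit


# Made-up sample values for one topic.
metrics = {
    "pulsar_storage_backlog_size": 3 * 1024**3,          # bytes
    "pulsar_storage_backlog_quota_limit": 10 * 1024**3,  # bytes
    "pulsar_storage_backlog_age_seconds": 1800,          # proposed metric
    "pulsar_storage_backlog_quota_limit_time": 3600,     # seconds
}

size_used = quota_used_percentage(
    metrics["pulsar_storage_backlog_size"],
    metrics["pulsar_storage_backlog_quota_limit"],
)
time_used = quota_used_percentage(
    metrics["pulsar_storage_backlog_age_seconds"],
    metrics["pulsar_storage_backlog_quota_limit_time"],
)
print(size_used, time_used)  # 30.0 50.0
```

Exposing the age directly means the consumer of the metric never has to subtract a publish timestamp from "now" on their side.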



>
> The backlog quota time checker runs periodically, so we can cache its
> result, and exposing it won't add much cost.
>
> Pulsar also exposes the subscription-level `backlogSize` and
> `earliestMsgPublishTimeInBacklog` in Pulsar-Admin
> <https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139>
> if `subscriptionBacklogSize` and `getEarliestTimeInBacklog` are true.
> We can also expose `backlogQuotaLimiteSize` and `backlogQuotaLimitTime` of
> the topic to PulsarAdmin.
>

What relationship do you see between Pulsar exposing
subscriptionBacklogSize and earliestMsgPublishTimeInBacklog at the
subscription level and exposing the backlog quota limits in Pulsar Admin?

Limits can be exposed in Pulsar Admin, since doing so has zero cost.
I think it's a good idea to do that.
The quota usage can also be exposed in Pulsar Admin, since we pull that
data from the backlog quota checker cache, so it has zero cost as well.

As we said in a previous email, we can also expose
`backlogQuotaTimeOldestBacklogAgeSubscriptionName`.


>
> After users receive the backlog alert from metrics alerting systems, they
> can get the topic name and then call Topics#getStats
> <https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java#L1139>
> to find which subscriptions have a huge backlog.
>
>
I agree users can call PulsarAdmin getStats for the topic with
getEarliestTimeInBacklog=true to find the subscription with the oldest
backlog, the one responsible for exceeding the quota. But we can give them
that information at zero cost, since we already have that subscription name
cached (we spent the I/O to find out which subscription it is, so let's
just cache it and provide it).
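
Roughly what I have in mind, sketched in Python for brevity: the quota check already has to find the subscription holding the oldest unacknowledged message, so it can keep the name next to the timestamp and topic stats can serve both from memory. `QuotaCheckResult`, `find_oldest_backlog`, `topic_stats` and the `oldestBacklogMessageAgeSeconds` field are hypothetical names for illustration; only `backlogQuotaTimeOldestBacklogAgeSubscriptionName` comes from this thread.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class QuotaCheckResult:
    """What the periodic check would cache (hypothetical structure)."""
    oldest_publish_time: float
    subscription_name: str


def find_oldest_backlog(
    oldest_unacked_by_subscription: Dict[str, float],
) -> Optional[QuotaCheckResult]:
    """Pick the subscription whose oldest unacknowledged message is the oldest.

    The input maps subscription name to the publish time of its oldest
    unacknowledged message; obtaining it is the part that costs I/O.
    """
    if not oldest_unacked_by_subscription:
        return None
    name = min(oldest_unacked_by_subscription,
               key=oldest_unacked_by_subscription.get)
    return QuotaCheckResult(oldest_unacked_by_subscription[name], name)


def topic_stats(cached: Optional[QuotaCheckResult], now: float) -> dict:
    """Build the stats fields from the cached result only, with no I/O."""
    if cached is None:
        return {}
    return {
        "oldestBacklogMessageAgeSeconds": now - cached.oldest_publish_time,
        "backlogQuotaTimeOldestBacklogAgeSubscriptionName": cached.subscription_name,
    }


# The periodic check runs once and caches its finding.
cached = find_oldest_backlog({"billing": 1_700.0, "audit": 1_200.0, "search": 1_950.0})
stats = topic_stats(cached, now=2_000.0)
print(stats)
```

In this example the stats report the `audit` subscription with an age of 800 seconds, which is what the operator needs when the alert fires.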




> Thanks,
> Tao Jiuming
>
> On Wed, Mar 1, 2023 at 23:42, Asaf Mesika <[email protected]> wrote:
>
> > >
> > > Pulsar has 2 configurations for the backlog eviction
> > > <https://pulsar.apache.org/docs/2.11.x/cookbooks-retention-expiry/#backlog-quotas>:
> > > backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond.
> > > By default, backlog eviction is disabled. There is also a field named
> > > backlogQuotaMap in TopicPolicies
> > > <https://github.com/apache/pulsar/blob/master/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/HierarchyTopicPolicies.java#L45>
> > > / Namespace Policies
> > > <https://github.com/apache/pulsar/blob/master/pulsar-client-admin-api/src/main/java/org/apache/pulsar/common/policies/data/Policies.java#L41>
> > > that assists in controlling the topic/namespace-level backlog quota.
> > >
> > > If the topic backlog reaches the threshold of any item, backlog eviction
> > > will be triggered: Pulsar will move the subscription's cursor to skip
> > > unacknowledged messages.
> > >
> > > Before backlog eviction happens, we don't have a metric to monitor how
> > > long until it reaches the threshold.
> > >
> >
> > I think you should fix this explanation:
> >
> > In Pulsar, a subscription maintains the acknowledgment state of its
> > messages. A subscription backlog is the set of messages which are
> > unacknowledged.
> > A subscription backlog size is the sum of the sizes of the unacknowledged
> > messages (in bytes).
> > A topic can have many subscriptions.
> > A topic backlog is defined as the backlog size of the subscription which
> > has the oldest unacknowledged message. Since acknowledged messages can be
> > interleaved with unacknowledged messages, calculating the exact size of
> > that subscription can be expensive as it requires I/O operations to read
> > the messages from the ledgers.
> > For that reason, the topic backlog is actually defined to be the estimated
> > backlog size of that subscription. It does so by summing the sizes of
> > all the ledgers, starting from the current active one, up to the ledger
> > which contains the oldest unacknowledged message (there is actually a
> > faster way to calculate it, but this is the definition of the estimation).
> >
> > A topic backlog age is the age of the oldest unacknowledged message (in
> > any subscription). If that message was written 30 minutes ago, its age is
> > 30 minutes.
> >
> > Pulsar has a feature called backlog quota (place link). It allows the user
> > to define a quota - in effect, a limit - which limits the topic backlog.
> > There are two types of quotas:
> > * Size based: The limit is for the topic backlog size (as we defined
> > above).
> > * Time based: The limit is for the topic's backlog age (as we defined
> > above).
> >
> > Once a topic backlog exceeds either one of those limits, an action is taken
> > upon messages written to the topic:
> > * The producer write is placed on hold for a certain amount of time before
> > failing.
> > * The producer write is failed.
> > * The subscriptions' oldest unacknowledged messages are acknowledged in
> > order until both the topic backlog size and age fall inside the limit
> > (quota). This process is called backlog eviction (happens every interval).
> >
> > The quotas can be defined as a default value for any topic, by using the
> > following broker configuration keys: backlogQuotaDefaultLimitBytes,
> > backlogQuotaDefaultLimitSecond. It can also be specified directly for all
> > topics in a given namespace using the namespace policy, or for a specific
> > topic using a topic policy.
> >
> > The user today can calculate quota used for size based limit, since there
> > are two metrics that are exposed today on a topic level:
> > "pulsar_storage_backlog_quota_limit" and "pulsar_storage_backlog_size".
> > You can just divide the two to get a percentage.
> > For the time-based limit, the only metric exposed today is the quota
> > itself, "pulsar_storage_backlog_quota_limit_time".
> >
> > ------------
> >
> > I would create two metrics:
> >
> > `pulsar_backlog_size_quota_used_percentage`
> > `pulsar_backlog_time_quota_used_percentage`
> >
> > You would like to know what triggered the alert, hence two.
> > It's not the quota percentage, it's the quota used percentage.
> >
> > ----------
> >
> > > It checks if the backlog size exceeds the threshold
> > > (backlogQuotaDefaultLimitBytes), and it gets the current backlog size by
> > > calculating it from LedgerInfo
> > > <https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto#L54>,
> > > so it will not lead to I/O.
> >
> > This is not correct.
> > It checks against the topic / namespace policy, and if it doesn't exist, it
> > falls back on the default configuration key mentioned above.
> >
> > > It checks if the backlog time exceeds the threshold
> > > (backlogQuotaDefaultLimitSecond). If preciseTimeBasedBacklogQuotaCheck is
> > > set to true, it will read an entry from BookKeeper, but the default
> > > value is false, which means it gets the backlog time by calculating it
> > > from LedgerInfo
> > > <https://github.com/apache/pulsar/blob/master/managed-ledger/src/main/proto/MLDataFormats.proto#L54>.
> > > So in general, we don't need to worry that it will lead to I/O.
> >
> >
> > I'm afraid of that.
> > Today the quota is checked periodically, right? So that's how the operator
> > knows the cost in terms of I/O is limited.
> > Now you are adding one additional I/O per collection, every 1 min by
> > default. That's a lot perhaps. How long is the check interval today?
> >
> > Perhaps in the backlog quota check, you can persist the check result (that
> > is, the age) and use it?
> >
> >
> > ------
> >
> > Regarding "slowest_subscription":
> > I think the cost is too high, because the subscriptions will keep
> > alternating, which can generate many unique time series. Since
> > Prometheus, or any other TSDB, flushes only every 2 hours, it will cost
> > you too much.
> >
> > I suggest exposing the name via the topic stats. This way they can issue a
> > REST call to grab that subscription name only when the alert fires.
> >
> > Thanks,
> >
> > Asaf
> >
> >
> >
> >
> >
> > On Tue, Feb 28, 2023 at 9:29 AM 太上玄元道君 <[email protected]> wrote:
> >
> > > Hi Asaf,
> > > I've updated the PIP, PTAL
> > >
> > > Thanks,
> > > Tao Jiuming
> > >
> > > On Sun, Feb 26, 2023 at 23:03, Asaf Mesika <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > > Pulsar has 2 configurations for the backlog eviction:
> > > > > backlogQuotaDefaultLimitBytes and backlogQuotaDefaultLimitSecond, if
> > > > > topic backlog reaches the threshold of any item, backlog eviction
> > > > > will be triggered.
> > > >
> > > > These seem like default values, not the actual values. Can you please
> > > > provide an explanation in the PIP and a link to read more:
> > > > 1. Where do you define the backlog quota exactly? What is the
> > > > granularity (subscription?)
> > > > 2. Is the backlog quota on by default? If so, what are the default
> > > > values?
> > > >
> > > >
> > > >
> > > > *Notes*
> > > > 1. When the backlog quota limit is defined in bytes, and you wish to
> > > > know how close a subscription is to its bytes limit, you need to
> > > > calculate the backlog size in bytes. From my understanding, there is an
> > > > accurate calculation (which is costly in terms of I/O) and there is an
> > > > estimate of it. I presume you would want to use the estimated one, is
> > > > that correct?
> > > > Does the backlog quota itself use the accurate or the estimated one
> > > > when it starts evicting entries (i.e. marking them as acknowledged)?
> > > >
> > > > 2. For the backlog limit specified in time units, there is no estimate,
> > > > as it must be calculated all the time (earliest unacknowledged message
> > > > distance from now). How do you plan to calculate the current age of the
> > > > earliest message without bearing that I/O cost on each metric
> > > > calculation?
> > > >
> > > > 3. In the Goal section, you specify that your goal is to add a
> > > > "proximity" metric.
> > > > a) You must define that - what is a proximity metric exactly? What are
> > > > its units? How are you planning to calculate it?
> > > > b) Proximity is not a good term IMO. I personally have never seen this
> > > > term used in software systems, unless it's in the aviation/space
> > > > industry. Once you explain (a) I hope I can help provide alternative
> > > > names.
> > > >
> > > > 4. Maybe we should provide the used quota percentage for each limit
> > > > separately, instead of one for both, since it's easier to act upon the
> > > > alert when you know which one triggered it.
> > > >
> > > > 5. I didn't understand the "slowest_subscription" label used when
> > > > describing the metric label. Can you please provide an explanation?
> > > >
> > > > 6. I suggest writing a "High Level Design" section, and adding
> > > > everything you need to know for this proposal, so I don't need to read
> > > > the implementation details below (code).
> > > >
> > > > Thanks,
> > > >
> > > > Asaf
> > > >
> > > >
> > > > On Wed, Feb 22, 2023 at 4:52 PM 太上玄元道君 <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've started a PIP to discuss: PIP-248 Add backlog eviction metric
> > > > >
> > > > > ### Motivation:
> > > > >
> > > > > Pulsar has 2 configurations for the backlog eviction:
> > > > > `backlogQuotaDefaultLimitBytes` and `backlogQuotaDefaultLimitSecond`.
> > > > > If the topic backlog reaches the threshold of any item, backlog
> > > > > eviction will be triggered.
> > > > >
> > > > > Before backlog eviction happens, we don't have a metric to monitor
> > > > > how long until it reaches the threshold.
> > > > >
> > > > > We can provide a progress bar metric to tell users that some topics
> > > > > are about to trigger backlog eviction. Users can then subscribe to
> > > > > the alert to schedule consumers.
> > > > >
> > > > > For more details, please read the PIP at
> > > > > https://github.com/apache/pulsar/issues/19601
> > > > >
> > > > > Thanks,
> > > > > Tao Jiuming
> > > > >
> > > >
> > >
> >
>
