Re: [DISCUSS] PIP-264: Enhanced OTel-based metric system

Asaf Mesika Wed, 10 May 2023 01:01:04 -0700

On Tue, May 9, 2023 at 11:29 PM Dave Fisher <w...@apache.org> wrote:


>
>
> > On May 8, 2023, at 2:49 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
> >
> > Your feedback made me realize I need to add a "TL;DR" section, which I
> > just added.
> >
> > I'm quoting it here. It gives a brief summary of the proposal, which
> > requires up to 5 min of read time, helping you get a high level picture
> > before you dive into the background/motivation/solution.
> >
> > ----------------------
> > TL;DR
> >
> > Working with Metrics today as a user or a developer is hard and has many
> > severe issues.
> >
> > From the user perspective:
> >
> >   - One of Pulsar's strongest features is "cheap" topics, so you can easily
> >   have 10k - 100k topics per broker. Once you do that, you quickly learn
> >   that the amount of metrics you export via "/metrics" (Prometheus-style
> >   endpoint) becomes really big. The cost to store them becomes too high,
> >   queries time out, or even the "/metrics" endpoint itself times out.
> >   The only option Pulsar gives you today is all-or-nothing filtering and
> >   very crude aggregation. You switch metrics from topic aggregation
> level to
> >   namespace aggregation level. Also you can turn off producer and
> consumer
> >   level metrics. You end up doing it all leaving you "blind", looking at
> the
> >   metrics from a namespace level which is too high level. You end up
> >   conjuring all kinds of scripts on top of topic stats endpoint to glue
> some
> >   aggregated metrics view for the topics you need.
> >   - Summaries (a metric type giving you quantiles like p95), which are
> >   used in Pulsar, can't be aggregated across topics / brokers due to their
> >   inherent design.
> >   - Plugin authors spend too much time on defining and exposing metrics
> to
> >   Pulsar since the only interface Pulsar offers is writing your metrics
> by
> >   your self as UTF-8 bytes in Prometheus Text Format to byte stream
> interface
> >   given to you.
> >   - Pulsar histograms are exported in a way that is not conformant with
> >   Prometheus, which means you can't get the p95 quantile on such
> histograms,
> >   making them very hard to use in day to day life.
>
> What version of DataSketches is used to produce the histogram? Is it still
> an old Yahoo one, or are we using an updated one from Apache DataSketches?
>
> Seems like this is a single PR/small PIP for 3.1?


A histogram is a list of buckets, each of which is a counter.
A summary is a set of values collected over a time window; at the end of
the window the quantiles of those values (p95, p50, ...) are calculated,
and those quantiles are what Pulsar exports.
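
To make the difference concrete, here is a minimal Java sketch. The class
and method names are made up for illustration and are not Pulsar or
Prometheus code. It shows that bucket counters from two brokers can simply
be added, while two already-computed p95 values cannot be combined into a
correct overall p95.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Pulsar, Prometheus, or OTel code.
class HistogramVsSummary {

    // Histograms with the same bucket layout merge by adding counters bucket by bucket.
    static long[] mergeBuckets(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bucket layouts differ");
        }
        long[] merged = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = a[i] + b[i];
        }
        return merged;
    }

    // Nearest-rank quantile over the raw samples of one time window,
    // standing in for what a summary computes before exporting.
    static double quantile(double[] samples, double q) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.length);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        // Broker A: nine fast requests and one slow one. Broker B: ten fast requests.
        double[] brokerA = new double[10];
        Arrays.fill(brokerA, 1.0);
        brokerA[9] = 100.0;
        double[] brokerB = new double[10];
        Arrays.fill(brokerB, 1.0);

        double p95A = quantile(brokerA, 0.95); // 100.0
        double p95B = quantile(brokerB, 0.95); // 1.0

        // Averaging the exported quantiles gives 50.5 ...
        System.out.println("average of p95s: " + (p95A + p95B) / 2);

        // ... but the p95 over all 20 samples is 1.0.
        double[] all = new double[20];
        System.arraycopy(brokerA, 0, all, 0, 10);
        System.arraycopy(brokerB, 0, all, 10, 10);
        System.out.println("true p95: " + quantile(all, 0.95));

        // Bucket counters, in contrast, add up exactly.
        System.out.println(Arrays.toString(
                mergeBuckets(new long[]{9, 0, 1}, new long[]{10, 0, 0})));
    }
}
```

The raw samples are only available here because this is a toy; once a
summary has exported its quantiles, the samples are gone, which is why the
aggregation cannot be repaired downstream.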

Pulsar histograms do not use DataSketches. They are just counters.
They do not adhere to the Prometheus format because:
a. Each bucket counter is expected to be cumulative, but Pulsar resets each
bucket counter to 0 every 1 min.
b. The bucket upper bound is expected to be written as a label (attribute)
"le", but today it is encoded in the name of the metric itself.
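
As a rough sketch of what a conformant histogram looks like, the Java below
keeps bucket counters that are never reset and renders them in Prometheus
text format with the upper bound in an "le" label. The class, its methods
and the metric name demo_latency_ms are assumptions made up for this
example; this is not how Pulsar or the OTel SDK implements it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Pulsar or OTel SDK code.
class CumulativeHistogram {
    private final String name;
    private final double[] upperBounds; // ascending, without +Inf
    private final long[] counts;        // one per bound, plus an overflow slot; never reset
    private double sum;

    CumulativeHistogram(String name, double... upperBounds) {
        this.name = name;
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    // Count the value in the first bucket whose upper bound is >= value.
    void record(double value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts[i]++;
        sum += value;
    }

    // Render Prometheus text format lines: each "le" bucket includes all smaller buckets.
    List<String> render() {
        List<String> lines = new ArrayList<>();
        long running = 0;
        for (int i = 0; i < upperBounds.length; i++) {
            running += counts[i];
            lines.add(name + "_bucket{le=\"" + upperBounds[i] + "\"} " + running);
        }
        running += counts[upperBounds.length];
        lines.add(name + "_bucket{le=\"+Inf\"} " + running);
        lines.add(name + "_sum " + sum);
        lines.add(name + "_count " + running);
        return lines;
    }

    public static void main(String[] args) {
        CumulativeHistogram h = new CumulativeHistogram("demo_latency_ms", 0.5, 1.0, 5.0);
        for (double v : new double[]{0.25, 0.75, 3.0, 10.0}) {
            h.record(v);
        }
        h.render().forEach(System.out::println);
    }
}
```

Because the counters only ever grow, the scraper can compute rates and
quantiles for any window it likes, instead of being tied to a 1 min reset
interval chosen on the broker.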

Fixing this is a breaking change, so it is hard to fit into any minor
release. That is why it is part of this PIP: many things will break, but
all of them will break in a separate layer (OTel metrics), so nobody is
broken without their consent.



>
>
> >   - Too many metrics are rates which also delta reset every interval you
> >   configure in Pulsar and restart, instead of relying on cumulative (ever
> >   growing) counters and let Prometheus use its rate function.
> >   - and many more issues
> >
> > From the developer perspective:
> >
> >   - There are 4 different ways to define and record metrics in Pulsar:
> >   Pulsar own metrics library, Prometheus Java Client, Bookkeeper metrics
> >   library and plain native Java SDK objects (AtomicLong, ...). It's very
> >   confusing for the developer and create inconsistencies for the end user
> >   (e.g. Summary for example is different in each).
> >   - Patching your metrics into "/metrics" Prometheus endpoint is
> >   confusing, cumbersome and error prone.
> >   - many more
> >
> > This proposal offers several key changes to solve that:
> >
> >   - Cardinality (supporting 10k-100k topics per broker) is solved by
> >   introducing a new aggregation level for metrics called Topic Metric
> >   Group. Using configuration, you specify for each topic its group (using
> >   wildcard/regex). This allows you to "zoom" out from topics to groups, a
> >   more detailed granularity level than namespaces. Since you control how
> >   many groups you'll have, this solves the cardinality issue without
> >   sacrificing too much level of detail.
> >   - Fine-grained filtering mechanism, dynamic. You'll have rule-based
> >   dynamic configuration, allowing you to specify per
> namespace/topic/group
> >   which metrics you'd like to keep/drop. Rules allows you to set the
> default
> >   to have small amount of metrics in group and namespace level only and
> drop
> >   the rest. When needed, you can add an override rule to "open" up a
> certain
> >   group to have more metrics in higher granularity (topic or even
> >   consumer/producer level). Since it's dynamic you "open" such a group
> when
> >   you see it's misbehaving, see it in topic level, and when all
> resolved, you
> >   can "close" it. A bit similar experience to logging levels in Log4j or
> >   Logback, that you default and override per class/package.
> >
> > Aggregation and Filtering combined solves the cardinality without
> > sacrificing the level of detail when needed and most importantly, you
> > determine which topic/group/namespace it happens on.
> >
> > Since this change is so invasive, it requires a single metrics library to
> > implement all of it on top of; hence the third big change is
> > consolidating all four ways to define and record metrics into a single
> > new one: OpenTelemetry Metrics (Java SDK, and also Python and Go for the
> > Pulsar Function runners).
> > Introducing OpenTelemetry (OTel) solves also the biggest pain point from
> > the developer perspective, since it's a superb metrics library offering
> > everything you need, and there is going to be a single way - only it.
> Also,
> > it solves the robustness for Plugin author which will use OpenTelemetry.
> It
> > so happens that it also solves all the numerous problems described in the
> > doc itself.
> >
> > The solution will be introduced as another layer with feature toggles, so
> > you can work with existing system, and/or OTel, until gradually
> deprecating
> > existing system.
> >
> > It's a big breaking change for Pulsar users on many fronts: names,
> > semantics, configuration. Read at the end of this doc to learn exactly
> what
> > will change for the user (in high level).
> >
> > In my opinion, it will make Pulsar user experience so much better, they
> > will want to migrate to it, despite the breaking change.
> >
> > This was a very short summary. You are most welcomed to read the full
> > design document below and express feedback, so we can make it better.
> >
> > On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
> >
> >>
> >>
> >> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid>
> >> wrote:
> >>
> >>> I'm excited to learn much more about metrics when I started reading
> >>> this proposal. But I became more and more frustrated when I found
> >>> there is still too much content left even if I've already spent much
> >>> time reading this proposal. I'm wondering how much time did you expect
> >>> reviewers to read through this proposal? I just recalled the
> >>> discussion you started before [1]. Did you expect each PMC member that
> >>> gives his/her +1 to read only parts of this proposal?
> >>>
> >>
> >> I estimated around 2 hours needed for a reviewer.
> >> I hate it being so long, but I simply couldn't find a way to downsize it
> >> more. Furthermore, I consulted with my colleagues including Matteo, but
> we
> >> couldn't see a way to scope it down.
> >> Why? Because once you begin this journey, you need to know how it's
> going
> >> to end.
> >> What I ended up doing, is writing all the crucial details for review in
> >> the High Level Design section.
> >> It's still a big, hefty section, but I don't think I can step out or let
> >> anyone else change Pulsar so invasively without the full extent of the
> >> change.
> >>
> >> I don't think it's wise to read parts.
> >> I did my very best effort to minimize it, but the scope is simply big.
> >> Open for suggestions, but it requires reading all the PIP :)
> >>
> >> Thanks a lot Yunze for dedicating any time to it.
> >>
> >>
> >>
> >>
> >>>
> >>> Let's talk back to the proposal, for now, what I mainly learned and
> >>> are concerned about mostly are:
> >>> 1. Pulsar has many ways to expose metrics. It's not unified and
> confusing.
> >>> 2. The current metrics system cannot support a large amount of topics.
> >>> 3. It's hard for plugin authors to integrate metrics. (For example,
> >>> KoP [2] integrates metrics by implementing the
> >>> PrometheusRawMetricsProvider interface and it indeed needs much work)
> >>>
> >>> Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
> >>>
> >>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
> >>> section. It's still frustrating to see no answer. Eventually, I found
> >>>
> >>
> >> OpenTelemetry isn't the solution for large amount of topic.
> >> The solution is described at
> >> "Aggregate and Filtering to solve cardinality issues" section.
> >>
> >>
> >>
> >>> the explanation in the "What we need to fix in OpenTelemetry -
> >>> Performance" section. It seems that we still need some enhancements in
> >>> OTel. In other words, currently OTel is not ready for resolving all
> >>> these issues listed in the proposal but we believe it will.
> >>>
> >>
> >> Let me rephrase "believe" --> we work together with the maintainers to
> do
> >> it, yes.
> >> I am open for any other suggestion.
> >>
> >>
> >>
> >>>
> >>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
> >>> section, the plugin authors still need to implement the new OTel
> >>> interfaces. Is it much easier than using the existing ways to expose
> >>> metrics? Could metrics still be easily integrated with Grafana?
> >>>
> >>
> >> Yes, it's way easier.
> >> Basically you have a full fledged metrics library objects: Meter, Gauge,
> >> Histogram, Counter.
> >> No more Raw Metrics Provider, writing UTF-8 bytes in Prometheus format.
> >> You get namespacing for free with Meter name and version.
> >> It's way better than current solution and any other library.
> >>
> >>
> >>>
> >>> That's all I am concerned about at the moment. I understand, and
> >>> appreciate that you've spent much time studying and explaining all
> >>> these things. But, this proposal is still too huge.
> >>>
> >>
> >> I appreciate your effort a lot!
> >>
> >>
> >>
> >>>
> >>> [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
> >>> [2]
> >>>
> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
> >>>
> >>> Thanks,
> >>> Yunze
> >>>
> >>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com>
> wrote:
> >>>>
> >>>> I'm very appreciative for feedback from multiple pulsar users and devs
> >>> on
> >>>> this PIP, since it has dramatic changes suggested and quite extensive
> >>>> positive change for the users.
> >>>>
> >>>>
> >>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'm very excited to release a PIP I've been working on in the past 11
> >>>>> months, which I think will be immensely valuable to Pulsar, which I
> >>> like so
> >>>>> much.
> >>>>>
> >>>>> PIP: https://github.com/apache/pulsar/issues/20197
> >>>>>
> >>>>> I'm quoting here the preface:
> >>>>>
> >>>>> === QUOTE START ===
> >>>>>
> >>>>> Roughly 11 months ago, I started working on solving the biggest issue
> >>> with
> >>>>> Pulsar metrics: the lack of ability to monitor a pulsar broker with a
> >>> large
> >>>>> topic count: 10k, 100k, and future support of 1M. This started by
> >>> mapping
> >>>>> the existing functionality and then enumerating all the problems I
> >>> saw (all
> >>>>> documented in this doc
> >>>>> <
> >>>
> https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing
> >>>>
> >>>>> ).
> >>>>>
> >>>>> This PIP is a parent PIP. It aims to gradually solve (using sub-PIPs)
> >>> all
> >>>>> the current metric system's problems and provide the ability to
> >>> monitor a
> >>>>> broker with a large topic count, which is currently lacking. As a
> >>> parent
> >>>>> PIP, it will describe each problem and its solution at a high level,
> >>>>> leaving fine-grained details to the sub-PIPs. The parent PIP ensures
> >>> all
> >>>>> solutions align and does not contradict each other.
> >>>>>
> >>>>> The basic building block to solve the monitoring ability of large
> >>> topic
> >>>>> count is aggregating internally (to topic groups) and adding
> >>> fine-grained
> >>>>> filtering. We could have shoe-horned it into the existing metric
> >>> system,
> >>>>> but we thought adding that to a system already ingrained with many
> >>> problems
> >>>>> would be wrong and hard to do gradually, as so many things will
> >>> break. This
> >>>>> is why the second-biggest design decision presented here is
> >>> consolidating
> >>>>> all existing metric libraries into a single one - OpenTelemetry
> >>>>> <https://opentelemetry.io/>. The parent PIP will explain why
> >>>>> OpenTelemetry was chosen out of existing solutions and why it far
> >>> exceeds
> >>>>> all other options. I’ve been working closely with the OpenTelemetry
> >>>>> community in the past eight months: brain-storming this integration,
> >>> and
> >>>>> raising issues, in an effort to remove serious blockers to make this
> >>>>> migration successful.
> >>>>>
> >>>>> I made every effort to summarize this document so that it can be
> >>> concise
> >>>>> yet clear. I understand it is an effort to read it and, more so,
> >>> provide
> >>>>> meaningful feedback on such a large document; hence I’m very grateful
> >>> for
> >>>>> each individual who does so.
> >>>>>
> >>>>> I think this design will help improve the user experience immensely,
> >>> so it
> >>>>> is worth the time spent reading it.
> >>>>>
> >>>>>
> >>>>> === QUOTE END ===
> >>>>>
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Asaf Mesika
> >>>>>
> >>>
> >>
>
>
