Bumping, as I have received only one piece of partial feedback on the PIP so far. Any feedback is highly appreciated!
Thanks!

Asaf

On Wed, May 10, 2023 at 11:00 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
>
> On Tue, May 9, 2023 at 11:29 PM Dave Fisher <w...@apache.org> wrote:
>>
>>
>> > On May 8, 2023, at 2:49 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
>> >
>> > Your feedback made me realize I need to add a "TL;DR" section, which I
>> > just added.
>> >
>> > I'm quoting it here. It gives a brief summary of the proposal, requiring
>> > up to 5 minutes of reading time, to help you get a high-level picture
>> > before you dive into the background/motivation/solution.
>> >
>> > ----------------------
>> > TL;DR
>> >
>> > Working with metrics today, as a user or a developer, is hard and
>> > involves many severe issues.
>> >
>> > From the user perspective:
>> >
>> > - One of Pulsar's strongest features is "cheap" topics, so you can
>> > easily have 10k-100k topics per broker. Once you do that, you quickly
>> > learn that the amount of metrics you export via "/metrics" (the
>> > Prometheus-style endpoint) becomes really big. The cost to store them
>> > becomes too high, queries time out, or even the "/metrics" endpoint
>> > itself times out.
>> > The only options Pulsar gives you today are all-or-nothing filtering
>> > and very crude aggregation: you switch metrics from the topic
>> > aggregation level to the namespace aggregation level, and you can turn
>> > off producer- and consumer-level metrics. You end up doing all of it,
>> > leaving you "blind", looking at metrics at the namespace level, which
>> > is too high-level. You end up conjuring all kinds of scripts on top of
>> > the topic stats endpoint to glue together some aggregated metrics view
>> > for the topics you need.
>> > - Summaries (a metric type giving you quantiles such as p95), which
>> > are used in Pulsar, can't be aggregated across topics/brokers due to
>> > their inherent design.
>> > - Plugin authors spend too much time defining and exposing metrics to
>> > Pulsar, since the only interface Pulsar offers is writing your metrics
>> > yourself, as UTF-8 bytes in the Prometheus text format, into a byte
>> > stream interface given to you.
>> > - Pulsar histograms are exported in a way that is not conformant with
>> > Prometheus, which means you can't get the p95 quantile on such
>> > histograms, making them very hard to use in day-to-day life.
>>
>> What version of DataSketches is used to produce the histogram? Is it
>> still an old Yahoo one, or are we using an updated one from Apache
>> DataSketches?
>>
>> Seems like this is a single PR/small PIP for 3.1?
>
> Histograms are a list of buckets, each of which is a counter.
> A Summary is a collection of values collected over a time window, from
> which at the end you get a calculation of the quantiles of those values
> (p95, p50, ...), and those are what Pulsar exports.
>
> Pulsar histograms do not use DataSketches. They are just counters.
> They do not adhere to Prometheus because:
> a. Each bucket counter is expected to be cumulative, but Pulsar resets it
> to 0 every minute.
> b. The bucket's upper bound is expected to be written as an attribute
> ("le"), but today it is encoded in the name of the metric itself.
>
> This is a breaking change, hence hard to make in any small release.
> This is why it's part of this PIP: so many things will break, but all of
> them break in a separate layer (OTel metrics), hence nothing breaks for
> anyone without their consent.
>
>>
>> > - Too many metrics are rates, which also delta-reset every interval
>> > you configure in Pulsar and on restart, instead of relying on
>> > cumulative (ever-growing) counters and letting Prometheus use its
>> > rate function.
>> > - ...and many more issues.
>> >
>> > From the developer perspective:
>> >
>> > - There are four different ways to define and record metrics in
>> > Pulsar: Pulsar's own metrics library, the Prometheus Java client, the
>> > BookKeeper metrics library, and plain native Java SDK objects
>> > (AtomicLong, ...). It's very confusing for the developer and creates
>> > inconsistencies for the end user (e.g., Summary is different in each).
>> > - Patching your metrics into the "/metrics" Prometheus endpoint is
>> > confusing, cumbersome, and error-prone.
>> > - ...and many more.
>> >
>> > This proposal offers several key changes to solve that:
>> >
>> > - Cardinality (supporting 10k-100k topics per broker) is solved by
>> > introducing a new aggregation level for metrics called a Topic Metric
>> > Group. Using configuration, you specify each topic's group (using
>> > wildcards/regex). This allows you to "zoom out" to groups instead of
>> > namespaces, a granularity level that is still reasonably detailed;
>> > and since you control how many groups you'll have, it solves the
>> > cardinality issue without sacrificing the level of detail too much.
>> > - A fine-grained, dynamic filtering mechanism. You'll have rule-based
>> > dynamic configuration, allowing you to specify per
>> > namespace/topic/group which metrics you'd like to keep or drop. Rules
>> > allow you to set a default that keeps only a small amount of metrics,
>> > at the group and namespace level, and drops the rest. When needed,
>> > you can add an override rule to "open up" a certain group to more
>> > metrics at a finer granularity (topic or even consumer/producer
>> > level). Since it's dynamic, you "open" such a group when you see it
>> > misbehaving, inspect it at the topic level, and once all is resolved,
>> > you can "close" it. The experience is a bit similar to logging levels
>> > in Log4j or Logback, where you set a default and override it per
>> > class/package.
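To make the Topic Metric Group mapping and the rule-based filtering described in the TL;DR concrete, here is a small illustrative sketch. All topic names, wildcard patterns, and the rule format are hypothetical and invented for this illustration only; the PIP does not fix a concrete syntax:

```python
# Illustrative only: map topics to metric groups via wildcard patterns and
# pick each topic's metric granularity with first-match-wins rules, similar
# in spirit to Log4j/Logback levels (a default plus per-target overrides).
from fnmatch import fnmatch

# Hypothetical group mapping: first matching pattern wins.
GROUP_PATTERNS = [
    ("persistent://acme/orders/*", "orders"),
    ("persistent://acme/logs/*", "logs"),
]

# Hypothetical rules, most specific first. "level" is the finest
# granularity kept for matching topics; anything finer is dropped.
RULES = [
    {"match": "group:orders", "level": "topic"},      # override: group "opened up"
    {"match": "*",            "level": "namespace"},  # default: coarse and cheap
]

def group_of(topic: str) -> str:
    for pattern, group in GROUP_PATTERNS:
        if fnmatch(topic, pattern):
            return group
    return "default"

def level_of(topic: str) -> str:
    group = group_of(topic)
    for rule in RULES:
        if rule["match"] in ("*", f"group:{group}"):
            return rule["level"]
    return "namespace"
```

With this sketch, `level_of("persistent://acme/orders/o-1")` returns `"topic"` because the override rule applies, while `level_of("persistent://acme/logs/app")` falls through to the `"namespace"` default; deleting the override rule "closes" the orders group again, mirroring the dynamic open/close flow described above.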
>> >
>> > Aggregation and filtering combined solve the cardinality issue without
>> > sacrificing the level of detail when needed, and most importantly, you
>> > determine which topic/group/namespace it happens on.
>> >
>> > Since this change is so invasive, it requires a single metrics library
>> > to implement all of it on top of; hence the third big change is
>> > consolidating all four ways to define and record metrics into a
>> > single, new one: OpenTelemetry Metrics (the Java SDK, and also Python
>> > and Go for the Pulsar Function runners).
>> > Introducing OpenTelemetry (OTel) also solves the biggest pain point
>> > from the developer perspective, since it's a superb metrics library
>> > offering everything you need, and there is going to be a single way to
>> > do it: only OpenTelemetry. It also solves the robustness issue for
>> > plugin authors, who will use OpenTelemetry. It so happens that it also
>> > solves all the numerous problems described in the doc itself.
>> >
>> > The solution will be introduced as another layer, with feature
>> > toggles, so you can work with the existing system and/or OTel, until
>> > the existing system is gradually deprecated.
>> >
>> > It's a big breaking change for Pulsar users on many fronts: names,
>> > semantics, configuration. Read the end of this doc to learn exactly
>> > what will change for the user (at a high level).
>> >
>> > In my opinion, it will make the Pulsar user experience so much better
>> > that users will want to migrate to it, despite the breaking change.
>> >
>> > This was a very short summary. You are most welcome to read the full
>> > design document below and express feedback, so we can make it better.
>> >
>> > On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <asaf.mes...@gmail.com>
>> > wrote:
>> >
>> >>
>> >>
>> >> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid>
>> >> wrote:
>> >>
>> >>> I was excited to learn much more about metrics when I started
>> >>> reading this proposal.
>> >>> But I became more and more frustrated when I found there was still
>> >>> too much content left even after I had already spent much time
>> >>> reading. I'm wondering how much time you expected reviewers to
>> >>> spend reading this proposal. I just recalled the discussion you
>> >>> started before [1]. Did you expect each PMC member who gives
>> >>> his/her +1 to read only parts of this proposal?
>> >>
>> >> I estimated around 2 hours are needed for a reviewer.
>> >> I hate it being so long, but I simply couldn't find a way to
>> >> downsize it more. Furthermore, I consulted with my colleagues,
>> >> including Matteo, but we couldn't see a way to scope it down.
>> >> Why? Because once you begin this journey, you need to know how it's
>> >> going to end.
>> >> What I ended up doing is writing all the crucial details for review
>> >> in the High Level Design section.
>> >> It's still a big, hefty section, but I don't think I can, or should
>> >> let anyone else, change Pulsar so invasively without laying out the
>> >> full extent of the change.
>> >>
>> >> I don't think it's wise to read only parts.
>> >> I made my very best effort to minimize it, but the scope is simply
>> >> big. I'm open to suggestions, but it requires reading the whole
>> >> PIP :)
>> >>
>> >> Thanks a lot, Yunze, for dedicating any time to it.
>> >>
>> >>> Let's get back to the proposal. For now, what I mainly learned and
>> >>> am mostly concerned about is:
>> >>> 1. Pulsar has many ways to expose metrics. It's not unified, and
>> >>> it's confusing.
>> >>> 2. The current metrics system cannot support a large number of
>> >>> topics.
>> >>> 3. It's hard for plugin authors to integrate metrics. (For example,
>> >>> KoP [2] integrates metrics by implementing the
>> >>> PrometheusRawMetricsProvider interface, and it indeed needs much
>> >>> work.)
>> >>>
>> >>> Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
>> >>>
>> >>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
>> >>> section. It's still frustrating to see no answer. Eventually, I
>> >>> found
>> >>
>> >> OpenTelemetry isn't the solution for the large topic count.
>> >> The solution is described in the
>> >> "Aggregate and Filtering to solve cardinality issues" section.
>> >>
>> >>> the explanation in the "What we need to fix in OpenTelemetry -
>> >>> Performance" section. It seems that we still need some enhancements
>> >>> in OTel. In other words, currently OTel is not ready to resolve all
>> >>> the issues listed in the proposal, but we believe it will be.
>> >>
>> >> Let me rephrase "believe": we are working together with the
>> >> maintainers to make it happen, yes.
>> >> I am open to any other suggestion.
>> >>
>> >>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
>> >>> section, the plugin authors still need to implement the new OTel
>> >>> interfaces. Is it much easier than using the existing ways to
>> >>> expose metrics? Could metrics still be easily integrated with
>> >>> Grafana?
>> >>
>> >> Yes, it's way easier.
>> >> Basically, you get the objects of a full-fledged metrics library:
>> >> Meter, Gauge, Histogram, Counter.
>> >> No more raw metrics provider, writing UTF-8 bytes in Prometheus
>> >> format. You get namespacing for free with the Meter name and version.
>> >> It's way better than the current solution and any other library.
>> >>
>> >>> That's all I am concerned about at the moment. I understand, and
>> >>> appreciate, that you've spent much time studying and explaining all
>> >>> these things. But this proposal is still too huge.
>> >>
>> >> I appreciate your effort a lot!
>> >>
>> >>
>> >>> [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
>> >>> [2]
>> >>> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
>> >>>
>> >>> Thanks,
>> >>> Yunze
>> >>>
>> >>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> I would very much appreciate feedback from multiple Pulsar users
>> >>>> and devs on this PIP, since it suggests dramatic changes and quite
>> >>>> extensive positive change for users.
>> >>>>
>> >>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I'm very excited to release a PIP I've been working on for the
>> >>>>> past 11 months, which I think will be immensely valuable to
>> >>>>> Pulsar, a project I like so much.
>> >>>>>
>> >>>>> PIP: https://github.com/apache/pulsar/issues/20197
>> >>>>>
>> >>>>> I'm quoting here the preface:
>> >>>>>
>> >>>>> === QUOTE START ===
>> >>>>>
>> >>>>> Roughly 11 months ago, I started working on solving the biggest
>> >>>>> issue with Pulsar metrics: the lack of ability to monitor a
>> >>>>> Pulsar broker with a large topic count: 10k, 100k, and, in the
>> >>>>> future, 1M. This started by mapping the existing functionality
>> >>>>> and then enumerating all the problems I saw (all documented in
>> >>>>> this doc
>> >>>>> <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>).
>> >>>>>
>> >>>>> This PIP is a parent PIP. It aims to gradually solve (using
>> >>>>> sub-PIPs) all the current metric system's problems and provide
>> >>>>> the ability to monitor a broker with a large topic count, which
>> >>>>> is currently lacking.
>> >>>>> As a parent PIP, it will describe each problem and its solution
>> >>>>> at a high level, leaving fine-grained details to the sub-PIPs.
>> >>>>> The parent PIP ensures all the solutions align and do not
>> >>>>> contradict each other.
>> >>>>>
>> >>>>> The basic building block for solving the monitoring of a large
>> >>>>> topic count is aggregating internally (into topic groups) and
>> >>>>> adding fine-grained filtering. We could have shoehorned this into
>> >>>>> the existing metric system, but we thought adding it to a system
>> >>>>> already riddled with so many problems would be wrong and hard to
>> >>>>> do gradually, as so many things would break. This is why the
>> >>>>> second-biggest design decision presented here is consolidating
>> >>>>> all existing metric libraries into a single one: OpenTelemetry
>> >>>>> <https://opentelemetry.io/>. The parent PIP will explain why
>> >>>>> OpenTelemetry was chosen out of the existing solutions and why it
>> >>>>> far exceeds all other options. I've been working closely with the
>> >>>>> OpenTelemetry community over the past eight months: brainstorming
>> >>>>> this integration and raising issues, in an effort to remove
>> >>>>> serious blockers and make this migration successful.
>> >>>>>
>> >>>>> I made every effort to summarize this document so that it can be
>> >>>>> concise yet clear. I understand it is an effort to read it and,
>> >>>>> even more so, to provide meaningful feedback on such a large
>> >>>>> document; hence I'm very grateful to each individual who does so.
>> >>>>>
>> >>>>> I think this design will improve the user experience immensely,
>> >>>>> so it is worth the time spent reading it.
>> >>>>>
>> >>>>> === QUOTE END ===
>> >>>>>
>> >>>>> Thanks!
>> >>>>>
>> >>>>> Asaf Mesika
>> >>>>>
>> >>>
>> >>
>>
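As a small illustration of why the aggregation building block pairs with bucketed histograms rather than Summaries (a point raised earlier in the thread): bucket counts from many topics can simply be summed at any aggregation level, while pre-computed quantiles cannot be meaningfully combined. A stdlib-only sketch with made-up latency numbers:

```python
from collections import Counter

# Hypothetical per-topic publish-latency histograms (ms):
# bucket upper bound -> count of observations in that bucket.
topic_a = {5: 90, 10: 8, 50: 2}    # a mostly fast topic, 100 events
topic_b = {5: 10, 10: 30, 50: 60}  # a mostly slow topic, 100 events

def quantile(hist: dict, q: float) -> int:
    """Approximate quantile: the upper bound of the bucket in which the
    cumulative count first reaches a fraction q of the total."""
    total = sum(hist.values())
    cumulative = 0
    for bound in sorted(hist):
        cumulative += hist[bound]
        if cumulative >= q * total:
            return bound
    return max(hist)

# Histogram buckets aggregate by summing counts per bucket; this is valid
# at any level: topic, topic group, namespace, or broker.
group = Counter(topic_a) + Counter(topic_b)
group_p95 = quantile(group, 0.95)  # computed on the merged data: 50 ms

# A Summary exports pre-computed quantiles, and those cannot be combined:
# averaging the per-topic p95 values yields a meaningless number.
naive_p95 = (quantile(topic_a, 0.95) + quantile(topic_b, 0.95)) / 2  # 30.0 ms
```

Here the group's true p95 is 50 ms, while averaging the two per-topic p95 values gives 30 ms, a latency that corresponds to nothing observed. This is the inherent design limitation of Summaries mentioned in the TL;DR, and why bucketed histograms with `le` bounds fit the Topic Metric Group aggregation.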