Bumping, as I have received only one piece of partial feedback on the PIP so far. Any feedback is highly appreciated!
Thanks!

Asaf

On Wed, May 10, 2023 at 11:00 AM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
>
> On Tue, May 9, 2023 at 11:29 PM Dave Fisher <w...@apache.org> wrote:
>>
>>
>> > On May 8, 2023, at 2:49 AM, Asaf Mesika <asaf.mes...@gmail.com> wrote:
>> >
>> > Your feedback made me realize I need to add a "TL;DR" section, which I
>> > just added.
>> >
>> > I'm quoting it here. It gives a brief summary of the proposal, requiring
>> > up to 5 minutes of reading time, to help you get a high-level picture
>> > before you dive into the background/motivation/solution.
>> >
>> > ----------------------
>> > TL;DR
>> >
>> > Working with metrics today, as a user or a developer, is hard and
>> > involves many severe issues.
>> >
>> > From the user perspective:
>> >
>> > - One of Pulsar's strongest features is "cheap" topics, so you can
>> > easily have 10k-100k topics per broker. Once you do that, you quickly
>> > learn that the amount of metrics you export via "/metrics" (the
>> > Prometheus-style endpoint) becomes really big. The cost to store them
>> > becomes too high, queries time out, or even the "/metrics" endpoint
>> > itself times out.
>> > The only options Pulsar gives you today are all-or-nothing filtering
>> > and very crude aggregation: you switch metrics from the topic
>> > aggregation level to the namespace aggregation level, and you can turn
>> > off producer- and consumer-level metrics. You end up doing all of it,
>> > leaving you "blind", looking at metrics at the namespace level, which
>> > is too high-level. You end up conjuring all kinds of scripts on top of
>> > the topic stats endpoint to glue together some aggregated metrics view
>> > for the topics you need.
>> > - Summaries (a metric type giving you quantiles such as p95), which
>> > are used in Pulsar, can't be aggregated across topics/brokers due to
>> > their inherent design.
>> > - Plugin authors spend too much time defining and exposing metrics to
>> > Pulsar, since the only interface Pulsar offers is writing your metrics
>> > yourself, as UTF-8 bytes in the Prometheus text format, into a byte
>> > stream interface given to you.
>> > - Pulsar histograms are exported in a way that is not conformant with
>> > Prometheus, which means you can't get the p95 quantile on such
>> > histograms, making them very hard to use in day-to-day life.
>>
>> What version of DataSketches is used to produce the histogram? Is it
>> still an old Yahoo one, or are we using an updated one from Apache
>> DataSketches?
>>
>> Seems like this is a single PR/small PIP for 3.1?
>
> Histograms are a list of buckets, each of which is a counter.
> A Summary is a collection of values collected over a time window, from
> which at the end you get a calculation of the quantiles of those values
> (p95, p50, ...), and those are what Pulsar exports.
>
> Pulsar histograms do not use DataSketches. They are just counters.
> They do not adhere to Prometheus because:
> a. Each bucket counter is expected to be cumulative, but Pulsar resets it
> to 0 every minute.
> b. The bucket's upper bound is expected to be written as an attribute
> ("le"), but today it is encoded in the name of the metric itself.
>
> This is a breaking change, hence hard to make in any small release.
> This is why it's part of this PIP: so many things will break, but all of
> them break in a separate layer (OTel metrics), hence nothing breaks for
> anyone without their consent.
>
>>
>> > - Too many metrics are rates, which also delta-reset every interval
>> > you configure in Pulsar and on restart, instead of relying on
>> > cumulative (ever-growing) counters and letting Prometheus use its
>> > rate function.
>> > - ...and many more issues.
>> >
>> > From the developer perspective:
>> >
>> > - There are four different ways to define and record metrics in
>> > Pulsar: Pulsar's own metrics library, the Prometheus Java client, the
>> > BookKeeper metrics library, and plain native Java SDK objects
>> > (AtomicLong, ...). It's very confusing for the developer and creates
>> > inconsistencies for the end user (e.g., Summary is different in each).
>> > - Patching your metrics into the "/metrics" Prometheus endpoint is
>> > confusing, cumbersome, and error-prone.
>> > - ...and many more.
>> >
>> > This proposal offers several key changes to solve that:
>> >
>> > - Cardinality (supporting 10k-100k topics per broker) is solved by
>> > introducing a new aggregation level for metrics called a Topic Metric
>> > Group. Using configuration, you specify each topic's group (using
>> > wildcards/regex). This allows you to "zoom out" to groups instead of
>> > namespaces, a granularity level that is still reasonably detailed;
>> > and since you control how many groups you'll have, it solves the
>> > cardinality issue without sacrificing the level of detail too much.
>> > - A fine-grained, dynamic filtering mechanism. You'll have rule-based
>> > dynamic configuration, allowing you to specify per
>> > namespace/topic/group which metrics you'd like to keep or drop. Rules
>> > allow you to set a default that keeps only a small amount of metrics,
>> > at the group and namespace level, and drops the rest. When needed,
>> > you can add an override rule to "open up" a certain group to more
>> > metrics at a finer granularity (topic or even consumer/producer
>> > level). Since it's dynamic, you "open" such a group when you see it
>> > misbehaving, inspect it at the topic level, and once all is resolved,
>> > you can "close" it. The experience is a bit similar to logging levels
>> > in Log4j or Logback, where you set a default and override it per
>> > class/package.
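To make the Topic Metric Group mapping and the rule-based filtering described in the TL;DR concrete, here is a small illustrative sketch. All topic names, wildcard patterns, and the rule format are hypothetical and invented for this illustration only; the PIP does not fix a concrete syntax:

```python
# Illustrative only: map topics to metric groups via wildcard patterns and
# pick each topic's metric granularity with first-match-wins rules, similar
# in spirit to Log4j/Logback levels (a default plus per-target overrides).
from fnmatch import fnmatch

# Hypothetical group mapping: first matching pattern wins.
GROUP_PATTERNS = [
    ("persistent://acme/orders/*", "orders"),
    ("persistent://acme/logs/*", "logs"),
]

# Hypothetical rules, most specific first. "level" is the finest
# granularity kept for matching topics; anything finer is dropped.
RULES = [
    {"match": "group:orders", "level": "topic"},      # override: group "opened up"
    {"match": "*",            "level": "namespace"},  # default: coarse and cheap
]

def group_of(topic: str) -> str:
    for pattern, group in GROUP_PATTERNS:
        if fnmatch(topic, pattern):
            return group
    return "default"

def level_of(topic: str) -> str:
    group = group_of(topic)
    for rule in RULES:
        if rule["match"] in ("*", f"group:{group}"):
            return rule["level"]
    return "namespace"
```

With this sketch, `level_of("persistent://acme/orders/o-1")` returns `"topic"` because the override rule applies, while `level_of("persistent://acme/logs/app")` falls through to the `"namespace"` default; deleting the override rule "closes" the orders group again, mirroring the dynamic open/close flow described above.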
>> >
>> > Aggregation and filtering combined solve the cardinality issue without
>> > sacrificing the level of detail when needed, and most importantly, you
>> > determine which topic/group/namespace it happens on.
>> >
>> > Since this change is so invasive, it requires a single metrics library
>> > to implement all of it on top of; hence the third big change is
>> > consolidating all four ways to define and record metrics into a
>> > single, new one: OpenTelemetry Metrics (the Java SDK, and also Python
>> > and Go for the Pulsar Function runners).
>> > Introducing OpenTelemetry (OTel) also solves the biggest pain point
>> > from the developer perspective, since it's a superb metrics library
>> > offering everything you need, and there is going to be a single way to
>> > do it: only OpenTelemetry. It also solves the robustness issue for
>> > plugin authors, who will use OpenTelemetry. It so happens that it also
>> > solves all the numerous problems described in the doc itself.
>> >
>> > The solution will be introduced as another layer, with feature
>> > toggles, so you can work with the existing system and/or OTel, until
>> > the existing system is gradually deprecated.
>> >
>> > It's a big breaking change for Pulsar users on many fronts: names,
>> > semantics, configuration. Read the end of this doc to learn exactly
>> > what will change for the user (at a high level).
>> >
>> > In my opinion, it will make the Pulsar user experience so much better
>> > that users will want to migrate to it, despite the breaking change.
>> >
>> > This was a very short summary. You are most welcome to read the full
>> > design document below and express feedback, so we can make it better.
>> >
>> > On Sun, May 7, 2023 at 7:52 PM Asaf Mesika <asaf.mes...@gmail.com>
>> > wrote:
>> >
>> >>
>> >>
>> >> On Sun, May 7, 2023 at 4:23 PM Yunze Xu <y...@streamnative.io.invalid>
>> >> wrote:
>> >>
>> >>> I was excited to learn much more about metrics when I started
>> >>> reading this proposal.
>> >>> But I became more and more frustrated when I found there was still
>> >>> too much content left even after I had already spent much time
>> >>> reading. I'm wondering how much time you expected reviewers to
>> >>> spend reading this proposal. I just recalled the discussion you
>> >>> started before [1]. Did you expect each PMC member who gives
>> >>> his/her +1 to read only parts of this proposal?
>> >>
>> >> I estimated around 2 hours are needed for a reviewer.
>> >> I hate it being so long, but I simply couldn't find a way to
>> >> downsize it more. Furthermore, I consulted with my colleagues,
>> >> including Matteo, but we couldn't see a way to scope it down.
>> >> Why? Because once you begin this journey, you need to know how it's
>> >> going to end.
>> >> What I ended up doing is writing all the crucial details for review
>> >> in the High Level Design section.
>> >> It's still a big, hefty section, but I don't think I can, or should
>> >> let anyone else, change Pulsar so invasively without laying out the
>> >> full extent of the change.
>> >>
>> >> I don't think it's wise to read only parts.
>> >> I made my very best effort to minimize it, but the scope is simply
>> >> big. I'm open to suggestions, but it requires reading the whole
>> >> PIP :)
>> >>
>> >> Thanks a lot, Yunze, for dedicating any time to it.
>> >>
>> >>> Let's get back to the proposal. For now, what I mainly learned and
>> >>> am mostly concerned about is:
>> >>> 1. Pulsar has many ways to expose metrics. It's not unified, and
>> >>> it's confusing.
>> >>> 2. The current metrics system cannot support a large number of
>> >>> topics.
>> >>> 3. It's hard for plugin authors to integrate metrics. (For example,
>> >>> KoP [2] integrates metrics by implementing the
>> >>> PrometheusRawMetricsProvider interface, and it indeed needs much
>> >>> work.)
>> >>>
>> >>> Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).
>> >>>
>> >>> Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
>> >>> section. It's still frustrating to see no answer. Eventually, I
>> >>> found
>> >>
>> >> OpenTelemetry isn't the solution for the large topic count.
>> >> The solution is described in the
>> >> "Aggregate and Filtering to solve cardinality issues" section.
>> >>
>> >>> the explanation in the "What we need to fix in OpenTelemetry -
>> >>> Performance" section. It seems that we still need some enhancements
>> >>> in OTel. In other words, currently OTel is not ready to resolve all
>> >>> the issues listed in the proposal, but we believe it will be.
>> >>
>> >> Let me rephrase "believe": we are working together with the
>> >> maintainers to make it happen, yes.
>> >> I am open to any other suggestion.
>> >>
>> >>> As for the 3rd issue, from the "Integrating with Pulsar Plugins"
>> >>> section, the plugin authors still need to implement the new OTel
>> >>> interfaces. Is it much easier than using the existing ways to
>> >>> expose metrics? Could metrics still be easily integrated with
>> >>> Grafana?
>> >>
>> >> Yes, it's way easier.
>> >> Basically, you get the objects of a full-fledged metrics library:
>> >> Meter, Gauge, Histogram, Counter.
>> >> No more raw metrics provider, writing UTF-8 bytes in Prometheus
>> >> format. You get namespacing for free with the Meter name and version.
>> >> It's way better than the current solution and any other library.
>> >>
>> >>> That's all I am concerned about at the moment. I understand, and
>> >>> appreciate, that you've spent much time studying and explaining all
>> >>> these things. But this proposal is still too huge.
>> >>
>> >> I appreciate your effort a lot!
>> >>
>> >>
>> >>> [1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
>> >>> [2]
>> >>> https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java
>> >>>
>> >>> Thanks,
>> >>> Yunze
>> >>>
>> >>> On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> I would very much appreciate feedback from multiple Pulsar users
>> >>>> and devs on this PIP, since it suggests dramatic changes and quite
>> >>>> extensive positive change for users.
>> >>>>
>> >>>> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I'm very excited to release a PIP I've been working on for the
>> >>>>> past 11 months, which I think will be immensely valuable to
>> >>>>> Pulsar, a project I like so much.
>> >>>>>
>> >>>>> PIP: https://github.com/apache/pulsar/issues/20197
>> >>>>>
>> >>>>> I'm quoting here the preface:
>> >>>>>
>> >>>>> === QUOTE START ===
>> >>>>>
>> >>>>> Roughly 11 months ago, I started working on solving the biggest
>> >>>>> issue with Pulsar metrics: the lack of ability to monitor a
>> >>>>> Pulsar broker with a large topic count: 10k, 100k, and, in the
>> >>>>> future, 1M. This started by mapping the existing functionality
>> >>>>> and then enumerating all the problems I saw (all documented in
>> >>>>> this doc
>> >>>>> <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>).
>> >>>>>
>> >>>>> This PIP is a parent PIP. It aims to gradually solve (using
>> >>>>> sub-PIPs) all the current metric system's problems and provide
>> >>>>> the ability to monitor a broker with a large topic count, which
>> >>>>> is currently lacking.
>> >>>>> As a parent PIP, it will describe each problem and its solution
>> >>>>> at a high level, leaving fine-grained details to the sub-PIPs.
>> >>>>> The parent PIP ensures all the solutions align and do not
>> >>>>> contradict each other.
>> >>>>>
>> >>>>> The basic building block for solving the monitoring of a large
>> >>>>> topic count is aggregating internally (into topic groups) and
>> >>>>> adding fine-grained filtering. We could have shoehorned this into
>> >>>>> the existing metric system, but we thought adding it to a system
>> >>>>> already riddled with so many problems would be wrong and hard to
>> >>>>> do gradually, as so many things would break. This is why the
>> >>>>> second-biggest design decision presented here is consolidating
>> >>>>> all existing metric libraries into a single one: OpenTelemetry
>> >>>>> <https://opentelemetry.io/>. The parent PIP will explain why
>> >>>>> OpenTelemetry was chosen out of the existing solutions and why it
>> >>>>> far exceeds all other options. I've been working closely with the
>> >>>>> OpenTelemetry community over the past eight months: brainstorming
>> >>>>> this integration and raising issues, in an effort to remove
>> >>>>> serious blockers and make this migration successful.
>> >>>>>
>> >>>>> I made every effort to summarize this document so that it can be
>> >>>>> concise yet clear. I understand it is an effort to read it and,
>> >>>>> even more so, to provide meaningful feedback on such a large
>> >>>>> document; hence I'm very grateful to each individual who does so.
>> >>>>>
>> >>>>> I think this design will improve the user experience immensely,
>> >>>>> so it is worth the time spent reading it.
>> >>>>>
>> >>>>> === QUOTE END ===
>> >>>>>
>> >>>>> Thanks!
>> >>>>>
>> >>>>> Asaf Mesika
>> >>>>>
>> >>>
>> >>
>>
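As a small illustration of why the aggregation building block pairs with bucketed histograms rather than Summaries (a point raised earlier in the thread): bucket counts from many topics can simply be summed at any aggregation level, while pre-computed quantiles cannot be meaningfully combined. A stdlib-only sketch with made-up latency numbers:

```python
from collections import Counter

# Hypothetical per-topic publish-latency histograms (ms):
# bucket upper bound -> count of observations in that bucket.
topic_a = {5: 90, 10: 8, 50: 2}    # a mostly fast topic, 100 events
topic_b = {5: 10, 10: 30, 50: 60}  # a mostly slow topic, 100 events

def quantile(hist: dict, q: float) -> int:
    """Approximate quantile: the upper bound of the bucket in which the
    cumulative count first reaches a fraction q of the total."""
    total = sum(hist.values())
    cumulative = 0
    for bound in sorted(hist):
        cumulative += hist[bound]
        if cumulative >= q * total:
            return bound
    return max(hist)

# Histogram buckets aggregate by summing counts per bucket; this is valid
# at any level: topic, topic group, namespace, or broker.
group = Counter(topic_a) + Counter(topic_b)
group_p95 = quantile(group, 0.95)  # computed on the merged data: 50 ms

# A Summary exports pre-computed quantiles, and those cannot be combined:
# averaging the per-topic p95 values yields a meaningless number.
naive_p95 = (quantile(topic_a, 0.95) + quantile(topic_b, 0.95)) / 2  # 30.0 ms
```

Here the group's true p95 is 50 ms, while averaging the two per-topic p95 values gives 30 ms, a latency that corresponds to nothing observed. This is the inherent design limitation of Summaries mentioned in the TL;DR, and why bucketed histograms with `le` bounds fit the Topic Metric Group aggregation.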