I was excited to learn more about metrics when I started reading
this proposal, but I became more and more frustrated as I found
there was still a lot of content left even after I had spent a long
time reading it. How much time did you expect reviewers to spend
reading through this proposal? I just recalled the discussion you
started before [1]. Did you expect each PMC member who gives
his/her +1 to read only parts of this proposal?

Back to the proposal. For now, the main things I have learned from
it, and am most concerned about, are:
1. Pulsar has many ways to expose metrics. They are not unified, which is confusing.
2. The current metrics system cannot support a large number of topics.
3. It's hard for plugin authors to integrate metrics. (For example,
KoP [2] integrates metrics by implementing the
PrometheusRawMetricsProvider interface, and that does take a lot of work.)

Regarding the 1st issue, this proposal chooses OpenTelemetry (OTel).

Regarding the 2nd issue, I scrolled to the "Why OpenTelemetry?"
section. It was frustrating to find no answer there. Eventually, I
found the explanation in the "What we need to fix in OpenTelemetry -
Performance" section. It seems that we still need some enhancements in
OTel. In other words, OTel is currently not ready to resolve all
the issues listed in the proposal, but we believe it will be.

As for the 3rd issue, according to the "Integrating with Pulsar
Plugins" section, plugin authors still need to implement the new OTel
interfaces. Is that much easier than using the existing ways to expose
metrics? Could the metrics still be easily integrated with Grafana?

Those are all my concerns at the moment. I understand and
appreciate that you've spent a lot of time studying and explaining all
these things, but this proposal is still too large.

[1] https://lists.apache.org/thread/04jxqskcwwzdyfghkv4zstxxmzn154kf
[2] https://github.com/streamnative/kop/blob/master/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/stats/PrometheusMetricsProvider.java

Thanks,
Yunze

On Sun, May 7, 2023 at 5:53 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> I would greatly appreciate feedback from multiple Pulsar users and devs on
> this PIP, since it suggests dramatic changes and brings quite an extensive
> positive change for users.
>
>
> On Thu, Apr 27, 2023 at 7:32 PM Asaf Mesika <asaf.mes...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm very excited to release a PIP I've been working on in the past 11
> > months, which I think will be immensely valuable to Pulsar, which I like so
> > much.
> >
> > PIP: https://github.com/apache/pulsar/issues/20197
> >
> > I'm quoting here the preface:
> >
> > === QUOTE START ===
> >
> > Roughly 11 months ago, I started working on solving the biggest issue with
> > Pulsar metrics: the lack of ability to monitor a pulsar broker with a large
> > topic count: 10k, 100k, and future support of 1M. This started by mapping
> > the existing functionality and then enumerating all the problems I saw (all
> > documented in this doc
> > <https://docs.google.com/document/d/1vke4w1nt7EEgOvEerPEUS-Al3aqLTm9cl2wTBkKNXUA/edit?usp=sharing>
> > ).
> >
> > This PIP is a parent PIP. It aims to gradually solve (using sub-PIPs) all
> > the current metric system's problems and provide the ability to monitor a
> > broker with a large topic count, which is currently lacking. As a parent
> > PIP, it will describe each problem and its solution at a high level,
> > leaving fine-grained details to the sub-PIPs. The parent PIP ensures all
> > solutions align and do not contradict each other.
> >
> > The basic building block to solve the monitoring ability of large topic
> > count is aggregating internally (to topic groups) and adding fine-grained
> > filtering. We could have shoe-horned it into the existing metric system,
> > but we thought adding that to a system already ingrained with many problems
> > would be wrong and hard to do gradually, as so many things will break. This
> > is why the second-biggest design decision presented here is consolidating
> > all existing metric libraries into a single one - OpenTelemetry
> > <https://opentelemetry.io/>. The parent PIP will explain why
> > OpenTelemetry was chosen out of existing solutions and why it far exceeds
> > all other options. I’ve been working closely with the OpenTelemetry
> > community in the past eight months: brain-storming this integration, and
> > raising issues, in an effort to remove serious blockers to make this
> > migration successful.
> >
> > I made every effort to summarize this document so that it can be concise
> > yet clear. I understand it is an effort to read it and, more so, provide
> > meaningful feedback on such a large document; hence I’m very grateful for
> > each individual who does so.
> >
> > I think this design will help improve the user experience immensely, so it
> > is worth the time spent reading it.
> >
> >
> > === QUOTE END ===
> >
> >
> > Thanks!
> >
> > Asaf Mesika
> >
