Apurva007 commented on code in PR #21635:
URL: https://github.com/apache/pulsar/pull/21635#discussion_r1419708455


##########
pip/pip-320.md:
##########
@@ -0,0 +1,241 @@
+# PIP-320 OpenTelemetry Scaffolding 
+
+# Background knowledge
+
+## PIP-264 - parent PIP titled "Enhanced OTel-based metric system"
+[PIP-264](https://github.com/apache/pulsar/pull/21080), which can also be 
viewed [here](pip-264.md), describes at a high 
+level a plan to greatly enhance Pulsar's metric system by replacing it with 
[OpenTelemetry](https://opentelemetry.io/).
+You can read in the PIP the numerous existing problems PIP-264 solves. Among 
them are:
+- Control which metrics to export per topic/group/namespace via the 
introduction of a metric filter configuration
+- Reduce the immense metrics cardinality due to high topic count (one of 
Pulsar's great features) by introducing
+the concept of Metric Group - a group of topics for metric purposes. Metric 
reporting will also be done to a 
+group granularity. 100k topics can be downsized to 1k groups. The dynamic 
metric filter configuration would allow 
+the user to control which metric group to un-filter. 
+- Proper histogram exporting
+- Clean up codebase clutter, by relying on a single industry standard API, SDK 
and metrics protocol (OTLP) instead of 
+the existing mix of home-brew libraries and a hard-coded Prometheus exporter.
+- and many more
+
+You can read [here](pip-264.md#why-opentelemetry) why OpenTelemetry was chosen.
+
+## OpenTelemetry
+Since OpenTelemetry (a.k.a. OTel) is an emerging industry standard, there are 
plenty of good articles, videos and
+documentation about it. In this very short paragraph I'll describe what you 
need to know about OTel from this PIP's
+perspective.
+
+OpenTelemetry is a project that aims to standardize the way we instrument, collect 
and ship metrics from applications
+to telemetry backends, be it databases (e.g. Prometheus, Cortex, Thanos) or 
vendors (e.g. Datadog, Logz.io).
+It is divided into API, SDK and Collector:
+- API: interfaces to use to instrument: define a counter, record values to a 
histogram, etc.
+- SDK: a library, available in many languages, implementing the API, and other 
important features such as
+reading the metrics and exporting them to a telemetry backend or OTel 
Collector. 
+- Collector: a lightweight process (application) which can receive or retrieve 
telemetry, transform it (e.g.
+filter, drop, aggregate)  and export it (e.g. send it to various backends). 
The SDK supports out-of-the-box 
+exporting metrics as Prometheus HTTP endpoint or sending them out using OTLP 
protocol. Companies often choose to
+ship to the Collector and from there to their preferred vendors, since each 
vendor has already published its exporter
+plugin to OTel Collector. This makes the SDK exporters very light-weight as 
they don't need to support any 
+vendor. It's also easier for the DevOps team as they can make OTel Collector 
their responsibility, and have
+application developers only focus on shipping metrics to that collector.
+
+For context: the Pulsar codebase will use the OTel API to create 
counters / histograms and record values to 
+them. So will the Pulsar plugins and Pulsar Function authors. Pulsar itself 
will be the one creating the SDK
+and using that to hand over an implementation of the API wherever needed in 
Pulsar. Using a Collector is up to
+the user, as OTel provides a way to expose the metrics as a `/metrics` 
endpoint on a configured port, so Prometheus
+compatible scrapers can grab them from it directly. Users can also send them via 
OTLP to an OTel Collector.
+
+## Telemetry layers
+PIP-264 clearly outlined there will be two layers of metrics, collected and 
exported, side by side: OpenTelemetry 
+and the existing metric system - currently exporting in Prometheus. This PIP 
will explain in detail how it will work. 
+The basic premise is that you will be able to enable or disable OTel metrics, 
alongside the existing Prometheus 

Review Comment:
   If the Prometheus and OTel metrics coexist when OTel is enabled, will that 
cause a memory increase? If so, could you please clarify whether, after an 
initial verification, it is possible to disable Prometheus while OTel stays 
enabled? There is a config called "exposeBundlesMetricsInPrometheus", but I am 
not sure whether it disables all metrics collection irrespective of Prometheus 
and OTel.
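
To make the memory question concrete, here is a rough, hypothetical sketch in plain Java. It is not Pulsar or OpenTelemetry code. The class name, the two boolean toggles, and the hash-based topic-to-group mapping are all assumptions made for illustration; the PIP text quoted above does not specify any of them. It only shows that two layers kept side by side each hold their own state, that a per-topic layer grows with the topic count while a grouped layer is bounded by the group count, and that switching one layer off stops it from allocating state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only: neither layer reflects Pulsar's real classes.
public class MetricLayersSketch {
    // Stand-in for a per-topic metric layer: one counter per topic name.
    private final Map<String, LongAdder> perTopicLayer = new ConcurrentHashMap<>();
    // Stand-in for a grouped metric layer: one counter per metric group.
    private final Map<Integer, LongAdder> groupedLayer = new ConcurrentHashMap<>();

    private final boolean perTopicEnabled; // assumed toggle, not a real Pulsar config
    private final boolean groupedEnabled;  // assumed toggle, not a real Pulsar config
    private final int groupCount;

    public MetricLayersSketch(boolean perTopicEnabled, boolean groupedEnabled, int groupCount) {
        this.perTopicEnabled = perTopicEnabled;
        this.groupedEnabled = groupedEnabled;
        this.groupCount = groupCount;
    }

    // Records one message for the topic in every enabled layer.
    public void recordMessage(String topic) {
        if (perTopicEnabled) {
            perTopicLayer.computeIfAbsent(topic, t -> new LongAdder()).increment();
        }
        if (groupedEnabled) {
            // Assumed mapping: hash the topic name into a fixed number of groups.
            int group = Math.floorMod(topic.hashCode(), groupCount);
            groupedLayer.computeIfAbsent(group, g -> new LongAdder()).increment();
        }
    }

    public int perTopicSeries() {
        return perTopicLayer.size();
    }

    public int groupedSeries() {
        return groupedLayer.size();
    }

    public static void main(String[] args) {
        MetricLayersSketch both = new MetricLayersSketch(true, true, 1_000);
        MetricLayersSketch groupedOnly = new MetricLayersSketch(false, true, 1_000);
        for (int i = 0; i < 100_000; i++) {
            String topic = "persistent://tenant/ns/topic-" + i;
            both.recordMessage(topic);
            groupedOnly.recordMessage(topic);
        }
        System.out.println("both layers: per-topic=" + both.perTopicSeries()
                + " grouped=" + both.groupedSeries());
        System.out.println("grouped only: per-topic=" + groupedOnly.perTopicSeries()
                + " grouped=" + groupedOnly.groupedSeries());
    }
}
```

In this sketch the grouped layer never holds more than `groupCount` entries, while the per-topic layer holds one entry per distinct topic. Whether the real implementation behaves this way, and which setting (if any) turns the existing layer off, is exactly what the question above asks the PIP to spell out.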



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
