This is an automated email from the ASF dual-hosted git repository.

lhotari pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 8864e2d6865 [feat][pip] PIP-447: Customizable Prometheus Labels for 
Topic Metrics (#24862)
8864e2d6865 is described below

commit 8864e2d68652761a3abc9d0bd4b2f7487ff7583b
Author: Cong Zhao <[email protected]>
AuthorDate: Fri Jan 2 22:05:36 2026 +0800

    [feat][pip] PIP-447: Customizable Prometheus Labels for Topic Metrics 
(#24862)
---
 pip/pip-447.md | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 230 insertions(+)

diff --git a/pip/pip-447.md b/pip/pip-447.md
new file mode 100644
index 00000000000..d4cd9c15f62
--- /dev/null
+++ b/pip/pip-447.md
@@ -0,0 +1,230 @@
+# PIP-447: Customizable Prometheus Labels for Topic Metrics
+
+# Abstract
+
+This PIP proposes a mechanism to add customizable Prometheus labels to Apache 
Pulsar topic-level metrics. Administrators will define a set of allowed custom 
metric label keys at the cluster or broker level. Users can then assign values 
to these predefined keys for specific topics. These key-value pairs will be 
exposed as Prometheus labels, enabling more granular metric filtering and 
precise alerting, while managing metric cardinality through centralized key 
governance.
+
+# Motivation
+
+Currently, Pulsar topic metrics exposed to Prometheus have a fixed set of 
labels: cluster, namespace, and topic. This limits users' ability to categorize 
topics for alerting and dashboarding based on custom criteria (e.g., SLA tier, 
data sensitivity, application owner) that are not easily expressed through 
topic names alone. Relying on regular expressions (regex) applied to the topic 
label in Prometheus for such grouping is often complex, inefficient, and 
error-prone.
+
+This limitation can lead to:
+
+- **Imprecise alerting**: it is difficult to set distinct alerting thresholds 
for different categories of topics.
+- **Alert fatigue or missed alerts**: alerting rules end up either too broad 
(noisy) or too complex (easy to get wrong).
+- **Operational overhead**: managing and maintaining Prometheus alert 
configurations takes more effort.
+
+Users require a native Pulsar mechanism to inject queryable, custom metadata 
directly into topic metrics to improve alerting precision, simplify 
dashboarding, and enhance overall observability.
+
+# Goals
+
+The proposed solution allows administrators to define a list of permissible 
custom metric label keys (e.g., sla_tier, app_owner) in the broker 
configuration. 
+
+Users can then use pulsar-admin commands or the REST API to set string values 
for these allowed keys on specific topics (e.g., sla_tier=gold). 
+
+These key-value pairs will be stored as part of the topic's metadata, 
leveraging Pulsar's topic-level policy system. 
+
+When topic metrics are generated for Prometheus, these custom key-value pairs 
will be added as labels to the existing set of standard labels.
+
+## In Scope
+
+The primary goals of this proposal are to:
+
+- Allow administrators to define a configurable set of allowed custom metric 
label keys at the broker or cluster level.
+- Enable users (or automated systems) to assign string values to these 
predefined keys for individual topics.
+- Expose these user-defined key-value pairs as additional Prometheus labels on 
all topic-level metrics for the respective topics, provided that both 
`exposeTopicLevelMetricsInPrometheus` and `exposeCustomTopicMetricLabelsEnabled` 
are true.
+- Provide robust control over Prometheus metric cardinality by restricting the 
set of custom metric label keys and limiting how they are used (for example, 
through a maximum label value length).
+- Integrate this feature with Pulsar's existing topic-level policy framework, 
using system topics for policy propagation.
+
+## Out of Scope
+
+- Changes to topic-level properties. Topic-level properties are distinct from 
topic-level custom metric labels.
+- Replacing the existing standard metric labels (`cluster`, `namespace`, 
`topic`).
+- Complex data types as label values; only string values are supported.
+
+# High Level Design
+
+1. **API and data structure definition**: finalize the internal data 
structures for storing custom metric labels within topic policies, and the 
public contracts for the `pulsar-admin` commands and REST APIs.
+2. **Broker configuration implementation**: add the new configuration 
parameters to `broker.conf` and implement the logic for brokers to read and use 
these settings.
+3. **Admin client and REST API implementation**: develop the new 
`pulsar-admin topics` subcommands and their corresponding REST API endpoints in 
the broker, including validation against `allowedCustomMetricLabelKeys`.
+4. **Broker policy handling logic**:
+   - Extend the topic-level policy framework to handle `customMetricLabels`.
+   - Ensure changes are published to and consumed from the `__change_events` 
system topic.
+   - Update the broker policy caches accordingly.
+5. **Metrics generation modification**: update the Prometheus metrics servlet 
(or the equivalent OpenTelemetry exporter logic in the future) to retrieve 
custom metric labels from the policy cache and add them to the outgoing topic 
metrics.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+Topic Policy Storage: Custom metric labels will be stored as a 
`Map<String, String>` within the topic's policy data structure.
+
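A minimal sketch of this storage, assuming that "set" merges the supplied labels into the existing map (the proposal says the operation "sets or updates" labels) and that removal works either per key or for all keys. `CustomMetricLabelPolicy` is a hypothetical stand-in, not Pulsar's actual `TopicPolicies` class.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the topic policy data structure; the real Pulsar
// class and field names may differ.
final class CustomMetricLabelPolicy {
    // Custom metric labels kept as a plain String -> String map.
    private final Map<String, String> customMetricLabels = new LinkedHashMap<>();

    // Assumed "set" semantics: merge, adding new keys and overwriting existing ones.
    void setLabels(Map<String, String> labels) {
        customMetricLabels.putAll(labels);
    }

    // Remove only the listed keys; keys that are not present are ignored.
    void removeLabels(Set<String> keys) {
        customMetricLabels.keySet().removeAll(keys);
    }

    // Equivalent of the CLI's --all flag or the REST API's all=true.
    void removeAllLabels() {
        customMetricLabels.clear();
    }

    // Read-only view for consumers such as the metrics generator.
    Map<String, String> getLabels() {
        return Collections.unmodifiableMap(customMetricLabels);
    }
}
```

Whether a second "set" call should merge or replace the whole map is a design decision the proposal leaves open; merging keeps the CLI's per-key remove command meaningful.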
+Policy Propagation and Caching: Brokers will use the existing system topic 
mechanism (`__change_events`) for propagating custom metric label policy changes 
and updating their in-memory policy caches.
+
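To illustrate the caching side, here is a sketch of a per-broker cache that is refreshed whenever a policy change event arrives. Reading events from `__change_events` is replaced by a plain method call, and `CustomMetricLabelCache` and `onPolicyChange` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-broker cache of custom metric labels, keyed by topic name.
// In a broker, onPolicyChange would be driven by events consumed from the
// namespace's __change_events system topic.
final class CustomMetricLabelCache {
    private final Map<String, Map<String, String>> labelsByTopic = new ConcurrentHashMap<>();

    // Apply the labels carried by a policy change event for one topic.
    // A null or empty map means the topic no longer has custom labels.
    void onPolicyChange(String topic, Map<String, String> labels) {
        if (labels == null || labels.isEmpty()) {
            labelsByTopic.remove(topic);
        } else {
            // Swap in an immutable copy so readers see either the old or the new map.
            labelsByTopic.put(topic, Map.copyOf(labels));
        }
    }

    // Read path used while generating metrics; never returns null.
    Map<String, String> labelsFor(String topic) {
        return labelsByTopic.getOrDefault(topic, Map.of());
    }
}
```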
+### Metrics Generation Logic:
+The Prometheus metrics generation component in the broker (e.g., 
PrometheusMetricsServlet) will be modified.
+
+If `exposeCustomTopicMetricLabelsEnabled` is true, it will retrieve each 
topic's `customMetricLabels` map from the broker's policy cache.
+
+Each key-value pair from this map will be added as an additional label to all 
Prometheus metrics emitted for that topic. The standard labels (cluster, 
namespace, topic) will remain.
+
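The label merging could look roughly like the following sketch, which writes one sample in the Prometheus text exposition format. `TopicMetricLineWriter` and `formatSample` are hypothetical. Escaping `\`, `"` and line feeds in label values follows the Prometheus text format; sorting the custom keys and never letting them replace a standard label are choices made only in this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical helper: appends custom labels to the standard topic labels.
final class TopicMetricLineWriter {

    // Label values must escape backslash, double quote and line feed.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    static String formatSample(String metricName, String cluster, String namespace, String topic,
                               Map<String, String> customLabels, boolean customLabelsEnabled,
                               double value) {
        Map<String, String> labels = new LinkedHashMap<>();
        // Standard labels always come first.
        labels.put("cluster", cluster);
        labels.put("namespace", namespace);
        labels.put("topic", topic);
        if (customLabelsEnabled) {
            // Sorted for stable output; putIfAbsent keeps standard labels intact.
            new TreeMap<>(customLabels).forEach(labels::putIfAbsent);
        }
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        labels.forEach((k, v) -> joiner.add(k + "=\"" + escape(v) + "\""));
        return metricName + joiner + " " + value;
    }
}
```

For example, with `sla_tier=gold` and `app_owner=payments` set on a topic, this sketch emits both custom labels after `cluster`, `namespace`, and `topic`, with `app_owner` before `sla_tier`; with the flag off, the line carries only the three standard labels.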
+Validation: During the set-custom-metric-labels operation, brokers will 
validate keys against `allowedCustomMetricLabelKeys` and enforce 
`maxCustomMetricLabelValueLength` on values.
+
+
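A sketch of this validation step, assuming the reserved set is exactly `cluster`, `namespace`, and `topic` and that violations surface as `IllegalArgumentException` (which the broker would presumably map to an HTTP error). `CustomMetricLabelValidator` is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical validator for a set-custom-metric-labels request.
final class CustomMetricLabelValidator {
    // Labels Pulsar already emits on topic metrics (assumed set; may be larger).
    static final Set<String> RESERVED_KEYS = Set.of("cluster", "namespace", "topic");

    static void validate(Map<String, String> labels, Set<String> allowedKeys, int maxValueLength) {
        for (Map.Entry<String, String> e : labels.entrySet()) {
            String key = e.getKey();
            String value = e.getValue();
            if (RESERVED_KEYS.contains(key)) {
                throw new IllegalArgumentException("Label '" + key + "' is reserved by Pulsar");
            }
            if (!allowedKeys.contains(key)) {
                throw new IllegalArgumentException(
                        "Label key '" + key + "' is not in allowedCustomMetricLabelKeys");
            }
            // String.length() counts UTF-16 code units, not code points.
            if (value == null || value.length() > maxValueLength) {
                throw new IllegalArgumentException("Value for '" + key
                        + "' is missing or longer than " + maxValueLength + " characters");
            }
        }
    }
}
```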
+## Public-facing Changes
+
+### Configuration
+
+The following new configuration parameters will be introduced in broker.conf:
+
+- `exposeCustomTopicMetricLabelsEnabled=(true|false)`
+  - Description: Enables or disables the custom topic metric labels feature.
+  - Default: `false`
+- `allowedCustomMetricLabelKeys=<key1>,<key2>,...`
+  - Description: A comma-separated list of the custom metric label keys that 
administrators allow to be set on topics. Example: 
`sla_tier,data_sensitivity,cost_center,app_owner`.
+  - Default: empty string (if the feature is enabled but no keys are defined, 
no custom metric labels can be set).
+- `maxCustomMetricLabelValueLength=<integer>`
+  - Description: The maximum character length of a custom metric label value.
+  - Default: `128`
+
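For illustration, the three settings could be read as in this sketch, which parses `broker.conf`-style text with `java.util.Properties`. `CustomMetricLabelConfig` is a hypothetical class, and rejecting keys that are not valid Prometheus label names is an extra safeguard assumed here, not something the proposal specifies.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical holder for the three proposed broker.conf settings.
final class CustomMetricLabelConfig {
    // Classic Prometheus label-name syntax; "__" prefixes are reserved by Prometheus.
    private static final Pattern LABEL_NAME = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    final boolean enabled;
    final Set<String> allowedKeys;
    final int maxValueLength;

    CustomMetricLabelConfig(Properties props) {
        // Defaults follow the proposal: disabled, no keys, 128 characters.
        enabled = Boolean.parseBoolean(
                props.getProperty("exposeCustomTopicMetricLabelsEnabled", "false").trim());
        maxValueLength = Integer.parseInt(
                props.getProperty("maxCustomMetricLabelValueLength", "128").trim());

        Set<String> keys = new LinkedHashSet<>();
        for (String raw : props.getProperty("allowedCustomMetricLabelKeys", "").split(",")) {
            String key = raw.trim();
            if (key.isEmpty()) {
                continue; // tolerate an empty value and trailing commas
            }
            if (!LABEL_NAME.matcher(key).matches() || key.startsWith("__")) {
                throw new IllegalArgumentException("Not a usable Prometheus label name: " + key);
            }
            keys.add(key);
        }
        allowedKeys = Collections.unmodifiableSet(keys);
    }

    // Parses broker.conf-style "key=value" text.
    static CustomMetricLabelConfig parse(String brokerConf) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(brokerConf));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new CustomMetricLabelConfig(props);
    }
}
```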
+### Public API
+
+New pulsar-admin topics subcommands and corresponding REST API endpoints will 
be introduced:
+
+#### Set Custom Metric Labels:
+
+CLI: `pulsar-admin topics set-custom-metric-labels <topic-name> --labels 
"key1=value1,key2=value2"`
+
+REST API: `POST /admin/v2/topics/{tenant}/{namespace}/{topic}/custom-metric-labels` 
with the JSON payload `{"labels": {"key1":"value1", "key2":"value2"}}`
+
+Action: Sets or updates custom metric labels for the specified topic. 
+
+The broker (or the admin client, before sending) will validate that all 
provided keys (e.g., `key1`, `key2`) are present in the 
`allowedCustomMetricLabelKeys` list defined in `broker.conf`.
+
+Users cannot override labels that Pulsar already defines (for example, 
`cluster`, `namespace`, and `topic`).
+
+Invalid keys will result in an error. This operation will update the topic's 
policy and publish a change event to the namespace's `__change_events` system 
topic.
+
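A client-side sketch of the set operation: it parses the CLI's `key1=value1,key2=value2` argument and builds, without sending, a `java.net.http` request for the proposed endpoint. `SetCustomMetricLabelsRequest` and its methods are hypothetical; the JSON is assembled by hand only to keep the example dependency-free, and keys or values containing `,` or `=` are not supported.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client-side helpers for the proposed set-custom-metric-labels call.
final class SetCustomMetricLabelsRequest {

    // Parses "key1=value1,key2=value2"; assumes no ',' or '=' inside keys or values.
    static Map<String, String> parseLabels(String arg) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (String pair : arg.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            labels.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return labels;
    }

    // Builds {"labels":{"k":"v",...}}; escapes only '\' and '"', not control characters.
    static String toJson(Map<String, String> labels) {
        StringJoiner entries = new StringJoiner(",", "{\"labels\":{", "}}");
        labels.forEach((k, v) -> entries.add(quote(k) + ":" + quote(v)));
        return entries.toString();
    }

    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // POST .../custom-metric-labels, using the path proposed in this PIP.
    static HttpRequest build(String adminUrl, String tenant, String namespace, String topic,
                             Map<String, String> labels) {
        URI uri = URI.create(adminUrl + "/admin/v2/topics/" + tenant + "/" + namespace + "/"
                + topic + "/custom-metric-labels");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(labels)))
                .build();
    }
}
```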
+#### Get Custom Metric Labels:
+
+CLI: `pulsar-admin topics get-custom-metric-labels <topic-name>`
+
+REST API: `GET /admin/v2/topics/{tenant}/{namespace}/{topic}/custom-metric-labels`
+
+Action: Retrieves the currently set custom metric labels for the topic.
+
+#### Remove Custom Metric Labels:
+
+CLI:
+- `pulsar-admin topics remove-custom-metric-labels <topic-name> --labels 
"key1,key2"` (removes the specified labels)
+- `pulsar-admin topics remove-custom-metric-labels <topic-name> --all` 
(removes all custom metric labels from the topic)
+
+REST API: `DELETE /admin/v2/topics/{tenant}/{namespace}/{topic}/custom-metric-labels` 
with the query parameters `keys=k1&keys=k2` or `all=true`.
+
+Action: Removes the specified custom metric labels or all custom metric labels 
from the topic. This also updates the topic policy.
+
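The two removal forms map onto query strings as in this sketch, which only builds the `DELETE` requests for the proposed endpoint. `RemoveCustomMetricLabelsRequest` is a hypothetical helper; keys are URL-encoded defensively even though typical label keys do not need it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds (but does not send) the proposed DELETE requests.
final class RemoveCustomMetricLabelsRequest {

    // Path as proposed in this PIP.
    private static String endpoint(String adminUrl, String tenant, String namespace, String topic) {
        return adminUrl + "/admin/v2/topics/" + tenant + "/" + namespace + "/" + topic
                + "/custom-metric-labels";
    }

    // Remove specific labels: ?keys=k1&keys=k2
    static HttpRequest removeKeys(String adminUrl, String tenant, String namespace, String topic,
                                  List<String> keys) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("No keys given; use removeAll to clear every label");
        }
        String query = keys.stream()
                .map(k -> "keys=" + URLEncoder.encode(k, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return HttpRequest.newBuilder(URI.create(endpoint(adminUrl, tenant, namespace, topic) + "?" + query))
                .DELETE()
                .build();
    }

    // Remove every custom metric label: ?all=true
    static HttpRequest removeAll(String adminUrl, String tenant, String namespace, String topic) {
        return HttpRequest.newBuilder(URI.create(endpoint(adminUrl, tenant, namespace, topic) + "?all=true"))
                .DELETE()
                .build();
    }
}
```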
+# Backward & Forward Compatibility
+
+## Backward Compatibility
+
+- **Disabled by default**: the feature will be disabled by default 
(`exposeCustomTopicMetricLabelsEnabled=false`), so existing Pulsar deployments 
will see no change in behavior or metric format.
+- **No impact if unused**: if the feature is enabled but 
`allowedCustomMetricLabelKeys` is not configured, or no labels are set on 
topics, metrics will remain unchanged.
+- **Existing APIs**: existing `pulsar-admin` commands and REST APIs are 
unaffected; the new commands and endpoints are additive.
+
+## Forward Compatibility
+
+Prometheus Systems: If a Pulsar broker with this feature enabled sends metrics 
with custom metric labels to an older Prometheus server or to a monitoring 
system that does not expect these additional labels, those systems will 
typically store them as additional label dimensions without issue; existing 
queries that select on the standard labels continue to match.
+
+Future Enhancements: Future Pulsar versions could extend this feature, for 
example, by allowing more dynamic management of allowedCustomMetricLabelKeys if 
deemed safe and necessary.
+
+OpenTelemetry Alignment: The key-value structure of custom metric labels 
aligns well with OpenTelemetry attributes, ensuring that this feature remains 
relevant and compatible with Pulsar's evolving metrics infrastructure.
+
+# Testing Plan
+
+A comprehensive testing strategy will be required:
+
+- **Unit tests**: for new logic in `pulsar-admin`, REST API handlers, policy 
management, and metrics generation.
+- **Integration tests**:
+  - Verify correct setting, getting, and removing of custom metric labels via 
admin tools and REST APIs.
+  - Test the validation logic for allowed keys, max labels per topic, and value 
length.
+  - Ensure correct propagation of custom metric label policy changes via system 
topics, and correct updates to broker policy caches.
+  - Verify that Prometheus metrics output accurately includes the custom metric 
labels for relevant topics, and does not include them when the feature is 
disabled or labels are not set.
+- **End-to-end tests**: simulate a Pulsar cluster environment, set custom 
metric labels on topics, and scrape metrics with a Prometheus instance to 
confirm that the labels appear correctly and can be queried.
+- **Performance tests**: assess any potential performance impact on brokers, 
particularly for policy updates and metrics generation in environments with a 
large number of topics.
+
+# Documentation Plan
+
+The official Apache Pulsar documentation will be updated to include:
+
+- **Concepts section**: an explanation of the custom topic metric labels 
feature, its purpose, and how it helps with monitoring and alerting.
+- **Administrator guide**:
+  - Instructions on how to enable and configure the feature in `broker.conf` 
(e.g., `exposeCustomTopicMetricLabelsEnabled`, `allowedCustomMetricLabelKeys`, 
and other limits).
+  - Best practices for defining `allowedCustomMetricLabelKeys` with 
cardinality management in mind.
+- **User guide / `pulsar-admin` reference**:
+  - Detailed syntax and examples for the new `pulsar-admin topics` 
`set-custom-metric-labels`, `get-custom-metric-labels`, and 
`remove-custom-metric-labels` commands.
+  - Guidance on choosing appropriate values for predefined keys to manage 
cardinality.
+- **REST API reference**: documentation for the new REST API endpoints.
+- **Monitoring section**: notes on how these custom metric labels appear in 
Prometheus and how they can be used in PromQL queries, along with reminders 
about cardinality considerations for the Prometheus system.
+
+# Alternatives
+
+Briefly, two other approaches were considered and rejected:
+
+A. Single Composite Tag Label:
+
+Description: Exposing a list of user-defined tags as a single, comma-separated 
string label (e.g., custom_tags="tagA,tagB,tagC").
+
+Reason for Rejection: This approach can lead to extremely high cardinality of 
the label value itself if tag combinations are diverse. It also necessitates 
complex and less performant regex queries in Prometheus and loses the semantic 
key-value structure.
+
+B. Prometheus Relabeling with External Metadata:
+
+Description: Keeping Pulsar metrics unchanged and using Prometheus's 
relabel_configs to enrich metrics with labels from an external metadata source 
(e.g., a file or a separate API).
+
+Reason for Rejection: This shifts the implementation complexity and 
maintenance burden to the Prometheus configuration and external systems. It 
introduces risks of stale metadata and potential performance overhead on 
Prometheus. Crucially, it is not a Pulsar-native solution, which is the aim of 
this proposal.
+
+# Links
+
+* Mailing List discussion thread: 
https://lists.apache.org/thread/66l8cdhx5f7sv05mqfnlwc7s570frtzq
+* Mailing List voting thread: 
https://lists.apache.org/thread/gotoy8oo2sghsckwtt6zh47obgbvnlzs
