coderzc commented on code in PR #25113:
URL: https://github.com/apache/pulsar/pull/25113#discussion_r2659659059
##########
pip/pip-451.md:
##########
@@ -0,0 +1,255 @@
+# PIP-451: Support Label-based Topic Subscription
+
+# Motivation
+
+Currently, Apache Pulsar supports Pattern Subscription, which allows consumers
to subscribe to multiple topics using Regular Expressions (Regex). While
powerful, Regex-based subscription has several structural limitations in
complex microservice architectures:
+
+* Coupling: It couples the consumption logic with the topic naming convention.
Changing business requirements often forces topic renaming, which is
operationally expensive and risky.
+
+* Flexibility: It is difficult to group semantically related but differently
named topics (e.g., persistent://public/default/payment-core and
persistent://public/legacy/billing-v1) into a single subscription without
complex regex wizardry.
+
+* Complexity: Complex regex patterns are hard to maintain and error-prone.
+
+Label-based Subscription solves these issues by decoupling "Identity" (Topic
Name) from "Attributes" (Labels). Users can attach Key-Value metadata (e.g.,
env=prod, dept=finance) to topics and subscribe by specifying a label selector.
+
+# Goals
+
+## In Scope
+* Management: Allow attaching, updating, and removing Key-Value labels to/from
Topics via the Admin API.
+* Subscription: Allow Consumers to subscribe to topics matching specific
Labels within specified Namespaces. Support cross-namespace subscription via an
explicit namespace list in the Client API, avoiding the complexity of
background metadata polling.
+
+
+
+# High-Level Design
+The design introduces a metadata-driven approach where labels are stored in
TopicPolicies.
+The Broker maintains an in-memory index to map labels to topics. The client
utilizes a "Watch" mechanism to receive real-time updates when topics matching
the labels are created or updated.
+## Key points
+* Storage: Labels are stored as Map<String, String> inside TopicPolicies.
+* Indexing: The Broker maintains an In-Memory Inverted Index (LabelKey ->
LabelValue -> Set<Topic>) per Namespace. This hierarchical structure ensures
efficient lookups for Key-Value pairs without iterating through all topics.
+* Discovery Protocol: We extend the CommandWatchTopicList protocol (PIP-179)
to accept a label_selector.
+* Client Implementation: The Client accepts a list of target namespaces and
manages multiple watchers (one per namespace) to aggregate matching topics.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+### Storage
+
+#### Topic Labels in TopicPolicies
+
+Add a labels field to the TopicPolicies class. Since Topic Policies are
propagated via the __change_events system topic, this ensures durability and
consistency across brokers.
+```java
+public class TopicPolicies {
+ // New field: Key-Value labels
+ private Map<String, String> customLabels;
+}
+```
+
+#### In-Memory Inverted Index for Labels-Topic Mapping
+
+The SystemTopicBasedTopicPoliciesService will maintain a nested map structure
per Namespace to support efficient Key-Value lookups.
+
+**Data Structure**:
+
+```java
+// Map<Namespace, Map<LabelKey, Map<LabelValue, Set<TopicName>>>>
+Map<String, Map<String, Map<String, Set<String>>>> labelTopicInvertedIndex;
Review Comment:
Yes, each broker needs to cache the full labelTopicInvertedIndex, not only the entries for topics it owns.
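
To make the memory cost concrete, here is a minimal sketch of the nested map from the diff as each broker might hold it. The class name `LabelTopicIndex` and its methods are hypothetical and are not part of the PIP or of the Pulsar codebase. It only illustrates that a lookup walks namespace, then label key, then label value, so every broker that answers lookups needs all three levels populated.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names and methods are assumptions, not the PIP's implementation.
public class LabelTopicIndex {
    // Namespace -> LabelKey -> LabelValue -> topic names
    private final Map<String, Map<String, Map<String, Set<String>>>> labelTopicInvertedIndex =
            new ConcurrentHashMap<>();

    // Index every (key, value) label pair of a topic under its namespace.
    public void addTopic(String namespace, String topic, Map<String, String> labels) {
        Map<String, Map<String, Set<String>>> byKey =
                labelTopicInvertedIndex.computeIfAbsent(namespace, ns -> new ConcurrentHashMap<>());
        labels.forEach((key, value) -> byKey
                .computeIfAbsent(key, k -> new ConcurrentHashMap<>())
                .computeIfAbsent(value, v -> ConcurrentHashMap.newKeySet())
                .add(topic));
    }

    // Remove a topic from the entries of its previous labels.
    // Empty inner maps are left in place to keep the sketch short.
    public void removeTopic(String namespace, String topic, Map<String, String> labels) {
        Map<String, Map<String, Set<String>>> byKey = labelTopicInvertedIndex.get(namespace);
        if (byKey == null) {
            return;
        }
        labels.forEach((key, value) -> {
            Map<String, Set<String>> byValue = byKey.get(key);
            if (byValue != null) {
                Set<String> topics = byValue.get(value);
                if (topics != null) {
                    topics.remove(topic);
                }
            }
        });
    }

    // Return an immutable snapshot of the topics carrying key=value in the namespace.
    public Set<String> lookup(String namespace, String key, String value) {
        Map<String, Map<String, Set<String>>> byKey =
                labelTopicInvertedIndex.getOrDefault(namespace, Map.of());
        Map<String, Set<String>> byValue = byKey.getOrDefault(key, Map.of());
        return Set.copyOf(byValue.getOrDefault(value, Set.of()));
    }
}
```

With this shape, the per-broker footprint grows with the total number of (topic, label) pairs in the cluster rather than with the number of topics a broker serves.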