Denovo1998 commented on code in PR #24928:
URL: https://github.com/apache/pulsar/pull/24928#discussion_r2596924122
##########
pip/pip-448.md:
##########
@@ -0,0 +1,166 @@
+# PIP-448: Topic-level Delayed Message Tracker for Memory Optimization
+
+# Background knowledge
+
+In Apache Pulsar, **Delayed Message Delivery** allows producers to specify a delay for a message, ensuring it is not delivered to any consumer until the specified time has passed. This is a useful feature for implementing tasks like scheduled reminders or retry mechanisms with backoff.
+
+The legacy default mechanism for handling delayed messages is the `InMemoryDelayedDeliveryTracker`. This tracker is instantiated on a *per-subscription* basis within the broker. When a topic has multiple subscriptions, each subscription gets its own independent `InMemoryDelayedDeliveryTracker` instance.
+
+The consequence of this per-subscription design is that if a delayed message is published to a topic with 'N' subscriptions, that message's metadata (its position) is stored 'N' times in the broker's memory. This leads to significant memory overhead, especially for topics with a large number of subscriptions, as the memory usage scales linearly with the number of subscriptions.
+
+# Motivation
+
+The primary motivation for this proposal is to address the high memory consumption caused by the legacy per-subscription delayed message tracking mechanism. For topics with hundreds or thousands of subscriptions, the memory footprint for delayed messages can become prohibitively large. Each delayed message's position is duplicated across every subscription's tracker, leading to a memory usage pattern of `O(num_delayed_messages * num_subscriptions)`.
+
+This excessive memory usage can cause:
+* Increased memory pressure on Pulsar brokers.
+* More frequent and longer Garbage Collection (GC) pauses, impacting broker performance.
+* Potential OutOfMemoryErrors, leading to broker instability.
+* Limited scalability for use cases that rely on many subscriptions per topic, such as IoT or large-scale microservices with shared subscriptions.
+
+By introducing an alternative, topic-level tracking mechanism, we can provide a memory-efficient solution to enhance broker stability and scalability for these critical use cases.
+
+# Goals
+
+## In Scope
+* Introduce a new, optional, topic-level delayed message tracker that is shared across all subscriptions of a single topic. This will store each delayed message's position only once.
+* Significantly reduce the memory footprint for delayed message handling when this new tracker is enabled, changing the memory complexity from `O(num_delayed_messages * num_subscriptions)` to `O(num_delayed_messages)`.
+* Provide new configuration options to allow operators to tune the behavior of the new tracker, such as pruning intervals and cleanup delays.
+* Maintain the existing `DelayedDeliveryTracker` interface to ensure seamless integration with the dispatcher logic.
+* Preserve the existing per-subscription `InMemoryDelayedDeliveryTrackerFactory` as the default for backward compatibility, requiring operators to opt-in to use the new topic-level tracker.
+
+## Out of Scope
+* This proposal does not modify the persistent, bucket-based delayed delivery tracker (`BucketDelayedDeliveryTracker`).
+* No changes will be made to the public-facing client APIs, REST APIs, or the wire protocol. This is a broker-internal optimization.
+* The semantic behavior of delayed messages from a user's perspective will remain identical.
+
+# High Level Design
+
+The core idea is to introduce a new, opt-in `DelayedDeliveryTrackerFactory` that implements a shared, topic-level tracking strategy. This is achieved with two new components: a `TopicDelayedDeliveryTrackerManager` and a subscription-scoped `InMemoryTopicDelayedDeliveryTracker`.

Review Comment:
Do you mean: why add a TopicDelayedDeliveryTrackerManager instead of implementing DelayedDeliveryTracker directly?
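
To make the manager-versus-tracker split concrete, here is a minimal sketch of one way such a design could look. It is not the code from this PR: `TrackerView`, `TopicTrackerManager`, the string-encoded positions, and the watermark scheme are all hypothetical names and simplifications chosen for illustration. The assumption behind it is that a tracker instance is handed to one subscription's dispatcher, so something at topic scope has to own the shared index and give each subscription a thin view onto it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, trimmed-down stand-in for the dispatcher-facing tracker contract.
interface TrackerView {
    boolean addMessage(long ledgerId, long entryId, long deliverAtMillis);

    List<String> getScheduledMessages(long nowMillis);
}

// Hypothetical topic-level owner: every delayed position is stored once here,
// no matter how many subscription views exist.
class TopicTrackerManager {
    // deliverAt timestamp -> positions ("ledgerId:entryId") due at that time
    private final TreeMap<Long, TreeSet<String>> index = new TreeMap<>();

    // Each subscription gets its own lightweight view over the shared index.
    synchronized TrackerView createView() {
        return new View();
    }

    // Number of positions held for the whole topic.
    synchronized int storedPositions() {
        int total = 0;
        for (TreeSet<String> positions : index.values()) {
            total += positions.size();
        }
        return total;
    }

    private final class View implements TrackerView {
        // The only per-subscription state: the latest time already handed out.
        private long watermark = Long.MIN_VALUE;

        @Override
        public boolean addMessage(long ledgerId, long entryId, long deliverAtMillis) {
            synchronized (TopicTrackerManager.this) {
                if (deliverAtMillis <= watermark) {
                    // Already due for this subscription: the caller should deliver now.
                    return false;
                }
                // Set semantics deduplicate the same position added by several views.
                index.computeIfAbsent(deliverAtMillis, k -> new TreeSet<>())
                        .add(ledgerId + ":" + entryId);
                return true;
            }
        }

        @Override
        public List<String> getScheduledMessages(long nowMillis) {
            synchronized (TopicTrackerManager.this) {
                List<String> due = new ArrayList<>();
                if (nowMillis <= watermark) {
                    return due; // nothing new has become due for this view
                }
                // Everything in (watermark, nowMillis] is newly due for this view.
                for (TreeSet<String> positions
                        : index.subMap(watermark, false, nowMillis, true).values()) {
                    due.addAll(positions);
                }
                watermark = nowMillis;
                return due;
            }
        }
    }
}
```

In this sketch, two views that both add the same two messages leave `storedPositions()` at 2 rather than 4, which is the `O(num_delayed_messages)` behaviour the proposal is after. Pruning entries that every subscription has already passed is deliberately left out; the PIP's pruning interval and cleanup delay settings suggest the real design handles that separately.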
