Ksnz commented on PR #24300: URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3611156328
> WDYT?

There may be several strategies to address this, and the cache-eviction policy should probably be aware of the namespace, topic, or even the individual subscription.

The main problem in my current project is that `markDelete` never advances on the delayed-message topic replica. In cluster A the `markDelete` cursor moves only rarely, at unpredictable moments, and over an unpredictable range. This causes the cluster B replica backlog to grow. To mitigate this, we want to keep at least the oldest snapshot so that we do not lose any `markDelete` update.

The following strategies come to mind:

1. **Equal time intervals** (current behavior): evict from the head. Could be used as the default so that expected behaviour does not break.
2. **Equal distance intervals** (new strategy): evict from the head. Requires extra code outside of `ReplicatedSubscriptionSnapshotCache`, because `Position` by itself is not a trusted source for calculating distance.
3. **Increasing time intervals (from tail to head)** (new strategy): evict from the middle, keep the head.
4. **Increasing distance intervals (from tail to head)** (new strategy): evict from the middle, keep the head. Again requires extra code.
5. **Increasing time intervals (from head to tail)** (new strategy): evict from the middle, keep the head.
6. **Increasing distance intervals (from head to tail)** (new strategy): evict from the middle, keep the head. Requires additional code.

Strategies 5 or 6 fit our needs for delayed-message topics, but any distance-based strategy requires a lot of changes to implement.

For me, the simplest way is to implement strategies 3 and 5:

- Strategy 3 for fast consumers that sometimes get stuck due to a burst of messages, so that sync is not lost at all.
- Strategy 5 for delayed-message consumers, where `markDelete` moves intermittently and unpredictably.

There should also be an option to pick the strategy per namespace. The ability to pick the strategy in the consumer configuration would require a cascade of code changes.
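To make "evict from the middle, keep the head" concrete, here is a minimal Java sketch. It is not the code from this PR and does not use the real `ReplicatedSubscriptionSnapshotCache` API: the class `SnapshotEvictionSketch`, its `Snapshot` type, and the victim-selection rule are all hypothetical. For simplicity it does not produce the increasing-interval spacing of strategies 3 or 5; it only shows the shared mechanics, evicting the interior snapshot whose two neighbours are closest together in time while never touching the oldest or newest entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Pulsar's ReplicatedSubscriptionSnapshotCache.
class SnapshotEvictionSketch {

    // Stand-in for a cached snapshot; only the creation time matters here.
    static final class Snapshot {
        final String id;
        final long timestampMillis;

        Snapshot(String id, long timestampMillis) {
            this.id = id;
            this.timestampMillis = timestampMillis;
        }
    }

    // Ordered from oldest (head) to newest (tail).
    private final List<Snapshot> snapshots = new ArrayList<>();
    private final int maxSize;

    SnapshotEvictionSketch(int maxSize) {
        if (maxSize < 3) {
            throw new IllegalArgumentException("maxSize must be at least 3");
        }
        this.maxSize = maxSize;
    }

    // Assumes snapshots arrive in non-decreasing timestamp order.
    void add(Snapshot snapshot) {
        snapshots.add(snapshot);
        if (snapshots.size() > maxSize) {
            evictFromMiddle();
        }
    }

    // Removes the interior entry whose removal leaves the smallest time gap
    // between its neighbours. The head (index 0) and the tail are never candidates.
    private void evictFromMiddle() {
        int victim = 1;
        long smallestSpan = Long.MAX_VALUE;
        for (int i = 1; i < snapshots.size() - 1; i++) {
            long span = snapshots.get(i + 1).timestampMillis
                    - snapshots.get(i - 1).timestampMillis;
            if (span < smallestSpan) {
                smallestSpan = span;
                victim = i;
            }
        }
        snapshots.remove(victim); // int argument: removes by index
    }

    List<Snapshot> view() {
        return List.copyOf(snapshots);
    }
}
```

With `maxSize = 3` and snapshots added at timestamps 0, 100, 110 and 500, the entry at 100 is evicted and the oldest snapshot (0) survives, which is the property needed when `markDelete` moves rarely. A real strategy 3 or 5 would replace the victim-selection rule with one that targets the desired interval growth.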
