Re: [PR] [improve][broker] optimize the problem of subscription snapshot cache not hitting [pulsar]

via GitHub Thu, 04 Dec 2025 04:48:50 -0800


lhotari commented on PR #24300:
URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3612093469

> We could introduce an additional option similar to
`replicatedSubscriptionsSnapshotFrequencyMillis`, but for message count – for
example `replicatedSubscriptionsSnapshotFrequencyMessageGap` (the naming can be
discussed later, along with the semantics of how the two properties interact),
and send `ReplicatedSubscriptionsSnapshotRequest` not only based on time but
also after a certain number of messages have been received. This way, we could
achieve intervals between snapshots that are equal in terms of message count.

Yes, a position interval based approach could be useful in ensuring that
there would be more frequent snapshots when a lot of messages have been
produced. One detail is that the snapshotting isn't deterministic so it
wouldn't result in equal intervals between the snapshots.
* Since a snapshot requires a request-response flow, it's not
deterministic. The remote side might be offline, or the geo-replication
processing of the messages might be slow or vary. Because of these differences,
there wouldn't be a stable interval between the completed snapshots.
* There could also be more than 2 clusters in geo-replication and that
requires [2 rounds of
request-response](https://github.com/apache/pulsar/blob/1ce7855c9424b23ac357cfd1cfe89bdb6e22ea57/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/ReplicatedSubscriptionsSnapshotBuilder.java#L109-L117).
This increases variance.
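
To make the combined time and message-count trigger concrete, here is a minimal sketch. The class `SnapshotTrigger`, its methods, and the "whichever threshold is reached first" semantics are assumptions for illustration only and not existing Pulsar code. It only decides when a snapshot request is started; as noted above, when the snapshot completes still depends on the request-response flow.

```java
/** Hypothetical helper, not part of Pulsar: decides when to start a snapshot request. */
final class SnapshotTrigger {
    private final long intervalMillis; // plays the role of replicatedSubscriptionsSnapshotFrequencyMillis
    private final long messageGap;     // hypothetical message-count threshold
    private long lastRequestMillis;
    private long pendingMessages;

    SnapshotTrigger(long intervalMillis, long messageGap, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.messageGap = messageGap;
        this.lastRequestMillis = nowMillis;
    }

    /** Call for every received message; returns true if a request should be sent now. */
    boolean onMessage(long nowMillis) {
        pendingMessages++;
        return maybeTrigger(nowMillis);
    }

    /** Call from a periodic timer; returns true if a request should be sent now. */
    boolean onTick(long nowMillis) {
        return maybeTrigger(nowMillis);
    }

    private boolean maybeTrigger(long nowMillis) {
        if (pendingMessages == 0) {
            return false; // nothing new to snapshot
        }
        boolean countReached = pendingMessages >= messageGap;
        boolean timeReached = nowMillis - lastRequestMillis >= intervalMillis;
        if (!countReached && !timeReached) {
            return false;
        }
        // Reset both counters so the next interval starts from this request.
        pendingMessages = 0;
        lastRequestMillis = nowMillis;
        return true;
    }
}
```

With an interval of 1000 ms and a gap of 3 messages, the third message after the last request triggers immediately, while a single pending message only triggers once the interval has elapsed.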

> However, I still don’t see how this would help when `markDelete` can be
far beyond the horizon of the current cache.

The cache would keep snapshots with equal position intervals. I don't see
why it wouldn't help.
Increasing the snapshot cache size would also help. The cache doesn't require
that much memory, and optimizing the memory usage further would be useful.
One optimization target would be to de-duplicate the `java.lang.String` instances
of the cluster ids that are part of the snapshots. The
`org.apache.pulsar.common.util.StringInterner` class can be used for
deduplication.
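
The idea behind the de-duplication can be sketched as follows. This is a stand-in built on `ConcurrentHashMap`, and `ClusterIdInterner` is a hypothetical name; it does not show the actual API of Pulsar's `StringInterner`, which would be the class to use in the broker.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an interner: returns one canonical instance per distinct value. */
final class ClusterIdInterner {
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    private ClusterIdInterner() {
    }

    static String intern(String value) {
        // putIfAbsent returns the previously stored instance, or null if this one was stored.
        String existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}
```

Each snapshot would then hold a reference to the shared instance, so the number of cluster id strings on the heap is bounded by the number of clusters rather than the number of cached snapshots.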

> An additional option to solution would be to store the
`ReplicatedSubscriptionSnapshotCache` somewhere in an **off-heap storage**, for
example in **RocksDB**, evicting entries that are older than the `markDelete`
horizon.

This shouldn't be needed since the amount of memory that the snapshots
consume can be optimized. It should be possible to store millions of snapshots
with fairly low heap memory usage after optimizing the data structures.
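
As one illustration of what such an optimized data structure could look like, the sketch below stores snapshots in a single primitive `long[]` instead of one object graph per snapshot. Everything here is an assumption: the class `CompactSnapshotStore` is hypothetical, it handles only one remote cluster, and it expects snapshots to be added in increasing local position order. Under these assumptions a snapshot takes 4 longs (32 bytes), so one million snapshots would need roughly 32 MB plus array growth slack.

```java
import java.util.Arrays;

/** Hypothetical sketch: snapshots for one remote cluster packed into a primitive array. */
final class CompactSnapshotStore {
    // Layout per snapshot: localLedgerId, localEntryId, remoteLedgerId, remoteEntryId
    private static final int FIELDS = 4;
    private long[] data = new long[FIELDS * 16];
    private int size;

    /** Appends a snapshot; callers must add in increasing local position order. */
    void add(long localLedgerId, long localEntryId, long remoteLedgerId, long remoteEntryId) {
        if ((size + 1) * FIELDS > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        int base = size * FIELDS;
        data[base] = localLedgerId;
        data[base + 1] = localEntryId;
        data[base + 2] = remoteLedgerId;
        data[base + 3] = remoteEntryId;
        size++;
    }

    /**
     * Returns {remoteLedgerId, remoteEntryId} of the latest snapshot whose local
     * position is at or before the given position, or null if there is none.
     */
    long[] floor(long ledgerId, long entryId) {
        int lo = 0;
        int hi = size - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int base = mid * FIELDS;
            if (compare(data[base], data[base + 1], ledgerId, entryId) <= 0) {
                found = mid;   // candidate; look for a later one
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        if (found < 0) {
            return null;
        }
        int base = found * FIELDS;
        return new long[] {data[base + 2], data[base + 3]};
    }

    private static int compare(long ledgerA, long entryA, long ledgerB, long entryB) {
        int c = Long.compare(ledgerA, ledgerB);
        return c != 0 ? c : Long.compare(entryA, entryB);
    }
}
```

A lookup for a `markDelete` position is a binary search over the packed array, so a larger cache costs memory linearly but lookup time only logarithmically.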

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

