lhotari commented on PR #24300: URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3612093469
> We could introduce an additional option similar to `replicatedSubscriptionsSnapshotFrequencyMillis`, but for message count – for example `replicatedSubscriptionsSnapshotFrequencyMessageGap` (the naming can be discussed later, along with the semantics of how the two properties interact), and send `ReplicatedSubscriptionsSnapshotRequest` not only based on time but also after a certain number of messages have been received. This way, we could achieve intervals between snapshots that are equal in terms of message count.

Yes, a position-interval-based approach could be useful for ensuring more frequent snapshots when a lot of messages have been produced. One detail is that snapshotting isn't deterministic, so it wouldn't result in equal intervals between the snapshots.

* Since a snapshot requires a request-response flow, it's not deterministic. The remote side might be offline, or the geo-replication processing of the messages might be slow or vary. Because of these differences, there wouldn't be a stable interval between the completed snapshots.
* There could also be more than 2 clusters in geo-replication, and that requires [2 rounds of request-response](https://github.com/apache/pulsar/blob/1ce7855c9424b23ac357cfd1cfe89bdb6e22ea57/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/ReplicatedSubscriptionsSnapshotBuilder.java#L109-L117). This increases variance.

> However, I still don’t see how this would help when `markDelete` can be far beyond the horizon of the current cache.

The cache would keep snapshots with equal position intervals. I don't see why it wouldn't help. Increasing the snapshot cache size would also help; it doesn't require that much memory. Optimizing the memory usage further would be useful. One optimization target would be to de-duplicate the `java.lang.String` instances of the cluster ids that are part of the snapshots.
The `org.apache.pulsar.common.util.StringInterner` class can be used for deduplication.

> An additional option to solution would be to store the `ReplicatedSubscriptionSnapshotCache` somewhere in an **off-heap storage**, for example in **RocksDB**, evicting entries that are older than the `markDelete` horizon.

This shouldn't be needed since the amount of memory that the snapshots consume can be optimized. It should be possible to store millions of snapshots with fairly low heap memory usage after optimizing the data structures.
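
To illustrate the cluster id de-duplication idea mentioned above, here is a minimal, JDK-only sketch. The `ClusterIdInterner` and `SnapshotEntry` classes are hypothetical stand-ins invented for this example; they are not the actual Pulsar classes, and the real `StringInterner` and snapshot data structures may look different. The point is only that every cached snapshot can reference one shared `String` instance per cluster id instead of holding its own copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical stand-in for a shared string interner: returns one canonical
 * String instance per distinct value. Not the Pulsar implementation.
 */
final class ClusterIdInterner {
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    private ClusterIdInterner() {
    }

    static String intern(String value) {
        // putIfAbsent returns the previously stored instance, or null if
        // this value was stored for the first time.
        String existing = POOL.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }
}

/**
 * Hypothetical, simplified snapshot entry: a local position plus the
 * corresponding position in each remote cluster.
 */
final class SnapshotEntry {
    final long localLedgerId;
    final long localEntryId;
    // cluster id -> {ledgerId, entryId} in that remote cluster
    final Map<String, long[]> remotePositions = new LinkedHashMap<>();

    SnapshotEntry(long localLedgerId, long localEntryId) {
        this.localLedgerId = localLedgerId;
        this.localEntryId = localEntryId;
    }

    void putRemotePosition(String clusterId, long ledgerId, long entryId) {
        // Intern the key so all entries share the same String instance.
        remotePositions.put(ClusterIdInterner.intern(clusterId), new long[] {ledgerId, entryId});
    }
}
```

With this approach, the per-snapshot cost of a cluster id is a single reference, regardless of how many snapshots are cached. Cluster ids deserialized from the wire typically arrive as fresh `String` instances, which is why interning at insertion time matters.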
