lhotari commented on PR #24300:
URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3612093469

   > We could introduce an additional option similar to 
`replicatedSubscriptionsSnapshotFrequencyMillis`, but for message count – for 
example `replicatedSubscriptionsSnapshotFrequencyMessageGap` (the naming can be 
discussed later, along with the semantics of how the two properties interact), 
and send `ReplicatedSubscriptionsSnapshotRequest` not only based on time but 
also after a certain number of messages have been received. This way, we could 
achieve intervals between snapshots that are equal in terms of message count.
   
   Yes, a position-interval-based approach could be useful for ensuring more 
frequent snapshots when a lot of messages have been produced. One detail is 
that snapshotting isn't deterministic, so it wouldn't result in equal 
intervals between the snapshots. 
   * Since a snapshot requires a request-response flow, its timing isn't 
deterministic. The remote side might be offline, or the geo-replication 
processing of the messages might be slow or vary. Because of these 
differences, there wouldn't be a stable interval between the completed 
snapshots.
   * There could also be more than 2 clusters in geo-replication and that 
requires [2 rounds of 
request-response](https://github.com/apache/pulsar/blob/1ce7855c9424b23ac357cfd1cfe89bdb6e22ea57/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/ReplicatedSubscriptionsSnapshotBuilder.java#L109-L117).
 This increases variance. 
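
   As a rough sketch of the trigger side only, the following shows how a 
time-based and a message-count-based condition could be combined. 
`SnapshotTrigger` and its methods are hypothetical names for illustration and 
not existing Pulsar classes. The rule that the time condition only fires when 
at least one message has arrived is also an assumption. The sketch decides 
when a request is *sent*; as noted above, when the snapshot *completes* still 
depends on the remote clusters.

```java
/** Hypothetical helper for illustration; not part of Pulsar. */
final class SnapshotTrigger {
    private final long maxIntervalMillis; // time-based threshold
    private final long maxMessageGap;     // message-count threshold (proposed, not an existing setting)
    private long lastRequestMillis;
    private long messagesSinceLastRequest;

    SnapshotTrigger(long maxIntervalMillis, long maxMessageGap, long nowMillis) {
        this.maxIntervalMillis = maxIntervalMillis;
        this.maxMessageGap = maxMessageGap;
        this.lastRequestMillis = nowMillis;
    }

    /** Call once for every message received on the topic. */
    void recordMessage() {
        messagesSinceLastRequest++;
    }

    /** True when either threshold has been reached since the last request. */
    boolean shouldRequest(long nowMillis) {
        if (messagesSinceLastRequest >= maxMessageGap) {
            return true;
        }
        // Assumption: skip time-based snapshots when nothing new was published.
        return messagesSinceLastRequest > 0
                && nowMillis - lastRequestMillis >= maxIntervalMillis;
    }

    /** Call after a snapshot request has been sent; resets both counters. */
    void markRequested(long nowMillis) {
        lastRequestMillis = nowMillis;
        messagesSinceLastRequest = 0;
    }
}
```

   With a high publish rate the message-gap condition dominates, and with a 
low rate the time condition does, which is the interaction between the two 
properties that would still need to be agreed on.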
   
   > However, I still don’t see how this would help when `markDelete` can be 
far beyond the horizon of the current cache.
   
   The cache would keep snapshots with equal position intervals. I don't see 
why it wouldn't help. 
   Increasing the snapshot cache size would also help, since it doesn't 
require much memory. Optimizing the memory usage further would be useful. 
One optimization target would be to de-duplicate the `java.lang.String` 
instances of the cluster ids that are part of the snapshots. The 
`org.apache.pulsar.common.util.StringInterner` class can be used for 
deduplication.
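
   To illustrate the idea of de-duplicating cluster id strings, here is a 
minimal self-contained sketch. It deliberately does not use Pulsar's 
`StringInterner`, so that no assumption is made about that class's API. 
`ClusterIdInterner` is a hypothetical stand-in built on 
`java.util.concurrent.ConcurrentHashMap`.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an interner; not Pulsar's StringInterner. */
final class ClusterIdInterner {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns one canonical instance for all equal cluster id strings. */
    String intern(String clusterId) {
        // putIfAbsent returns the previously stored instance, or null if
        // this call stored clusterId as the canonical one.
        String existing = pool.putIfAbsent(clusterId, clusterId);
        return existing != null ? existing : clusterId;
    }
}
```

   Since the number of distinct cluster ids is small, each snapshot would 
then hold references to a handful of shared strings instead of its own 
copies.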
   
   > An additional option to solution would be to store the 
`ReplicatedSubscriptionSnapshotCache` somewhere in an **off-heap storage**, for 
example in **RocksDB**, evicting entries that are older than the `markDelete` 
horizon.
   
   This shouldn't be needed since the amount of memory that the snapshots 
consume can be optimized. It should be possible to store millions of snapshots 
with fairly low heap memory usage after optimizing the data structures.
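
   As one possible direction for such a data structure, the sketch below 
keeps positions in parallel primitive arrays instead of one object per 
snapshot. `CompactPositionStore` is a hypothetical name, it only stores the 
local `(ledgerId, entryId)` pair, and overwriting the oldest entry when full 
is a placeholder rather than the cache's eviction policy. The positions of 
the remote clusters would need additional arrays along the same lines.

```java
/** Hypothetical compact store for illustration; not part of Pulsar. */
final class CompactPositionStore {
    private final long[] ledgerIds;
    private final long[] entryIds;
    private int head; // index of the oldest stored entry
    private int size;

    CompactPositionStore(int capacity) {
        ledgerIds = new long[capacity];
        entryIds = new long[capacity];
    }

    /** Appends a position; when full, the oldest entry is overwritten. */
    void add(long ledgerId, long entryId) {
        int capacity = ledgerIds.length;
        int slot = (head + size) % capacity;
        ledgerIds[slot] = ledgerId;
        entryIds[slot] = entryId;
        if (size < capacity) {
            size++;
        } else {
            head = (head + 1) % capacity; // the oldest entry was replaced
        }
    }

    int size() {
        return size;
    }

    /** Index 0 is the oldest stored entry. */
    long ledgerIdAt(int index) {
        return ledgerIds[(head + index) % ledgerIds.length];
    }

    long entryIdAt(int index) {
        return entryIds[(head + index) % entryIds.length];
    }
}
```

   Each stored position takes two `long` values, i.e. 16 bytes of array 
data, so one million positions need roughly 16 MB plus the fixed overhead of 
the two arrays.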


