lhotari commented on PR #24300: URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3616721496
> In my opinion, a one-second window is an acceptable limitation, even if a burst of messages occurs during that second it is still **one second**. That's true, but the end result isn't that the distance between 2 different snapshots would be with 1 second difference in rate since it's not deterministic due to the request-response required for creating the snapshot. > The problem is that we evict old snapshots from the head of the queue (the oldest positions), so **slow consumers** can no longer move their `markDelete` position because the required snapshot is already gone. I agree that it doesn't make sense to evict the head of the queue. > **_Perhaps_** if we added **an option** to make the cache act as a buffer (skipping all new entries once it’s full) it would help slow subscriptions stay in sync. Moreover, it would be **_generic approach_** The problem with that approach is that there would later be problems in updating the mark delete position since there would be a long duration where there wouldn't be snapshots at all. That's the reason why I think that it makes more sense to have an eviction algorithm that keeps snapshots with similar distances between the first entry and the last entry in the queue. It's not very hard to implement such an eviction algorithm. Together with a much larger maximum cache size (or even dynamic max size), it's possible to address the problem that has been caused by the snapshot cache overflowing. > After a successful advanced `markDelete` position, the buffer would be cleared up to some point and refilled with new `ReplicatedSubscriptionsSnapshots`. Yes, it will eventually catch up as long as the head of the queue isn't deleted. However, there's a potential to fix the complete problem of having a long lag in the update when the eviction is improved and there are a lot more snapshots available. > The queue that is aligned to a `markDelete` position is more useful for synchronization than a queue that follows the tip of the topic. yes > A remaining issue is that a slow consumer can suddenly advance its `markDelete` position by a large amount, wiping the entire buffer. The next snapshots will then be captured far ahead of the consumer’s new `markDelete`, leaving a gap. yes. that's what could be addressed with the improved eviction algorithm that would keep snapshots with about equal distances from the head to the tail. > But we can combine the best of both worlds. By adding the `private final NavigableMap<Position, ReplicatedSubscriptionsSnapshot> tailSnapshots; ` and `private final NavigableMap<Position, ReplicatedSubscriptionsSnapshot> headSnapshots; ` to a `ReplicatedSubscriptionSnapshotCache` > > It's not a perfect solution. But it's less evil than rewriting everything and still better than the current one. true -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
