lhotari commented on PR #24300:
URL: https://github.com/apache/pulsar/pull/24300#issuecomment-3616721496

   > In my opinion, a one-second window is an acceptable limitation, even if a 
burst of messages occurs during that second it is still **one second**.
   
   That's true, but the end result isn't that the distance between 2 different 
snapshots would be with 1 second difference in rate since it's not 
deterministic due to the request-response required for creating the snapshot.
   
   > The problem is that we evict old snapshots from the head of the queue (the 
oldest positions), so **slow consumers** can no longer move their `markDelete` 
position because the required snapshot is already gone.
   
   I agree that it doesn't make sense to evict the head of the queue.
   
   > **_Perhaps_** if we added **an option** to make the cache act as a buffer 
(skipping all new entries once it’s full) it would help slow subscriptions stay 
in sync. Moreover, it would be **_generic approach_**
   
   The problem with that approach is that there would later be problems in 
updating the mark delete position since there would be a long duration where 
there wouldn't be snapshots at all. That's the reason why I think that it makes 
more sense to have an eviction algorithm that keeps snapshots with similar 
distances between the first entry and the last entry in the queue. It's not 
very hard to implement such an eviction algorithm. Together with a much larger 
maximum cache size (or even dynamic max size), it's possible to address the 
problem that has been caused by the snapshot cache overflowing.
   
   > After a successful advanced `markDelete` position, the buffer would be 
cleared up to some point and refilled with new 
`ReplicatedSubscriptionsSnapshots`.
   
   Yes, it will eventually catch up as long as the head of the queue isn't 
deleted. However, there's a potential to fix the complete problem of having a 
long lag in the update when the eviction is improved and there are a lot more 
snapshots available.
   
   > The queue that is aligned to a `markDelete` position is more useful for 
synchronization than a queue that follows the tip of the topic.
   
   yes
   
   > A remaining issue is that a slow consumer can suddenly advance its 
`markDelete` position by a large amount, wiping the entire buffer. The next 
snapshots will then be captured far ahead of the consumer’s new `markDelete`, 
leaving a gap.
   
   yes. that's what could be addressed with the improved eviction algorithm 
that would keep snapshots with about equal distances from the head to the tail.
   
   > But we can combine the best of both worlds. By adding the `private final 
NavigableMap<Position, ReplicatedSubscriptionsSnapshot> tailSnapshots; ` and 
`private final NavigableMap<Position, ReplicatedSubscriptionsSnapshot> 
headSnapshots; ` to a `ReplicatedSubscriptionSnapshotCache`
   > 
   > It's not a perfect solution. But it's less evil than rewriting everything 
and still better than the current one.
   
   true
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to