lhotari opened a new pull request, #23930:
URL: https://github.com/apache/pulsar/pull/23930

   Fixes #23920
   
   ### Motivation
   
   The current rate limiting implementation in Pulsar broker can cause 
connection timeouts when the state of the AsyncTokenBucket becomes invalid due 
to clock source issues in some virtualized environments where 
`System.nanoTime()` isn't strictly monotonic and consistent across multiple 
CPUs.
   This PR addresses those problems.
   
   ### Modifications
   
   1. **AsyncTokenBucket improvements**:
      - Improved token calculation logic to prevent clock leaps backwards 
causing invalid state
      - Improved token consumption logic while adding more test cases for the 
class (unrelated to the actual problem) to handle negative balance with 
eventual consistency:
        - subtract pendingConsumedToken from new tokens before capping at 
bucket capacity
   
   2. **DefaultMonotonicSnapshotClock enhancements**:
      - Reimplemented using a single dedicated thread to reduce chances for 
clock leaps since a single CPU would be used in most cases unless the OS 
migrates the thread to run to another CPU
      - Added logic to handle clock source leaps (backward/forward) gracefully
      - Added required solution for use case where the consistent value of the 
clock is requested
        - This is not on the hot path, so synchronization and Object monitor 
wait/notify/notifyAll provides a feasible solution.
   
   3. **Rate limiter implementations**:
      - Updated PublishRateLimiterImpl to use proper executor for unthrottling 
(EventLoopGroup will pick a new scheduler each time)
      - Added error handling for producer unthrottling
      - Modified DispatchRateLimiter and SubscribeRateLimiter to use the 
monotonic clock instance which handles the clock source issues
   
   Added test coverage:
   
   1. New AsyncTokenBucketTest cases:
      - Fraction handling and leftover token retention
      - Negative balance with eventual consistency
      - Token bucket size limits
      - Clock source lag scenarios
   
   2. New DefaultMonotonicSnapshotClockTest:
      - Time leap handling (backward/forward)
      - Snapshot request validation
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc` <!-- Your PR contains doc changes. -->
   - [ ] `doc-required` <!-- Your PR changes impact docs and you will update 
later -->
   - [x] `doc-not-needed` <!-- Your PR changes do not impact docs -->
   - [ ] `doc-complete` <!-- Docs have been already added -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to