[I] [Improve][Offload] Automatic managed ledger offload triggers can create redundant retry loops [pulsar]

via GitHub Fri, 22 May 2026 20:03:14 -0700


void-ptr974 opened a new issue, #25859:
URL: https://github.com/apache/pulsar/issues/25859


   ### Describe the issue
   
   Automatic managed ledger offload can be triggered repeatedly while a 
previous automatic offload is still running, for example around ledger rollover 
or topic load.
   
   Since only one offload can run at a time, repeated automatic triggers do not 
improve the final offload result. Instead, each trigger can independently enter 
the offload path, scan ledgers, fail to acquire the offload mutex, and schedule 
another retry.
   
   ### Problem
   
   Because automatic triggers are not coalesced, each repeated trigger performs the same work independently:
   
   1. Read offload policies.
   2. Scan the managed ledger's list of ledgers.
   3. Try to acquire the offload mutex.
   4. Fail because another offload is already running.
   5. Schedule another retry after 100ms.
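   The following is a small deterministic Java model that makes the cost of these steps concrete. It is a hypothetical sketch and not Pulsar code. `UncoalescedOffload`, `trigger()`, `tick()`, `finishOffload()` and the counters are invented for illustration. The offload mutex is modeled as an `AtomicBoolean`, and the 100ms scheduler is replaced by a list of pending retries that `tick()` drains.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.atomic.AtomicBoolean;
   import java.util.concurrent.atomic.AtomicInteger;

   /**
    * Hypothetical model of the uncoalesced automatic-offload path (not Pulsar code).
    * A real scheduler would rerun each retry after 100 ms. Here retries are
    * recorded in a list and tick() runs the ones that are due, so the demo is
    * deterministic.
    */
   final class UncoalescedOffload {
       final AtomicBoolean offloadMutex = new AtomicBoolean(false);
       final AtomicInteger policyReads = new AtomicInteger();
       final AtomicInteger ledgerScans = new AtomicInteger();
       final List<Runnable> pendingRetries = new ArrayList<>();

       /** One automatic trigger, following steps 1-5 above. */
       void trigger() {
           policyReads.incrementAndGet();                  // 1. read offload policies
           ledgerScans.incrementAndGet();                  // 2. scan the ledger list
           if (!offloadMutex.compareAndSet(false, true)) { // 3. try to acquire the mutex
               pendingRetries.add(this::trigger);          // 4./5. lost it: schedule a retry
               return;
           }
           // Won the mutex: the offload runs and holds it until finishOffload().
       }

       void finishOffload() {
           offloadMutex.set(false);
       }

       /** Simulates 100 ms passing: every due retry runs once. */
       void tick() {
           List<Runnable> due = new ArrayList<>(pendingRetries);
           pendingRetries.clear();
           due.forEach(Runnable::run);
       }

       public static void main(String[] args) {
           UncoalescedOffload ml = new UncoalescedOffload();
           ml.trigger();                 // first trigger wins the mutex; its offload is slow
           for (int i = 0; i < 5; i++) {
               ml.trigger();             // e.g. rollovers while that offload is running
           }
           ml.tick();                    // each retry fails again and reschedules
           ml.tick();
           System.out.println("policy reads: " + ml.policyReads.get());      // 16
           System.out.println("ledger scans: " + ml.ledgerScans.get());      // 16
           System.out.println("retry loops:  " + ml.pendingRetries.size());  // 5
       }
   }
   ```

   Every trigger that loses the mutex keeps its own retry loop alive. As a result, each simulated 100ms interval repeats the policy read and the ledger-list scan once per loop. Even after `finishOffload()` releases the mutex, only one retry wins it and the others reschedule again.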
   
   When offload is slow, or when a managed ledger has many ledgers, these retries accumulate and create unnecessary scheduler and executor work. They also repeat policy reads and ledger-list scans that do not change the final offload result.
   
   ### Expected behavior
   
   Automatic managed ledger offload triggers should be coalesced while an 
automatic offload is already in progress:
   
   - There should be at most one in-flight automatic offload.
   - Repeated automatic triggers during the in-flight offload should be merged 
into one pending rerun.
   - After the current automatic offload completes, one follow-up pass should 
run if any trigger arrived meanwhile.
   - New ledgers that become eligible while an offload is running should still 
be picked up by the follow-up pass.
   - Explicit/manual offload requests should keep the existing 
`CompletableFuture<Position>` behavior.
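   One way to meet these expectations is to use two atomic flags. The first marks an automatic pass as in flight, and the second records that at least one trigger arrived. Below is a minimal Java sketch under that assumption. It is not necessarily how #25793 implements the change. `AutoOffloadCoalescer`, `trigger()`, `tryStartPass()` and the `offloadPass` supplier are hypothetical names. The supplier stands in for one full asynchronous pass: read policies, scan ledgers, and offload whatever is eligible. Manual offload requests would not go through this class, so they would keep their `CompletableFuture<Position>` behavior.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.atomic.AtomicBoolean;
   import java.util.concurrent.atomic.AtomicInteger;
   import java.util.function.Supplier;

   /** Hypothetical coalescer for automatic offload triggers (not Pulsar's API). */
   final class AutoOffloadCoalescer {
       private final AtomicBoolean inFlight = new AtomicBoolean(false);       // at most one automatic pass
       private final AtomicBoolean rerunRequested = new AtomicBoolean(false); // merged pending triggers
       private final AtomicInteger passesStarted = new AtomicInteger();
       // One full pass: read policies, scan ledgers, offload whatever is eligible now.
       private final Supplier<CompletableFuture<Void>> offloadPass;

       AutoOffloadCoalescer(Supplier<CompletableFuture<Void>> offloadPass) {
           this.offloadPass = offloadPass;
       }

       /** Called on every automatic trigger (ledger rollover, topic load, ...). */
       void trigger() {
           rerunRequested.set(true); // any number of triggers collapse into this one flag
           tryStartPass();
       }

       int passesStarted() {
           return passesStarted.get();
       }

       private void tryStartPass() {
           // If a pass is already running, return: its completion re-checks the flag.
           while (rerunRequested.get() && inFlight.compareAndSet(false, true)) {
               if (!rerunRequested.getAndSet(false)) {
                   inFlight.set(false); // request already consumed; release and re-check
                   continue;
               }
               passesStarted.incrementAndGet();
               CompletableFuture<Void> pass;
               try {
                   pass = offloadPass.get();
               } catch (RuntimeException e) {
                   pass = CompletableFuture.failedFuture(e);
               }
               // Release on success, failure, or early return (e.g. thresholds
               // disabled), then run one follow-up pass if a trigger arrived meanwhile.
               pass.whenComplete((ignored, error) -> {
                   inFlight.set(false);
                   tryStartPass();
               });
               return;
           }
       }

       public static void main(String[] args) {
           List<CompletableFuture<Void>> running = new ArrayList<>();
           AutoOffloadCoalescer coalescer = new AutoOffloadCoalescer(() -> {
               CompletableFuture<Void> f = new CompletableFuture<>(); // a "slow" offload
               running.add(f);
               return f;
           });

           coalescer.trigger();                 // starts pass 1
           for (int i = 0; i < 5; i++) {
               coalescer.trigger();             // arrive while pass 1 is in flight
           }
           System.out.println(coalescer.passesStarted()); // 1
           running.get(0).complete(null);       // pass 1 finishes
           System.out.println(coalescer.passesStarted()); // 2: one follow-up pass
           running.get(1).complete(null);       // follow-up finishes, nothing pending
           System.out.println(coalescer.passesStarted()); // 2
       }
   }
   ```

   The follow-up pass rereads policies and rescans the ledger list from scratch. Ledgers that became eligible during the previous pass are therefore picked up without tracking individual triggers. Releasing `inFlight` in `whenComplete` also covers passes that fail or return early, such as when offload thresholds are disabled, so a later trigger can still start a pass.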
   
   ### Actual behavior
   
   Each automatic trigger can independently enter the offload path and, when another offload is already running, schedule its own retry loop. When offload is slow or a managed ledger has many ledgers, this causes redundant 100ms retries, repeated policy reads, repeated ledger-list scans, and unnecessary scheduler/executor pressure.
   
   ### Impact
   
   The impact is mainly duplicate work and extra scheduler/executor pressure while an automatic offload is already in progress. Any fix should preserve automatic offload progress: if new ledgers become eligible while an offload is running, a follow-up pass should still offload them.
   
   ### Verification / reproducer
   
   The scenario is covered by tests added in #25793:
   
   - Repeated automatic triggers during an in-flight offload do not create 
independent retry loops.
   - A coalesced automatic trigger causes one follow-up offload pass.
   - Automatic offload state is released when offload thresholds are disabled, 
so later valid triggers can still run.
   
   Local verification from the related PR:
   
   ```bash
   git diff --check
   ./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadPrefixTest
   ./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.OffloadLedgerDeleteTest
   ```
   
   ### Affected area
   
   Managed ledger automatic offload scheduling and retry behavior.
   
   ### Related PR
   
   #25793


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
