mrtworo opened a new issue, #4596:
URL: https://github.com/apache/bookkeeper/issues/4596

   **BUG REPORT**
   
   ***Describe the bug***
   
   Bookie compaction halts without any errors or diagnostic information, even 
with the DEBUG log level enabled.
   
   We observed that every time it silently stopped, it was at entry log 1884, 
corresponding to the file 75c.log.
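
For reference, the two identifiers above name the same entry log: the file name is the hexadecimal form of the decimal log ID that appears in the log output. A minimal Python sketch of that mapping follows; the helper names are illustrative and not part of any BookKeeper tool.

```python
def entry_log_filename(log_id: int) -> str:
    """Return the on-disk file name for a decimal entry log ID."""
    # Entry log files are named after the lowercase hex form of the ID.
    return f"{log_id:x}.log"


def entry_log_id(filename: str) -> int:
    """Return the decimal entry log ID for a file name such as '75c.log'."""
    stem = filename.rsplit(".", 1)[0]
    return int(stem, 16)


print(entry_log_filename(1884))  # 75c.log
print(entry_log_id("75c.log"))   # 1884
```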
   
   ```
   #  {pod="pulsar-bookie-4",namespace="pulsar"} 
   2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Ledger exists. ledger: 18113 : true
   2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Ledger exists. ledger: 18120 : true
   2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 1884
   2025-04-22T09:12:37,547+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.DefaultEntryLogger - Recovering ledgers maps for log 1884 at offset: 282035685
   2025-04-22T09:12:38,518+0000 [GarbageCollectorThread-6-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - GarbageCollectorThread-6-1 Set forceGarbageCollection to false after force GC to make it forceGC-able again.
   ```
   
   When the entry log 75c.log was moved to a different location and compaction 
was manually triggered, the bookie successfully reclaimed storage.
   
   Naively checking the consistency of the affected file via:
   ```/pulsar/bin/bookkeeper shell readlog 1884```
   did not reveal any obvious problems with the metadata.
   
   We couldn't find any relevant information at the metrics level either.
   
   The problem occurred again on a different bookie at a different time, with 
exactly the same symptoms and resolution.
   
   ***To Reproduce***
   
   We cannot reliably reproduce the issue. The only noteworthy observation is 
that both impacted entry log files were closed during a bookie shutdown.
   
   ***Expected behavior***
   
   1. Compaction should not stop due to problems with a single file.
   2. If compaction halts, it should provide diagnostic information to help 
identify the issue.
   3. There should be a CLI tool or command to check the consistency of an 
entry log file (if such a tool exists, it may not be well-documented).
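
To illustrate points 1 and 2, here is a minimal Python sketch of the behavior we would expect: a failure on one entry log is logged with its ID and file name, and the loop moves on to the next one. This is a hypothetical illustration only; `compact_all` and `compact_one` are made-up names and do not reflect how the bookie's garbage collector is actually implemented.

```python
import logging

logger = logging.getLogger("compaction-sketch")


def compact_all(entry_log_ids, compact_one):
    """Compact each entry log, logging and skipping any that fail.

    `compact_one` is a hypothetical callable that compacts a single
    entry log and raises an exception on failure.
    """
    failed = []
    for log_id in entry_log_ids:
        try:
            compact_one(log_id)
        except Exception:
            # Record which file failed, with a stack trace, then continue.
            logger.exception(
                "Compaction failed for entry log %d (%x.log); skipping",
                log_id,
                log_id,
            )
            failed.append(log_id)
    return failed
```

With this shape, a problem in a single file such as entry log 1884 would appear in the logs and would not prevent the remaining entry logs from being compacted.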
   
   ***Screenshots***
   
   N/A
   
   ***Additional context***
   
   Environment: 
   
   - Pulsar Version: 4.0.0
   - Apache BookKeeper Version: 4.17.1
   - Deployment: deployed via Pulsar Helm Charts to Kubernetes with Karpenter (pods may be rescheduled multiple times a day).
   
   Configuration:
   ```  
   gcWaitTime: "300000"
   isForceGCAllowWhenNoSpace: "true"
   majorCompactionInterval: "10800"
   majorCompactionThreshold: "0.8"
   minorCompactionInterval: "360"
   minorCompactionThreshold: "0.2"
   ```
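
To make the units explicit: as far as we understand the BookKeeper configuration reference, `gcWaitTime` is expressed in milliseconds, while the two compaction intervals are expressed in seconds. The Python sketch below only converts the values above into readable durations under that assumption; the `UNITS` table and helper name are illustrative.

```python
from datetime import timedelta

# Values copied from the configuration above.
config = {
    "gcWaitTime": "300000",
    "majorCompactionInterval": "10800",
    "minorCompactionInterval": "360",
}

# Assumed units per setting (see the lead-in; not verified against the code).
UNITS = {
    "gcWaitTime": "milliseconds",
    "majorCompactionInterval": "seconds",
    "minorCompactionInterval": "seconds",
}


def as_duration(key: str, value: str) -> timedelta:
    """Convert a raw config value into a timedelta using its assumed unit."""
    return timedelta(**{UNITS[key]: int(value)})


durations = {key: as_duration(key, value) for key, value in config.items()}
for key, duration in durations.items():
    print(f"{key}: {duration}")
```

Under these assumptions, the GC thread wakes up every 5 minutes, minor compaction is eligible every 6 minutes, and major compaction every 3 hours.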

