mrtworo opened a new issue, #4596:
URL: https://github.com/apache/bookkeeper/issues/4596
**BUG REPORT**
***Describe the bug***
Bookie compaction halts without any errors or diagnostic information, even
with the DEBUG log level enabled.
We observed that every time it silently stopped, it was processing entry log 1884,
which corresponds to the file 75c.log (1884 in hexadecimal is 0x75c).
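The two numbers refer to the same file because BookKeeper names entry log files after the log ID in lowercase hexadecimal. Here is a minimal sketch of the mapping in both directions. The helper names are hypothetical and not part of BookKeeper.

```python
def entry_log_file_name(log_id: int) -> str:
    """Return the file name for a numeric entry log ID (1884 -> '75c.log')."""
    return f"{log_id:x}.log"


def entry_log_id(file_name: str) -> int:
    """Parse the numeric entry log ID back out of a name like '75c.log'."""
    stem, ext = file_name.rsplit(".", 1)
    if ext != "log":
        raise ValueError(f"not an entry log file: {file_name}")
    return int(stem, 16)  # file names are hexadecimal IDs


if __name__ == "__main__":
    print(entry_log_file_name(1884))  # 75c.log
    print(entry_log_id("75c.log"))    # 1884
```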
```
# {pod="pulsar-bookie-4",namespace="pulsar"}
2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Ledger exists. ledger: 18113 : true
2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage - Ledger exists. ledger: 18120 : true
2025-04-22T09:12:37,546+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 1884
2025-04-22T09:12:37,547+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.DefaultEntryLogger - Recovering ledgers maps for log 1884 at offset: 282035685
2025-04-22T09:12:38,518+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - GarbageCollectorThread-6-1 Set forceGarbageCollection to false after force GC to make it forceGC-able again.
```
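To check whether other bookies stall on a single file, one rough heuristic is to note the last entry log whose metadata extraction started before each end-of-cycle message, then look for IDs that recur. This is a hypothetical diagnostic. The regexes assume the message texts in this excerpt with one message per line, and the heuristic can give false positives when the same log is legitimately the last one processed in every cycle.

```python
import re
from collections import Counter

# Message texts taken from the excerpt above; other versions may differ.
EXTRACT_RE = re.compile(r"Extracting entry log meta from entryLogId: (\d+)")
CYCLE_END_RE = re.compile(r"Set forceGarbageCollection to false")


def last_log_per_gc_cycle(lines):
    """Return the last entry log ID seen before each end-of-cycle message."""
    last_seen = None
    result = []
    for line in lines:
        match = EXTRACT_RE.search(line)
        if match:
            last_seen = int(match.group(1))
        elif CYCLE_END_RE.search(line):
            if last_seen is not None:
                result.append(last_seen)
            last_seen = None  # start fresh for the next cycle
    return result


def suspicious_logs(lines, min_repeats=2):
    """Entry log IDs that were last-seen in at least `min_repeats` cycles."""
    counts = Counter(last_log_per_gc_cycle(lines))
    return {log_id: n for log_id, n in counts.items() if n >= min_repeats}
```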
When the entry log 75c.log was moved to a different location and compaction
was manually triggered, the bookie successfully reclaimed storage.
Naively checking the consistency of the affected file with
`/pulsar/bin/bookkeeper shell readlog 1884`
did not reveal any obvious problems with the metadata.
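A complementary check targets the structure GC was reading when it stopped: the ledgers map at offset 282035685, which the file header points to. The sketch below is hypothetical. It assumes a header layout that reflects our reading of `DefaultEntryLogger`: a 1024-byte header beginning with the `BKLO` magic, followed by a big-endian version, the ledgers map offset, and the ledgers count. Verify this layout against the 4.17.1 source before relying on the result.

```python
import os
import struct

# ASSUMED header layout (big-endian), to be verified against DefaultEntryLogger:
#   bytes 0-3   magic "BKLO"
#   bytes 4-7   header version (int)
#   bytes 8-15  ledgers map offset (long, 0 if no map was written)
#   bytes 16-19 ledgers count (int)
HEADER_SIZE = 1024
HEADER_FORMAT = ">4siqi"


def check_entry_log_header(path):
    """Return a list of problems found in the entry log header (empty if none)."""
    file_size = os.path.getsize(path)
    if file_size < HEADER_SIZE:
        return [f"file is {file_size} bytes, smaller than the {HEADER_SIZE}-byte header"]
    with open(path, "rb") as f:
        raw = f.read(struct.calcsize(HEADER_FORMAT))
    magic, _version, ledgers_map_offset, ledgers_count = struct.unpack(HEADER_FORMAT, raw)
    problems = []
    if magic != b"BKLO":
        problems.append(f"unexpected magic {magic!r}")
    if ledgers_map_offset == 0:
        problems.append("ledgers map offset is 0 (no ledgers map written)")
    elif not HEADER_SIZE <= ledgers_map_offset < file_size:
        problems.append(f"ledgers map offset {ledgers_map_offset} outside file of size {file_size}")
    if ledgers_count < 0:
        problems.append(f"negative ledgers count {ledgers_count}")
    return problems
```

For example, `check_entry_log_header("75c.log")` run from the ledger directory that holds the file returns an empty list when the header looks sane.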
We also found nothing relevant in the bookie metrics.
The problem occurred again on a different bookie at a different time, with
exactly the same symptoms and resolution.
***To Reproduce***
We cannot reliably reproduce the issue. The only noteworthy observation is
that both impacted entry log files were closed during a bookie shutdown.
***Expected behavior***
1. Compaction should not stop due to problems with a single file.
2. If compaction halts, it should provide diagnostic information to help
identify the issue.
3. There should be a CLI tool or command to check the consistency of an
entry log file (if such a tool exists, it may not be well-documented).
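On item 1: in the excerpt, the GC cycle ends about one second after it begins reading the ledgers map of log 1884. That suggests an early exit (for example, an exception swallowed or logged elsewhere) rather than a hang. Here is a minimal sketch of per-file isolation, in which `extract` is a hypothetical stand-in for BookKeeper's per-log metadata extraction. A failure is logged with both the log ID and its file name, and the loop continues with the next log. This sketch does not cover a genuine hang, which would need a per-file timeout.

```python
import logging

log = logging.getLogger("compaction-sketch")


def extract_all_metadata(entry_log_ids, extract):
    """Extract metadata for every entry log, isolating failures per file.

    `extract` is a hypothetical callable: it returns metadata for one log ID
    or raises if the file is unreadable.
    """
    metadata, failed = {}, {}
    for log_id in entry_log_ids:
        try:
            metadata[log_id] = extract(log_id)
        except Exception as exc:  # one bad file must not end the whole cycle
            failed[log_id] = exc
            log.warning("skipping entry log %d (%x.log): %r", log_id, log_id, exc)
    return metadata, failed
```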
***Screenshots***
N/A
***Additional context***
Environment:
- Pulsar Version: 4.0.0
- Apache BookKeeper Version: 4.17.1
- Deployment: Pulsar Helm Charts on Kubernetes with Karpenter (pods may be
rescheduled multiple times a day).
Configuration:
```
gcWaitTime: "300000"
isForceGCAllowWhenNoSpace: "true"
majorCompactionInterval: "10800"
majorCompactionThreshold: "0.8"
minorCompactionInterval: "360"
minorCompactionThreshold: "0.2"
```
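The settings above can be read as the following schedule. This reading assumes BookKeeper's usual units (`gcWaitTime` in milliseconds, compaction intervals in seconds) and assumes that an entry log becomes a compaction candidate when its fraction of live data drops below the threshold. Helper names are illustrative only.

```python
# Values copied from the configuration above (Helm values are strings).
config = {
    "gcWaitTime": "300000",              # milliseconds
    "majorCompactionInterval": "10800",  # seconds
    "majorCompactionThreshold": "0.8",
    "minorCompactionInterval": "360",    # seconds
    "minorCompactionThreshold": "0.2",
}


def schedule_summary(cfg):
    """Convert the raw intervals into minutes/hours for readability."""
    return {
        "gc_every_min": int(cfg["gcWaitTime"]) / 1000 / 60,
        "minor_every_min": int(cfg["minorCompactionInterval"]) / 60,
        "major_every_h": int(cfg["majorCompactionInterval"]) / 3600,
    }


def compaction_candidate(usage, cfg):
    """Classify an entry log by its fraction of still-live data (0.0 to 1.0)."""
    if usage < float(cfg["minorCompactionThreshold"]):
        return "minor+major"  # below both thresholds
    if usage < float(cfg["majorCompactionThreshold"]):
        return "major only"
    return "neither"
```

With these values, the GC thread wakes every 5 minutes, minor compaction becomes due every 6 minutes, and major compaction becomes due every 3 hours.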
--
This is an automated message from the Apache Git Service.