naahk37 opened a new issue, #24816: URL: https://github.com/apache/pulsar/issues/24816
### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### Read release policy

- [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

### User environment

- Pulsar 4.0.2
- openjdk version "17.0.14"
- Ubuntu 24.04
- 3 brokers, 3 bookies, 3 ZooKeepers

### Issue Description

Hi, I'm running a Pulsar cluster with the specs above, using partitioned topics, and I have two problems:

- The filesystem usage on the bookies doesn't seem to go down (bk1: 100 GB, bk2: 400 GB, bk3: 100 GB). I already set a retention policy on my main namespace (2 weeks, 10 GB), and the metrics in Grafana report the expected topic sizes (~100 GB storage size and ~30 GB backlog size). The usage on bk1 and bk3 makes sense to me, but the 400 GB on bk2 does not (see the retention and disk-usage sketches below).
- The bookie service on bk2 stops frequently with the error `io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 2147483648, max: 2147483648)`. I can't find the setting that controls this memory limit; I already increased the JVM allocations in pulsar_env.sh, but they don't seem to affect it (see the direct-memory sketch below).

On bookie bk2 there are around 12k entry log files, whereas the other two bookies have ~200 and ~85. On bk2 I see the following log entries related to garbage collection:

- "Forced garbage collection triggered by thread: LedgerDirsMonitorThread"
- "Garbage collector thread forced to perform GC before expiry of wait time"
- "Extracting entry log meta from entryLogId: 185"
- "GarbageCollectorThread-6-1 Set forceGarbageCollection to false after force GC to make it forceGC-able again"

It looks like garbage collection is happening, because the "deleted ledger" count goes up while the bookie is running.

### Error messages

```text
```

### Reproducing the issue

Not really applicable; it keeps happening while the cluster is running.

### Additional information

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
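For reference, the retention policy described above can be checked and re-applied with `pulsar-admin`. This is a minimal sketch; `my-tenant/my-namespace` is a placeholder for the actual namespace:

```bash
# Show the retention policy currently applied to the namespace.
bin/pulsar-admin namespaces get-retention my-tenant/my-namespace

# Re-apply the 2-week / 10 GB policy described above.
bin/pulsar-admin namespaces set-retention my-tenant/my-namespace \
  --time 14d \
  --size 10G
```

Note that retention only covers acknowledged messages; unacknowledged backlog is kept regardless of the retention policy, and disk space on a bookie is only reclaimed after the corresponding ledgers are deleted and the entry logs are garbage-collected or compacted.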
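To narrow down why bk2 holds so much more data than the other bookies, it may help to compare the ledgers the topics still reference with what is actually on disk. A sketch under assumed names: the topic and the ledger directory path are placeholders, the latter being whatever `ledgerDirectories` points to in `conf/bookkeeper.conf`:

```bash
# Ledgers a partition still references (topic name is a placeholder).
bin/pulsar-admin topics stats-internal \
  persistent://my-tenant/my-namespace/my-topic-partition-0

# Ledgers registered in the BookKeeper metadata store.
bin/bookkeeper shell listledgers

# Raw disk usage of this bookie's ledger directory (placeholder path).
du -sh /path/to/bookkeeper/ledgers
```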
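About the `OutOfDirectMemoryError`: the 2147483648-byte maximum in the message is 2 GiB, which is the JVM direct memory cap (`-XX:MaxDirectMemorySize`), not the heap, so raising only `-Xmx` in pulsar_env.sh will not move it. A sketch of where it is typically set when the bookie is started through the Pulsar scripts; the `4g` values are illustrative, not a recommendation:

```bash
# conf/bkenv.sh (read when the bookie is started via bin/bookkeeper or
# bin/pulsar bookie). BOOKIE_MEM falls back to PULSAR_MEM, and the shipped
# default includes -XX:MaxDirectMemorySize=2g, which matches the 2 GiB
# "max" in the error. Size these to the host's available RAM.
BOOKIE_MEM=${BOOKIE_MEM:-${PULSAR_MEM:-"-Xms4g -Xmx4g -XX:MaxDirectMemorySize=4g"}}
```

If the bookie is launched another way (systemd unit, container image), the same `-XX:MaxDirectMemorySize=...` flag just needs to end up on that JVM's command line.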
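The ~12k entry log files together with the forced-GC log lines suggest that ledger deletion is happening but entry log compaction is not keeping up on bk2. The relevant knobs live in `conf/bookkeeper.conf`; the values below are the usual shipped defaults, shown for orientation rather than as tuning advice:

```properties
# conf/bookkeeper.conf

# How often the garbage collector scans for deletable ledgers (ms).
gcWaitTime=900000

# Minor compaction: rewrite entry logs whose live data is below the
# threshold, checked every minorCompactionInterval seconds.
minorCompactionThreshold=0.2
minorCompactionInterval=3600

# Major compaction: same idea with a higher threshold, checked less often.
# Lowering majorCompactionInterval reclaims disk sooner at the cost of I/O.
majorCompactionThreshold=0.5
majorCompactionInterval=86400

# Optionally expose the bookie HTTP admin server so GC state can be
# inspected and triggered on demand.
httpServerEnabled=true
httpServerPort=8000
```

With the HTTP admin server enabled, `curl -X PUT http://bk2:8000/api/v1/bookie/gc` should trigger a GC run and `curl http://bk2:8000/api/v1/bookie/gc_details` should show per-entry-log compaction state (host and port are placeholders; endpoint paths are as documented for the BookKeeper HTTP server).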
