This is an automated email from the ASF dual-hosted git repository.
yong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git
The following commit(s) were added to refs/heads/master by this push:
new e80d0318cf Log all the error in the GarbageCollectorThread (#4649)
e80d0318cf is described below
commit e80d0318cfdebbc79f37d9120de8df28c8c1c13a
Author: Yong Zhang <[email protected]>
AuthorDate: Tue Aug 12 10:18:44 2025 +0800
Log all the error in the GarbageCollectorThread (#4649)
### Motivation
We met the GarbageCollectionThread was stopped by some runtime error, but
we didn't catch it then, causing the GC to stop.
Such as:
https://github.com/apache/bookkeeper/pull/3901
https://github.com/apache/bookkeeper/pull/4544
In our case, the GC stopped because of the OutOfDirectMemoryException then
the process stopped and the files can not be deleted. But we didn't see any
error logs. This PR enhance the log info when an unhandled error happens.
We already have the [PR](https://github.com/apache/bookkeeper/pull/4544)
fixed that.
And another fix in this PR is to change the Exception to the Throwable in
the getEntryLogMetadata.
Here is the error stack:
```
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate
213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
at
io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
at
io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
at
io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
at
io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
at
io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
at
io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
at
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
at
org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
at
org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
at
org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
at
org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
at
org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Unknown Source)
```
You can see it get much more memory used here
extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is
that the header has the wrong data of the header, which should already be fixed
by https://github.com/apache/bookkeeper/pull/4607. Then it reading with a wrong
map size which could take a lot of memory.
---
.../main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java | 2 +-
.../java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java | 4 ++++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
index 34c9f8e7e3..2696aee3c9 100644
---
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
+++
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
@@ -1061,7 +1061,7 @@ public class DefaultEntryLogger implements EntryLogger {
} catch (FileNotFoundException fne) {
LOG.warn("Cannot find entry log file {}.log : {}",
Long.toHexString(entryLogId), fne.getMessage());
throw fne;
- } catch (Exception e) {
+ } catch (Throwable e) {
LOG.info("Failed to get ledgers map index from: {}.log : {}",
entryLogId, e.getMessage());
// Fall-back to scanning
diff --git
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
index dfe31766ad..f0b9aaa6e2 100644
---
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
+++
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
@@ -531,6 +531,10 @@ public class GarbageCollectorThread implements Runnable {
LOG.error("Error in entryLog-metadatamap, Failed to complete
GC/Compaction due to entry-log {}",
e.getMessage(), e);
gcStats.getGcThreadRuntime().registerFailedEvent(MathUtils.elapsedNanos(threadStart),
TimeUnit.NANOSECONDS);
+ } catch (Throwable e) {
+ LOG.error("Error in garbage collector thread, Failed to complete
GC/Compaction due to {}",
+ e.getMessage(), e);
+
gcStats.getGcThreadRuntime().registerFailedEvent(MathUtils.elapsedNanos(threadStart),
TimeUnit.NANOSECONDS);
} finally {
if (force && forceGarbageCollection.compareAndSet(true, false)) {
LOG.info("{} Set forceGarbageCollection to false after force
GC to make it forceGC-able again.",