This is an automated email from the ASF dual-hosted git repository.

yong pushed a commit to branch branch-4.16
in repository https://gitbox.apache.org/repos/asf/bookkeeper.git


The following commit(s) were added to refs/heads/branch-4.16 by this push:
     new 35b8f137b4 Log all the error in the GarbageCollectorThread (#4649)
35b8f137b4 is described below

commit 35b8f137b49e730efc3c1c0d5e294d2ff827bd42
Author: Yong Zhang <[email protected]>
AuthorDate: Tue Aug 12 10:18:44 2025 +0800

    Log all the error in the GarbageCollectorThread (#4649)
    
    ### Motivation
    
    We met the GarbageCollectionThread was stopped by some runtime error, but 
we didn't catch it then, causing the GC to stop.
    Such as:
    https://github.com/apache/bookkeeper/pull/3901
    https://github.com/apache/bookkeeper/pull/4544
    
    In our case, the GC stopped because of the OutOfDirectMemoryException then 
the process stopped and the files can not be deleted. But we didn't see any 
error logs. This PR enhance the log info when an unhandled error happens.
    We already have the [PR](https://github.com/apache/bookkeeper/pull/4544) 
fixed that.
    
    And another fix in this PR is to change the Exception to the Throwable in 
the getEntryLogMetadata.
    
    Here is the error stack:
    
    ```
        io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 
213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
            at 
io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
            at 
io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
            at 
io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
            at 
io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
            at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
            at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
            at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
            at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
            at 
io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
            at 
io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
            at 
io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
            at 
org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
            at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
            at 
org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
            at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
            at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
            at 
org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
            at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
            at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
            at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
            at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
            at java.base/java.lang.Thread.run(Unknown Source)
    ```
    
    You can see it get much more memory used here 
extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is 
that the header has the wrong data of the header, which should already be fixed 
by https://github.com/apache/bookkeeper/pull/4607. Then it reading with a wrong 
map size which could take a lot of memory.
    
    (cherry picked from commit e80d0318cfdebbc79f37d9120de8df28c8c1c13a)
---
 .../main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java    | 2 +-
 .../java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java     | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git 
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
 
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
index cfe7aac793..a8065411ee 100644
--- 
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
+++ 
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java
@@ -1062,7 +1062,7 @@ public class DefaultEntryLogger implements EntryLogger {
         } catch (FileNotFoundException fne) {
             LOG.warn("Cannot find entry log file {}.log : {}", 
Long.toHexString(entryLogId), fne.getMessage());
             throw fne;
-        } catch (Exception e) {
+        } catch (Throwable e) {
             LOG.info("Failed to get ledgers map index from: {}.log : {}", 
entryLogId, e.getMessage());
 
             // Fall-back to scanning
diff --git 
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
 
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
index 41eb1d941f..b1c4b0ab2d 100644
--- 
a/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
+++ 
b/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/GarbageCollectorThread.java
@@ -500,6 +500,10 @@ public class GarbageCollectorThread implements Runnable {
                     e.getMessage(), e);
             gcStats.getGcThreadRuntime().registerFailedEvent(
                     MathUtils.nowInNano() - threadStart, TimeUnit.NANOSECONDS);
+        } catch (Throwable e) {
+            LOG.error("Error in garbage collector thread, Failed to complete 
GC/Compaction due to {}",
+                e.getMessage(), e);
+            
gcStats.getGcThreadRuntime().registerFailedEvent(MathUtils.elapsedNanos(threadStart),
 TimeUnit.NANOSECONDS);
         } finally {
             if (force && forceGarbageCollection.compareAndSet(true, false)) {
                 LOG.info("{} Set forceGarbageCollection to false after force 
GC to make it forceGC-able again.",

Reply via email to