dlg99 opened a new pull request, #4544:
URL: https://github.com/apache/bookkeeper/pull/4544

   ### Motivation
   
   A corrupt entry log file causes `OutOfDirectMemoryError`s (OODMEs) and leaves garbage collection stuck:
   ```
   2025-01-03T00:24:55,037+0000 [GarbageCollectorThread-7-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 45795
   2025-01-03T00:24:55,038+0000 [GarbageCollectorThread-7-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 45795.log : Negative position
   2025-01-03T00:24:55,039+0000 [GarbageCollectorThread-7-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 1936946533 byte(s) of direct memory (used: 1140850688, max: 2147483648)
           at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:845) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:774) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:701) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:690) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:226) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:144) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PoolArena.reallocate(PoolArena.java:302) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:122) ~[io.netty-netty-buffer-4.1.86.Final.jar:4.1.86.Final]
           at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1030) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
           at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
           at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
           at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
           at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
           at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
           at java.lang.Thread.run(Thread.java:833) ~[?:?]
   ```
   
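   I don't have access to the environment. As far as I know, the bookie has enough direct memory, and its other entry logs are compacted without problems.
   I don't know how this file got corrupted. Reading its ledgers map index fails with `Negative position`, so the bookie falls back to scanning the whole file (`extractEntryLogMetadataByScanning`), and that scan requests an implausibly large buffer. The resulting error escapes `GarbageCollectorThread.runWithFlags` and aborts the whole GC run.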
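   A quick check with the numbers from the `OutOfDirectMemoryError` message in the log: the single failed request is larger than the remaining headroom and close to the entire 2 GiB limit, which fits a bogus size read from a corrupt file rather than genuine memory pressure.

   ```math
   \begin{aligned}
   \text{free} &= 2147483648 - 1140850688 = 1006632960\ \text{bytes} \approx 0.94\ \text{GiB} \\
   \text{requested} &= 1936946533\ \text{bytes} \approx 1.80\ \text{GiB} > \text{free}
   \end{aligned}
   ```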
   
   ### Changes
   
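   Handle the failure thrown while extracting a corrupt entry log's metadata: log it and skip that file so the rest of the GC run can proceed. This is similar to https://github.com/apache/bookkeeper/pull/3901.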
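   A minimal, hypothetical sketch of the catch, log, and skip pattern described above. `SkipCorruptEntryLogs`, `MetaExtractor`, and `extractAll` are made-up names, not BookKeeper APIs, and this is not the code in this PR. One detail worth noting: Netty's `OutOfDirectMemoryError` extends `OutOfMemoryError`, so a handler that catches only `Exception` would not stop it.

   ```java
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   import java.util.logging.Level;
   import java.util.logging.Logger;

   /** Hypothetical sketch: none of these names are BookKeeper APIs. */
   public final class SkipCorruptEntryLogs {
       private static final Logger LOG = Logger.getLogger(SkipCorruptEntryLogs.class.getName());

       /** Hypothetical stand-in for whatever reads (or scans) one entry log's metadata. */
       @FunctionalInterface
       public interface MetaExtractor<M> {
           M extract(long entryLogId) throws Exception;
       }

       private SkipCorruptEntryLogs() {
       }

       /**
        * Extracts metadata for every entry log id, skipping logs whose extraction
        * fails instead of letting one corrupt file abort the whole pass.
        */
       public static <M> Map<Long, M> extractAll(List<Long> entryLogIds, MetaExtractor<M> extractor) {
           Map<Long, M> result = new LinkedHashMap<>();
           for (long entryLogId : entryLogIds) {
               try {
                   result.put(entryLogId, extractor.extract(entryLogId));
               } catch (Exception | OutOfMemoryError e) {
                   // Netty's OutOfDirectMemoryError is an OutOfMemoryError subclass,
                   // so catching only Exception would let it escape and abort the run.
                   LOG.log(Level.WARNING, "Skipping entry log " + entryLogId
                           + " after failing to extract its metadata", e);
               }
           }
           return result;
       }
   }
   ```

   Catching `OutOfMemoryError` is usually risky. In this sketch it is tolerated only on the assumption that the failure comes from one oversized request triggered by a corrupt file, not from memory actually being exhausted.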
   


-- 
This is an automated message from the Apache Git Service.