QiuYucheng2003 opened a new issue, #4714:
URL: https://github.com/apache/bookkeeper/issues/4714

   **BUG REPORT**
   
   ***Describe the bug***
   A critical "Large Object Retention" (Type I) memory leak exists in 
`LegacyHierarchicalLedgerManager.java`. The class uses a static 
`ThreadLocal<StringBuilder>` named `threadLocalNodeBuilder` to build Zookeeper 
paths to avoid object allocation. 
   
   Inside `getLedgerRangeByLevel()`, the code clears the builder using 
`nodeBuilder.setLength(0)` before reusing it. However, `setLength(0)` only 
resets the character counter; it **does not** shrink or release the underlying 
`char[]` backing array. If a worker thread processes an abnormally long path or 
experiences a burst of data, the internal array dynamically expands to a 
"high-water mark". 
   
   Because `ThreadLocal.remove()` is never invoked, these bloated `char[]` 
arrays are permanently anchored to long-lived worker threads. Over time, they 
promote to the Old Generation, causing severe heap fragmentation and unbounded 
memory expansion.
   
   ***To Reproduce***
   Steps to reproduce the behavior:
   1. Run Apache BookKeeper under high metadata load involving exceptionally 
long Znode paths.
   2. Trigger operations that continuously invoke 
`getLedgerRangeByLevel(level1, level2)` across the thread pool.
   3. The `threadLocalNodeBuilder` expands its internal array to accommodate 
the largest processed path.
   4. Take a JVM Heap Dump and observe large, mostly empty `char[]` instances 
lingering in the Old Generation, strongly referenced by the `ThreadLocalMap` of 
the I/O worker threads.
   
   ***Expected behavior***
   Memory should not be permanently retained by thread pools. To fix this, the 
system should either:
   1. Allocate a local `StringBuilder` inside the method. Modern JVM Escape 
Analysis handles short-lived object allocation extremely efficiently.
   2. If caching is strictly necessary, enforce a maximum capacity limit. If 
the `StringBuilder`'s capacity exceeds a safe threshold (e.g., 4KB) after use, 
it should be discarded and replaced with a new instance rather than blindly 
re-queued.
   
   ***Screenshots***
   Not applicable (Heap dump footprint analysis describes the issue).
   
   ***Additional context***
   This is a well-known anti-pattern in high-throughput environments. Relying 
on `ThreadLocal` to cache dynamically expanding buffers without a strict 
shrinkage policy inevitably leads to physical memory deadlocks in long-lived 
thread pools.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to