Bryan Beaudreault created HBASE-27170:
-----------------------------------------

             Summary: Potential ByteBuffAllocator leak results in exhaustion of pool
                 Key: HBASE-27170
                 URL: https://issues.apache.org/jira/browse/HBASE-27170
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.4.6
         Environment: 2.4.6 with backported patches (can list if desired). Running with Temurin JDK 11.0.12+7
            Reporter: Bryan Beaudreault


Recently I started testing out BucketCache on some of our new hbase2 clusters. When BucketCache is enabled, all disk reads go through ByteBuffAllocator in an attempt to avoid heap allocations. Without BucketCache, the ByteBuffAllocator is not used for disk reads even when it is enabled.
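
For context, here is a minimal sketch of the knobs that tie these two together, expressed as programmatic configuration (key names as I understand them for 2.4.x; values are purely illustrative, not our exact production settings):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative only: the settings that make disk reads flow through ByteBuffAllocator
// once BucketCache is on. Key names are from memory for 2.4.x; values are examples.
public class AllocatorConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();

    // Off-heap BucketCache; with an ioengine set, HFile reads use the allocator pool.
    conf.set("hbase.bucketcache.ioengine", "offheap");
    conf.setInt("hbase.bucketcache.size", 8192); // MB, example value

    // ByteBuffAllocator pool. The defaults on our hosts worked out to 4096 / 7680
    // max buffers; we later raised the count to 120k while chasing this issue.
    conf.setBoolean("hbase.server.allocator.pool.enabled", true);
    conf.setInt("hbase.server.allocator.max.buffer.count", 120_000);
    conf.setInt("hbase.server.allocator.buffer.size", 66_560); // 65KB default
  }
}
{code}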

At first this was great: we saw close to 0% heap allocations, which drastically reduced CPU and GC time. Over time, though, I noticed that the ByteBuffAllocator pool fills up, and at that point all allocations come from the heap and our heap allocation % goes to 100%.

We were using the default max buffer count, which was 4096 on a smaller host and 7680 on a larger one. At first I figured we just needed more buffers, so I upped it to 120k. It took longer, but the pool was still eventually exhausted.

This does not cause an OOM, because I made sure to allocate enough direct 
memory for max buffer count * buffer size (65k). It just causes the 
usedBufCount to exceed maxBufCount, resulting in 100% heap allocations going 
forward. It never recovers from this state until the server is restarted.
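
For concreteness, the direct memory sizing math behind that (a quick sketch, assuming the default 66560-byte buffer size):
{code:java}
// Back-of-the-envelope sizing: direct memory needed so maxBufCount * bufferSize
// fits under -XX:MaxDirectMemorySize. Numbers match the 120k-buffer experiment.
public class PoolSizingMath {
  public static void main(String[] args) {
    long maxBufCount = 120_000L;
    long bufferSizeBytes = 66_560L; // 65KB default buffer size
    long directBytes = maxBufCount * bufferSizeBytes;
    // 120,000 * 66,560 = 7,987,200,000 bytes, roughly 7.4 GiB of direct memory.
    System.out.printf("pool needs ~%.1f GiB of direct memory%n",
        directBytes / (1024.0 * 1024 * 1024));
  }
}
{code}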

Some early observations:
 * Running a major compaction causes a drastic up-tick in used buffers. Major 
compacting a 1.5GB region could easily expand the usedBufCount by ~10,000. Most 
of those are never recovered. I could keep compacting the same region over and 
over, each time increasing the usedBufCount by 5000-15000.
 * Despite usedBufCount increasing, direct memory usage largely does not increase. This indicates to me that the DirectByteBuffers are being reclaimed by GC, but the Recycler is not being called (see the simplified sketch after this list).
 ** A heap dump confirmed this: there was no obvious leak in the typical sense, and very few DirectByteBuffers with capacity 66560.
 * Enabling BucketCache triggers the use of ByteBuffAllocator, but I don't think the cache itself is related to the actual problem. First, compactions skip the cache, so that would be odd. Second, I disabled hbase.block.data.cacheonread and all other CacheConfig settings, so the BucketCache goes unused (0 bytes used) and the problem still persists.
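
To make the "GC reclaims the buffer but the Recycler never runs" theory concrete, here is a deliberately simplified model of the pool accounting (my own sketch, not the real ByteBuffAllocator/RefCnt code): the only path that decrements usedBufCount and refills the free queue is an explicit release(), so a reference that is simply dropped can never be recovered by GC.
{code:java}
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Deliberately simplified model of a pooled, refcounted buffer. This is NOT HBase's
// actual ByteBuffAllocator/RefCnt code; it only illustrates why losing the last
// reference without calling release() permanently inflates usedBufCount.
public class PooledBufferModel {
  static final int BUF_SIZE = 66_560;
  final Queue<ByteBuffer> freeBuffers = new ConcurrentLinkedQueue<>();
  final AtomicInteger usedBufCount = new AtomicInteger();

  PooledBuf allocate() {
    ByteBuffer bb = freeBuffers.poll();
    if (bb == null) {
      bb = ByteBuffer.allocateDirect(BUF_SIZE); // pool footprint grows
    }
    usedBufCount.incrementAndGet();
    return new PooledBuf(bb);
  }

  class PooledBuf {
    final ByteBuffer bb;
    final AtomicInteger refCnt = new AtomicInteger(1);

    PooledBuf(ByteBuffer bb) { this.bb = bb; }

    void retain() { refCnt.incrementAndGet(); }

    // The ONLY path that hands the buffer back. If every reference is simply dropped,
    // GC may reclaim the DirectByteBuffer's memory, but usedBufCount is never
    // decremented and freeBuffers never refills, which is exactly the state we hit.
    void release() {
      if (refCnt.decrementAndGet() == 0) {
        usedBufCount.decrementAndGet();
        bb.clear();
        freeBuffers.offer(bb);
      }
    }
  }
}
{code}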

Digging deeper, I instrumented ByteBuffAllocator in two ways:
 # I added some trace logging in getBuffer() and putbackBuffer() so I could see the number of allocations and returns, the pool size at the time, and a stacktrace.
 # I added netty's ResourceLeakDetector to the RefCnt class we use in ByteBuffAllocator, calling track() on creation of a RefCnt, record() in RefCnt.retain(), and close() in RefCnt.deallocate() (a rough sketch of the wiring is below, after this list).
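
Roughly what that second instrumentation looked like, written as a standalone sketch rather than the actual patch to RefCnt (it uses the shaded Netty that HBase already bundles; adjust the package prefix to plain io.netty if needed):
{code:java}
import org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetector;
import org.apache.hbase.thirdparty.io.netty.util.ResourceLeakDetectorFactory;
import org.apache.hbase.thirdparty.io.netty.util.ResourceLeakTracker;

// Sketch of the leak-detection wiring I added to RefCnt, shown here as its own class
// for illustration. The real change lives inside RefCnt's constructor, retain(),
// and deallocate().
public class RefCntLeakSketch {
  static {
    // Track every allocation instead of the default sampling.
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
  }

  private static final ResourceLeakDetector<RefCntLeakSketch> DETECTOR =
      ResourceLeakDetectorFactory.instance().newResourceLeakDetector(RefCntLeakSketch.class);

  // Null when this instance was not sampled (always non-null under PARANOID).
  private final ResourceLeakTracker<RefCntLeakSketch> leak;

  RefCntLeakSketch() {
    this.leak = DETECTOR.track(this); // the "Created at" stacktrace comes from here
  }

  void retain() {
    if (leak != null) {
      leak.record(); // adds a touchpoint that shows up in the LEAK report
    }
    // ... the real RefCnt.retain() work ...
  }

  void deallocate() {
    // ... the real RefCnt.deallocate() work (run the Recycler, free the buffers) ...
    if (leak != null) {
      leak.close(this); // marks this object as properly released, suppressing a report
    }
  }
}
{code}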

The ResourceLeakDetector immediately picked up leaks on start of the 
regionserver. I saw leaks coming from both user requests and, as expected, the 
Compactor. 

I collected 21 unique LEAK stacktraces, which is too many to list here. All of them share 2 common roots, so below is one full example stacktrace for each root:
{code:java}
Created at:
    org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:58)
    org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:53)
    org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:597)
    org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:350)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.allocateBuffer(HFileBlock.java:671)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:640)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1340)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$CellBasedKeyBlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:331)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:679)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:631)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:315)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:257)
    org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2152)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2142)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:7211)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:7191)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:3160)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3140)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3122)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:3116)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2657)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:807)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2898)
    org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45961)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:134)
    org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:358)
    org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
{code}
and
{code:java}
Created at:
    org.apache.hadoop.hbase.nio.RefCnt.<init>(RefCnt.java:58)
    org.apache.hadoop.hbase.nio.RefCnt.create(RefCnt.java:53)
    org.apache.hadoop.hbase.nio.ByteBuff.wrap(ByteBuff.java:597)
    org.apache.hadoop.hbase.io.ByteBuffAllocator.allocate(ByteBuffAllocator.java:350)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.allocateBuffer(HFileBlock.java:671)
    org.apache.hadoop.hbase.io.hfile.HFileBlock.unpack(HFileBlock.java:640)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1340)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:968)
    org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:957)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:322)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:216)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:417)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:324)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:290)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:513)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor$1.createScanner(Compactor.java:254)
    org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Compactor.java:333)
    org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:65)
    org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:126)
    org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1510)
    org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:2390)
    org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.doCompaction(CompactSplit.java:617)
    org.apache.hadoop.hbase.regionserver.CompactSplit$CompactionRunner.run(CompactSplit.java:659)
    java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    java.base/java.lang.Thread.run(Thread.java:829)
{code}
My trace logging gave some insight as well:
 * It showed stacktraces very similar to the ones above for the allocations.
 * It also confirmed that we are indeed returning many buffers to the pool, via calls to Shipper.shipped() in the Compactor and ServerCall.setResponse for user requests.
 * However, the counts of allocations vs. returns further validated the leak: we might have 150k allocations but only 140k returns, for example (a sketch of the counters is below, after this list).
 * Similarly, the pool size (buffers.size()) was always less than 1000, sometimes close to 0. Once usedBufCount > maxBufCount, the pool size shrinks to 0 and stays there forever.
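
For completeness, here is the flavor of the counters behind that trace logging (a simplified sketch, not the actual patch; in the real instrumentation these lines sit inside ByteBuffAllocator.getBuffer() and putbackBuffer()):
{code:java}
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Simplified sketch of the allocation/return counters. The goal is simply to spot
// a widening gap between allocations and returns, plus the call site of each event.
public class AllocatorTraceCounters {
  private static final Logger LOG = LoggerFactory.getLogger(AllocatorTraceCounters.class);

  private final AtomicLong allocations = new AtomicLong();
  private final AtomicLong returns = new AtomicLong();

  void onGetBuffer(int poolSizeNow) {
    long a = allocations.incrementAndGet();
    if (LOG.isTraceEnabled()) {
      LOG.trace("getBuffer: allocations=" + a + " returns=" + returns.get()
          + " poolSize=" + poolSizeNow, new Exception("allocation site"));
    }
  }

  void onPutbackBuffer(int poolSizeNow) {
    long r = returns.incrementAndGet();
    if (LOG.isTraceEnabled()) {
      LOG.trace("putbackBuffer: allocations=" + allocations.get() + " returns=" + r
          + " poolSize=" + poolSizeNow, new Exception("return site"));
    }
  }
}
{code}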

Things I've tried:
 * I've been poring over the stacktraces, trying to see how we might leak. We certainly do some extra allocations when unpacking compressed blocks, but it seems like we always release one (the general ownership pattern I'm auditing for is sketched after this list). I also tried working through the code from bottom to top, trying to confirm that we're closing everything, and it seems like we are. This code is complicated, though, so I easily could have missed something.
 * I tried setting COMPRESSION => NONE and compacting a region in that table; it still caused a massive expansion/leak of buffers.
 * We don't fully use cell block encoding, and I wondered if there were some assumptions about that with regard to when the blocks are released in ServerCall. I enabled cell block encoding for all clients of a cluster and the leak still occurred. It's worth noting our client is version 1.2, which doesn't support cell block encoding for plain Gets, so we aren't fully enabled. But I read through the ServerCall code and it seems like we don't make such assumptions.
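
For reference, this is the general ownership pattern I've been auditing for in those stacktraces, i.e. what "we always release one" should look like when a method allocates a new pooled buffer while consuming another. It is a generic sketch, not HBase's actual HFileBlock.unpack() code; whether and where the real code releases the input buffer is exactly what I'm double-checking.
{code:java}
import org.apache.hadoop.hbase.io.ByteBuffAllocator;
import org.apache.hadoop.hbase.nio.ByteBuff;

// Generic sketch of leak-free ownership handoff when transforming one pooled buffer
// into another (e.g. unpacking a compressed block). Not the real unpack() code.
public class UnpackOwnershipSketch {
  ByteBuff unpackLike(ByteBuffAllocator alloc, ByteBuff packed, int unpackedSize) {
    ByteBuff unpacked = alloc.allocate(unpackedSize);
    boolean succeeded = false;
    try {
      // ... decompress 'packed' into 'unpacked' ...
      succeeded = true;
      return unpacked; // ownership of 'unpacked' passes to the caller
    } finally {
      if (!succeeded) {
        unpacked.release(); // don't leak the new buffer if decompression fails
      }
      // The input buffer's reference must be dropped exactly once, by someone.
      // Here the method takes ownership of 'packed'; in the real code this may be
      // the caller's job instead, which is part of what I'm auditing.
      packed.release();
    }
  }
}
{code}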



