[ https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932008#comment-16932008 ]

Zhao Yi Ming commented on HDFS-14308:
-------------------------------------

We hit a direct buffer memory OOM when using HBase bulk load on an HDFS EC
(erasure-coded) folder. Reading the code, I see a potential risk in
ElasticByteBufferPool: as the code below shows, the pool's tree is keyed by
(capacity, insertion time), and in the HDFS client DFSStripedInputStream
allocates its direct buffer with a size of cellSize * dataBlkNum. If there are
many different cellSize values, buffers of many different capacities can pile
up in the pool and cause a direct buffer memory OOM.
{code:java}
  public synchronized void putBuffer(ByteBuffer buffer) {
    buffer.clear();
    TreeMap<Key, ByteBuffer> tree = getBufferTree(buffer.isDirect());
    while (true) {
      Key key = new Key(buffer.capacity(), System.nanoTime());
      if (!tree.containsKey(key)) {
        tree.put(key, buffer);
        return;
      }
      // Buffers are indexed by (capacity, time).
      // If our key is not unique on the first try, we try again, since the
      // time will be different.  Since we use nanoseconds, it's pretty
      // unlikely that we'll loop even once, unless the system clock has a
      // poor granularity.
    }
  }
{code}
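
To illustrate the growth pattern, here is a simplified model of the pool's lookup logic. It assumes, based on a reading of ElasticByteBufferPool, that getBuffer() reuses the smallest pooled buffer whose capacity is at least the requested length (via TreeMap.ceilingEntry) and that putBuffer() never evicts anything. The class ToyElasticPool and its methods are hypothetical, not Hadoop API, and it uses heap buffers so it cannot exhaust direct memory.

{code:java}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of ElasticByteBufferPool's lookup logic.
// Heap buffers are used so the demo cannot exhaust direct memory.
class ToyElasticPool {
  // Pool key: ordered by capacity first, then by insertion time.
  static final class Key implements Comparable<Key> {
    final int capacity;
    final long insertionTime;

    Key(int capacity, long insertionTime) {
      this.capacity = capacity;
      this.insertionTime = insertionTime;
    }

    @Override
    public int compareTo(Key other) {
      int byCapacity = Integer.compare(capacity, other.capacity);
      return byCapacity != 0 ? byCapacity : Long.compare(insertionTime, other.insertionTime);
    }
  }

  private final TreeMap<Key, ByteBuffer> tree = new TreeMap<>();
  private long clock = 0; // stands in for System.nanoTime(); always unique here

  // Reuse the smallest pooled buffer with capacity >= length, else allocate.
  ByteBuffer getBuffer(int length) {
    Map.Entry<Key, ByteBuffer> entry = tree.ceilingEntry(new Key(length, 0));
    if (entry == null) {
      return ByteBuffer.allocate(length);
    }
    tree.remove(entry.getKey());
    entry.getValue().clear();
    return entry.getValue();
  }

  // Return a buffer to the pool; nothing is ever evicted.
  void putBuffer(ByteBuffer buffer) {
    buffer.clear();
    tree.put(new Key(buffer.capacity(), ++clock), buffer);
  }

  int size() {
    return tree.size();
  }

  // Total capacity currently retained by the pool.
  long pooledBytes() {
    long total = 0;
    for (ByteBuffer b : tree.values()) {
      total += b.capacity();
    }
    return total;
  }

  public static void main(String[] args) {
    // Same size every time: the single pooled buffer is reused.
    ToyElasticPool sameSize = new ToyElasticPool();
    for (int i = 0; i < 100; i++) {
      sameSize.putBuffer(sameSize.getBuffer(6 * 1024));
    }
    // Growing sizes: no pooled buffer is ever big enough, so each one stays.
    ToyElasticPool growing = new ToyElasticPool();
    for (int i = 0; i < 100; i++) {
      growing.putBuffer(growing.getBuffer(6 * 1024 * i));
    }
    System.out.println(sameSize.size() + " " + growing.size() + " " + growing.pooledBytes());
  }
}
{code}

Running main() prints "1 100 30412800": one buffer is retained when the requested size never changes, versus one retained buffer per ever-larger request when it does.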
 

 

I wrote a simple test, shown below, that reproduces the problem.

Set the following JVM arguments first; running the test with them hits the OOM.

-Xmx64m
-Xms64m
-Xmn32m
-XX:+UseConcMarkSweepGC
-XX:+PrintGCDetails
-XX:MaxDirectMemorySize=10M

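{code:java}
import java.nio.ByteBuffer;

import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;
import org.junit.Test;

public class TestEBBP {

  private static final ByteBufferPool BUFFER_POOL = new ElasticByteBufferPool();

  @Test
  public void testOOM() {
    // Each iteration requests a larger direct buffer than any pooled one,
    // so getBuffer() allocates a new buffer and putBuffer() keeps it.
    // The pool does not evict, so direct memory grows until the OOM.
    for (int i = 0; i < 100; i++) {
      ByteBuffer buffer = BUFFER_POOL.getBuffer(true, 1024 * 6 * i);
      BUFFER_POOL.putBuffer(buffer);
    }
    System.out.println(((ElasticByteBufferPool) BUFFER_POOL).size(true));
  }
}
{code}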
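
A quick back-of-the-envelope check of why 10M of direct memory is not enough: assuming every buffer requested in testOOM() is freshly allocated and stays in the pool, the retained total after iteration i is 6144 * (0 + 1 + ... + i) bytes. The hypothetical helper below finds the first iteration whose allocation would exceed -XX:MaxDirectMemorySize=10M (10 MiB). It ignores any other direct buffers the JVM or JUnit may already hold, so the real failure may come slightly earlier.

{code:java}
// Hypothetical helper: finds the first loop iteration of testOOM() whose
// allocation would exceed -XX:MaxDirectMemorySize=10M, assuming every
// buffer stays pooled and nothing else uses direct memory.
class DirectMemoryBudget {
  static final long MAX_DIRECT_BYTES = 10L * 1024 * 1024; // 10M = 10 MiB

  // Returns the first i for which the retained total 6144 * (0 + 1 + ... + i)
  // exceeds the cap, or -1 if a loop of `iterations` never exceeds it.
  static int firstFailingIteration(int iterations, long cap) {
    long retained = 0;
    for (int i = 0; i < iterations; i++) {
      retained += 1024L * 6 * i;
      if (retained > cap) {
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    int failing = firstFailingIteration(100, MAX_DIRECT_BYTES);
    long total = 0;
    for (int k = 0; k < 100; k++) {
      total += 1024L * 6 * k;
    }
    System.out.println("OOM expected at i = " + failing + "; full loop would need " + total + " bytes");
  }
}
{code}

Under these assumptions the allocation at i = 58 is the first to exceed the cap (10,512,384 bytes against 10,485,760), while the full loop would retain 30,412,800 bytes, about 29 MiB.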
 

I am not sure whether this is the root cause of this issue, but I am posting it
here for reference.

> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> -------------------------------------------------------------
>
>                 Key: HDFS-14308
>                 URL: https://issues.apache.org/jira/browse/HDFS-14308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Joe McDonnell
>            Priority: Major
>         Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated 
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file 
> handles by default. Recent tests on erasure coded files show that the open 
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles 
> are cached:
> {noformat}
> {
> "name": "java.nio:type=BufferPool,name=direct",
> "modelerType": "sun.management.ManagementFactoryHelper$1",
> "Name": "direct",
> "TotalCapacity": 1921048960,
> "MemoryUsed": 1921048961,
> "Count": 633,
> "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream. 
> Attached is output from Eclipse MAT showing that the direct buffers come from 
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a 
> file handle is being cached and potentially unused for significant chunks of 
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream 
> should avoid holding buffers after the unbuffer() call. See HDFS-7694. 
> "unbuffer()" is intended to move an input stream to a lower memory state to 
> support these caching use cases. In particular, the curStripeBuf seems to be 
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is 
> not freed until close().
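
A minimal sketch of the behavior the issue asks for, under stated assumptions: the class StripedStreamSketch and its nested CountingPool are hypothetical and only echo the names used in the description (curStripeBuf, resetCurStripeBuffer(true), unbuffer(), close()); this is not the actual DFSStripedInputStream code or the committed fix. For scale, 1,921,048,960 bytes over 608 cached handles is about 3.16 MB per stream, which would match a 3 MiB stripe buffer if, hypothetically, the files used a policy with 3 data blocks and 1 MiB cells (cellSize * dataBlkNum = 3 * 1,048,576 = 3,145,728 bytes).

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch: unbuffer() returns curStripeBuf to the pool instead
// of holding it until close(). Not the real DFSStripedInputStream.
class StripedStreamSketch {
  // Hypothetical minimal pool that counts buffers currently checked out.
  static final class CountingPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private int outstanding = 0;

    ByteBuffer get(int length) {
      ByteBuffer buf = free.poll();
      // Allocate if the pool is empty or its top buffer is too small
      // (a too-small buffer is simply dropped). Heap keeps the demo cheap.
      if (buf == null || buf.capacity() < length) {
        buf = ByteBuffer.allocate(length);
      }
      buf.clear();
      outstanding++;
      return buf;
    }

    void put(ByteBuffer buf) {
      free.push(buf);
      outstanding--;
    }

    int outstanding() {
      return outstanding;
    }
  }

  private final CountingPool pool;
  private final int stripeSize; // cellSize * dataBlkNum
  private ByteBuffer curStripeBuf;

  StripedStreamSketch(CountingPool pool, int cellSize, int dataBlkNum) {
    this.pool = pool;
    this.stripeSize = cellSize * dataBlkNum;
  }

  // Acquire the stripe buffer lazily, mirroring resetCurStripeBuffer(true).
  void resetCurStripeBuffer(boolean shouldAllocateBuf) {
    if (shouldAllocateBuf && curStripeBuf == null) {
      curStripeBuf = pool.get(stripeSize);
    }
    if (curStripeBuf != null) {
      curStripeBuf.clear();
    }
  }

  // Give the buffer back so an idle, cached stream holds no stripe buffer.
  void unbuffer() {
    if (curStripeBuf != null) {
      pool.put(curStripeBuf);
      curStripeBuf = null;
    }
  }

  void close() {
    unbuffer();
  }

  boolean holdsBuffer() {
    return curStripeBuf != null;
  }

  public static void main(String[] args) {
    CountingPool pool = new CountingPool();
    // Assumed 1 MiB cells and 3 data blocks, i.e. a 3 MiB stripe buffer.
    StripedStreamSketch in = new StripedStreamSketch(pool, 1024 * 1024, 3);
    in.resetCurStripeBuffer(true);
    System.out.println("after read: outstanding=" + pool.outstanding());
    in.unbuffer();
    System.out.println("after unbuffer: outstanding=" + pool.outstanding());
    in.close();
  }
}
{code}

Returning the buffer to a shared pool lets other streams reuse it, but the pool still keeps it reachable, so its direct memory stays reserved; that is where the ElasticByteBufferPool growth described in the comment above can come into play.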



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
