[ https://issues.apache.org/jira/browse/HDFS-14308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932008#comment-16932008 ]
Zhao Yi Ming commented on HDFS-14308:
-------------------------------------

We hit a direct buffer memory OOM when using HBase bulk load against an HDFS EC directory. After reading some code, there appears to be a potential risk in ElasticByteBufferPool. As the code below shows, the pool's TreeMap is keyed by (capacity, nanoTime), and the HDFS client's DFSStripedInputStream allocates its direct buffer with capacity cellSize * dataBlkNum. If streams request many different cellSize values, buffers of many different capacities accumulate in the pool and are never released, which can lead to a direct buffer memory OOM.

{code:java}
public synchronized void putBuffer(ByteBuffer buffer) {
  buffer.clear();
  TreeMap<Key, ByteBuffer> tree = getBufferTree(buffer.isDirect());
  while (true) {
    Key key = new Key(buffer.capacity(), System.nanoTime());
    if (!tree.containsKey(key)) {
      tree.put(key, buffer);
      return;
    }
    // Buffers are indexed by (capacity, time).
    // If our key is not unique on the first try, we try again, since the
    // time will be different. Since we use nanoseconds, it's pretty
    // unlikely that we'll loop even once, unless the system clock has a
    // poor granularity.
  }
}
{code}

I wrote a simple test, shown below, that reproduces the problem. Set the JVM arguments first, then run the test; it hits the OOM.

-Xmx64m -Xms64m -Xmn32m -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:MaxDirectMemorySize=10M

{code:java}
import java.nio.ByteBuffer;

import org.apache.hadoop.io.ByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;
import org.junit.Test;

public class TestEBBP {
  private static final ByteBufferPool BUFFER_POOL = new ElasticByteBufferPool();

  @Test
  public void testOOM() {
    // Request a different capacity on every iteration. Each putBuffer() keeps
    // the buffer in the pool, so the retained direct memory grows until it
    // exceeds -XX:MaxDirectMemorySize and allocateDirect throws an OOM error.
    for (int i = 0; i < 100; i++) {
      ByteBuffer buffer = BUFFER_POOL.getBuffer(true, 1024 * 6 * i);
      BUFFER_POOL.putBuffer(buffer);
    }
    System.out.println(((ElasticByteBufferPool) BUFFER_POOL).size(true));
  }
}
{code}

I am not sure whether this is the root cause of this issue, but I am noting it here FYI.

> DFSStripedInputStream curStripeBuf is not freed by unbuffer()
> --------------------------------------------------------------
>
>                 Key: HDFS-14308
>                 URL: https://issues.apache.org/jira/browse/HDFS-14308
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Joe McDonnell
>            Priority: Major
>         Attachments: ec_heap_dump.png
>
>
> Some users of HDFS cache opened HDFS file handles to avoid repeated
> roundtrips to the NameNode. For example, Impala caches up to 20,000 HDFS file
> handles by default. Recent tests on erasure coded files show that the open
> file handles can consume a large amount of memory when not in use.
> For example, here is output from Impala's JMX endpoint when 608 file handles
> are cached
> {noformat}
> {
>   "name": "java.nio:type=BufferPool,name=direct",
>   "modelerType": "sun.management.ManagementFactoryHelper$1",
>   "Name": "direct",
>   "TotalCapacity": 1921048960,
>   "MemoryUsed": 1921048961,
>   "Count": 633,
>   "ObjectName": "java.nio:type=BufferPool,name=direct"
> },{noformat}
> This shows direct buffer memory usage of 3MB per DFSStripedInputStream.
> Attached is output from Eclipse MAT showing that the direct buffers come from
> DFSStripedInputStream objects. Both Impala and HBase call unbuffer() when a
> file handle is being cached and potentially unused for significant chunks of
> time, yet this shows that the memory remains in use.
> To support caching file handles on erasure coded files, DFSStripedInputStream
> should avoid holding buffers after the unbuffer() call. See HDFS-7694.
> "unbuffer()" is intended to move an input stream to a lower memory state to
> support these caching use cases. In particular, the curStripeBuf seems to be
> allocated from the BUFFER_POOL on a resetCurStripeBuffer(true) call. It is
> not freed until close().
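Following up on that last point, below is a rough sketch of the direction the description asks for: returning curStripeBuf to BUFFER_POOL from unbuffer() instead of holding it until close(). The override placement and the freeCurStripeBuffer() helper are illustrative assumptions on my part, not the actual patch; only curStripeBuf, BUFFER_POOL, unbuffer() and putBuffer() appear in the discussion above.

{code:java}
// Illustrative sketch only, NOT the actual HDFS-14308 patch.
// Assumed to live in DFSStripedInputStream, which already holds curStripeBuf
// and the shared static BUFFER_POOL (an ElasticByteBufferPool).
@Override
public synchronized void unbuffer() {
  super.unbuffer();       // let the parent release its block reader state
  freeCurStripeBuffer();  // additionally drop the striped-read buffer
}

// Hypothetical helper: hand the stripe buffer back to the pool so the direct
// memory is no longer pinned by an idle, cached file handle.
private synchronized void freeCurStripeBuffer() {
  if (curStripeBuf != null) {
    BUFFER_POOL.putBuffer(curStripeBuf);
    curStripeBuf = null;
  }
}
{code}

Note that putBuffer() only moves the memory back into the pool, so the capacity-keyed growth described at the top of this comment still matters when streams use many different cellSize * dataBlkNum values.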