[ https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821648#comment-13821648 ]
Colin Patrick McCabe commented on HDFS-5461:
--------------------------------------------

We need to have a try/catch around both the creation of {{BlockReaderLocal}} and {{BlockReaderFactory.getLegacyBlockReaderLocal}} that catches {{OutOfMemoryError}}, since if one fails, the other definitely will too.

One thing that I don't like about this is that we're not actually testing the failure case in the unit test. How about adding a configurable upper limit for the size of the {{DirectBufferPool}}? Then a unit test could set this and trigger the failure case. You're already tracking the used bytes, so it should be simple to implement via {{compareAndSet}}. This will also be good for users who don't want to use all their native memory for these buffers.

{code}
+  /**
+   * Return the currently using memory sum size in MB.
+   */
+  public long getUsingMemoryMB() {
+    return usingMemoryBytes.get()/(1024 * 1024);
+  }
{code}

Let's just return this in bytes. Using megabytes just opens up a can of worms (some people think it's base-10, others think it's base-2, etc). And obviously it's less precise.

> fallback to non-ssr(local short circuit reads) while oom detected
> -----------------------------------------------------------------
>
>                 Key: HDFS-5461
>                 URL: https://issues.apache.org/jira/browse/HDFS-5461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5461.txt
>
>
> Currently, the DirectBufferPool used by the ssr feature doesn't seem to have
> an upper-bound limit other than the DirectMemory VM option, so there's a risk
> of hitting a direct memory OOM. See HBASE-8143 for an example.
> IMHO, maybe we could improve it a bit:
> 1) detect OOM, or reaching a configured upper limit, in the caller, then fall
> back to non-ssr
> 2) add a new metric for the current raw consumed direct memory size.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
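
For illustration only, the configurable upper limit suggested in the comment above could look roughly like the sketch below. This is not the actual Hadoop {{DirectBufferPool}} and not part of the attached patch: the class name {{BoundedDirectBufferPool}}, the {{maxBytes}} parameter, and the methods {{tryAllocate}}, {{release}} and {{getUsedBytes}} are all hypothetical. It assumes the caller treats a {{null}} return as the signal to fall back to the non-ssr read path, and it reports usage in bytes rather than megabytes.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of a size-capped direct buffer allocator. */
class BoundedDirectBufferPool {
    // Hypothetical configurable cap on the total bytes handed out.
    private final long maxBytes;
    // Bytes currently reserved by outstanding buffers.
    private final AtomicLong usedBytes = new AtomicLong();

    BoundedDirectBufferPool(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /**
     * Returns a direct buffer of the given size, or null if the cap would be
     * exceeded or the JVM itself runs out of direct memory.
     */
    ByteBuffer tryAllocate(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        // Reserve the bytes with a compareAndSet loop before allocating.
        while (true) {
            long current = usedBytes.get();
            long next = current + size;
            if (next > maxBytes) {
                return null; // over the cap: caller should fall back
            }
            if (usedBytes.compareAndSet(current, next)) {
                break;
            }
        }
        try {
            return ByteBuffer.allocateDirect(size);
        } catch (OutOfMemoryError e) {
            // Undo the reservation so the counter stays accurate.
            usedBytes.addAndGet(-size);
            return null;
        }
    }

    /** Gives back the reservation held for a buffer from tryAllocate. */
    void release(ByteBuffer buffer) {
        usedBytes.addAndGet(-buffer.capacity());
    }

    /** Current reserved size in bytes (not megabytes). */
    long getUsedBytes() {
        return usedBytes.get();
    }
}
```

With a cap like this, a unit test can set a small limit and exercise the fallback path deterministically, without having to exhaust the JVM's real direct memory.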