[ 
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821648#comment-13821648
 ] 

Colin Patrick McCabe commented on HDFS-5461:
--------------------------------------------

We need a try/catch around both the creation of {{BlockReaderLocal}} 
and {{BlockReaderFactory.getLegacyBlockReaderLocal}} that catches 
{{OutOfMemoryError}}, since if one fails, the other definitely will too.
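
A rough sketch of that shape, assuming the two local paths and the remote path can each be expressed as a {{Supplier}}. {{ReaderFallback}}, {{open}}, and the parameter names are hypothetical and only illustrate the control flow; they are not the actual {{DFSInputStream}} or {{BlockReaderFactory}} code. Java reports direct memory exhaustion as {{java.lang.OutOfMemoryError}}, so that is what the sketch catches.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the real HDFS client code.
final class ReaderFallback {
  private ReaderFallback() {}

  /**
   * Tries one of the two short-circuit (local) paths. A single try/catch
   * covers both, because both depend on direct buffers: if one runs out of
   * direct memory, the other would too. On failure, falls back to the
   * non-short-circuit (remote) path.
   */
  static <T> T open(boolean useLegacyLocal,
                    Supplier<T> local,
                    Supplier<T> legacyLocal,
                    Supplier<T> remote) {
    try {
      return useLegacyLocal ? legacyLocal.get() : local.get();
    } catch (OutOfMemoryError e) {
      // Assumption: the error came from direct buffer allocation, so the
      // JVM heap is still healthy enough to continue on the remote path.
      return remote.get();
    }
  }
}
```

Catching {{OutOfMemoryError}} is normally discouraged; the sketch assumes it is acceptable here only because the failure is expected to come from direct buffer allocation rather than heap exhaustion.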

One thing that I don't like about this is that we're not actually testing the 
failure case in the unit test.  How about adding a configurable upper limit for 
the size of the {{DirectBufferPool}}?  Then a unit test could set this and 
trigger the failure case.  You're already tracking the used bytes, so it should 
be simple to implement via {{compareAndSet}}.  This will also be good for users 
who don't want to use all their native memory for these buffers.
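
To make the suggestion concrete, here is a minimal sketch of a byte cap enforced with {{compareAndSet}}. {{BoundedDirectBufferAllocator}}, {{maxBytes}}, {{tryAllocate}}, and {{release}} are hypothetical names, and the class is a standalone stand-in rather than the real {{DirectBufferPool}}; how the limit would be wired into configuration is left open.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a size-limited pool; not the Hadoop class.
final class BoundedDirectBufferAllocator {
  private final long maxBytes;  // hypothetical configurable upper limit
  private final AtomicLong usedBytes = new AtomicLong();

  BoundedDirectBufferAllocator(long maxBytes) {
    this.maxBytes = maxBytes;
  }

  /** Returns a direct buffer, or null if the cap would be exceeded. */
  ByteBuffer tryAllocate(int size) {
    if (size < 0) {
      throw new IllegalArgumentException("negative size: " + size);
    }
    // Reserve the bytes first; retry if another thread raced with us.
    while (true) {
      long current = usedBytes.get();
      long next = current + size;
      if (next > maxBytes) {
        return null;  // over the limit: the caller should fall back
      }
      if (usedBytes.compareAndSet(current, next)) {
        break;
      }
    }
    try {
      return ByteBuffer.allocateDirect(size);
    } catch (OutOfMemoryError e) {
      usedBytes.addAndGet(-size);  // undo the reservation
      throw e;
    }
  }

  /** Gives the buffer's bytes back to the budget. */
  void release(ByteBuffer buffer) {
    usedBytes.addAndGet(-buffer.capacity());
  }

  /** Currently reserved bytes. */
  long getUsedBytes() {
    return usedBytes.get();
  }
}
```

A unit test could construct this with a very small limit and check that the second allocation returns null, which exercises the fallback path without actually exhausting native memory.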

{code}
+  /**
+   * Return the currently using memory sum size in MB.
+   */
+  public long getUsingMemoryMB() {
+    return usingMemoryBytes.get()/(1024 * 1024);
+  }
{code}

Let's just return this in bytes.  Using megabytes just opens up a can of worms 
(some people think it's base-10, others think it's base-2, etc).  And obviously 
it's less precise.

> fallback to non-ssr(local short circuit reads) while oom detected
> -----------------------------------------------------------------
>
>                 Key: HDFS-5461
>                 URL: https://issues.apache.org/jira/browse/HDFS-5461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>         Attachments: HDFS-5461.txt
>
>
> Currently, the DirectBufferPool used by the ssr feature doesn't seem to have 
> an upper-bound limit other than the DirectMemory VM option, so there's a risk 
> of hitting a direct memory OOM; see HBASE-8143 for an example.
> IMHO, maybe we could improve it a bit:
> 1) detect OOM, or reaching a configured upper limit, in the caller, then fall 
> back to non-ssr
> 2) add a new metric for the current raw consumed direct memory size.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
