[ 
https://issues.apache.org/jira/browse/HDFS-5461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-5461:
----------------------------

    Attachment: HDFS-5461.txt

bq.  It's because each open stream holds a buffer, and we have hundreds of open 
streams?
I am not 100% sure, but I agree with you. This OOM is easy to 
reproduce when there are lots of open store files to be read (e.g. when 
compaction can't catch up).

Oh, I see. It seems the fallback is only meaningful for configurations like 
mine: a big Xmx and a small MaxDirectMemorySize :)

I attached a patch that adds more logging about the in-use/pooled direct 
buffer sizes. In my opinion, that could be useful when resetting the log level 
to "trace" online while an OOM is occurring. The patch also adds a simple 
try/catch fallback to handle the OOM without introducing any new config value; 
to me, this way seems more reasonable :)
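For illustration, the try/catch fallback idea can be sketched like this (a minimal sketch, not the attached patch: the class name and the fall-back-to-a-heap-buffer choice are my own illustrative assumptions, whereas the real change would fall back to the non-ssr read path):

```java
import java.nio.ByteBuffer;

public class DirectBufferFallback {
    // Try to allocate a direct buffer; on direct-memory exhaustion
    // (bounded by -XX:MaxDirectMemorySize), catch the OutOfMemoryError
    // and degrade gracefully instead of failing the read.
    static ByteBuffer allocate(int bufferSize) {
        try {
            return ByteBuffer.allocateDirect(bufferSize);
        } catch (OutOfMemoryError e) {
            // Fallback path: here a heap buffer stands in for
            // "switch to the non-ssr read path" in the real code.
            return ByteBuffer.allocate(bufferSize);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(64 * 1024);
        System.out.println(buf.isDirect() ? "direct" : "heap");
    }
}
```

The point is that no new config value is needed: the JVM's existing MaxDirectMemorySize bound triggers the fallback naturally.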

> fallback to non-ssr(local short circuit reads) while oom detected
> -----------------------------------------------------------------
>
>                 Key: HDFS-5461
>                 URL: https://issues.apache.org/jira/browse/HDFS-5461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Liang Xie
>         Attachments: HDFS-5461.txt
>
>
> Currently, the DirectBufferPool used by the ssr feature doesn't seem to have 
> an upper-bound limit other than the direct-memory VM option, so there's a 
> risk of hitting a direct-memory OOM; see HBASE-8143 for an example.
> IMHO, we could improve it a bit:
> 1) detect the OOM (or a configured upper limit being reached) in the caller, 
> then fall back to non-ssr
> 2) add a new metric for the current raw direct memory consumption.



--
This message was sent by Atlassian JIRA
(v6.1#6144)