[ https://issues.apache.org/jira/browse/HBASE-27947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749230#comment-17749230 ]
Duo Zhang commented on HBASE-27947:
-----------------------------------

{quote}
Yes, that's true. I can try to wrap up the simpler solution for now.
{quote}
Good. Thanks.

> RegionServer OOM under load when TLS is enabled
> -----------------------------------------------
>
>                 Key: HBASE-27947
>                 URL: https://issues.apache.org/jira/browse/HBASE-27947
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 2.6.0
>            Reporter: Bryan Beaudreault
>            Priority: Critical
>
> We are rolling out the server-side TLS settings to all of our QA clusters.
> This has mostly gone fine, except on one cluster. Most clusters, including this
> one, have a sampled {{nettyDirectMemory}} usage of about 30-100 MB. This
> cluster tends to get bursts of traffic, in which case it would typically jump
> to 400-500 MB. Again, this is sampled, so it could have been higher than that.
> When we enabled SSL on this cluster, we started seeing bursts up to at least
> 4 GB. This exceeded our {{-XX:MaxDirectMemorySize}}, which caused OOMs
> and general chaos on the cluster.
>
> We've gotten it under control a little bit by setting
> {{-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory}} and
> {{-Dorg.apache.hbase.thirdparty.io.netty.tryReflectionSetAccessible}}.
> We've set netty's maxDirectMemory to be approximately equal to
> ({{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}). Now we
> are seeing netty's own OutOfDirectMemoryError, which is still causing pain
> for clients but at least insulates the other components of the regionserver.
>
> We're still digging into exactly why this is happening. The cluster clearly
> has a bad access pattern, but it doesn't seem like SSL should increase the
> memory footprint by 5-10x like we're seeing.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
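
For illustration, the sizing rule from the description (netty's maxDirectMemory set to roughly {{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}) can be sketched as below. This is only a sketch under assumptions: the class and method names are hypothetical, the three sizes in {{main}} are made-up example values rather than the reporter's settings, and the value is emitted in bytes on the assumption that the shaded netty property is read as a byte count.

```java
public class NettyDirectMemoryBudget {

    // Property name taken from the issue description (hbase-thirdparty shaded netty).
    static final String NETTY_MAX_DIRECT_PROP =
        "org.apache.hbase.thirdparty.io.netty.maxDirectMemory";

    /**
     * Direct memory left for netty after the other off-heap consumers
     * named in the description are subtracted from the JVM-wide limit.
     * All arguments are in bytes.
     */
    static long nettyBudgetBytes(long maxDirectMemory, long bucketCache, long reservoir) {
        long budget = maxDirectMemory - bucketCache - reservoir;
        if (budget <= 0) {
            throw new IllegalArgumentException(
                "No direct memory left for netty: budget=" + budget);
        }
        return budget;
    }

    /** Formats the budget as a JVM system-property flag (value in bytes). */
    static String toJvmFlag(long budgetBytes) {
        return "-D" + NETTY_MAX_DIRECT_PROP + "=" + budgetBytes;
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Hypothetical example values, not taken from the issue.
        long maxDirectMemory = 8 * gib; // -XX:MaxDirectMemorySize
        long bucketCache = 4 * gib;     // off-heap BucketCache size
        long reservoir = 1 * gib;       // ByteBuffer reservoir size

        long budget = nettyBudgetBytes(maxDirectMemory, bucketCache, reservoir);
        System.out.println(toJvmFlag(budget));
    }
}
```

With these example inputs the remaining budget is 3 GiB (3221225472 bytes). As the description notes, a cap like this does not remove the allocation pressure; it moves the failure into netty's own OutOfDirectMemoryError so that the other direct-memory users in the regionserver are insulated.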