[ https://issues.apache.org/jira/browse/HBASE-27947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756361#comment-17756361 ]
Hudson commented on HBASE-27947:
--------------------------------

Results for branch branch-3
[build #33 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/33/]: (/) *{color:green}+1 overall{color}*
----
details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/33/General_20Nightly_20Build_20Report/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/33/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-3/33/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}

> RegionServer OOM under load when TLS is enabled
> -----------------------------------------------
>
>                 Key: HBASE-27947
>                 URL: https://issues.apache.org/jira/browse/HBASE-27947
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>   Affects Versions: 2.6.0
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Critical
>             Fix For: 2.6.0, 3.0.0-beta-1
>
>         Attachments: ssl-disabled-flamegraph.html,
> ssl-enabled-flamegraph.html, ssl-enabled-optimized.html
>
>
> We are rolling out the server side TLS settings to all of our QA clusters.
> This has mostly gone fine, except on one cluster. Most clusters, including
> this one, have a sampled {{nettyDirectMemory}} usage of about 30-100mb. This
> cluster tends to get bursts of traffic, in which case it would typically jump
> to 400-500mb. Again, this is sampled, so it could have been higher than that.
> When we enabled SSL on this cluster, we started seeing bursts up to at least
> 4gb.
> This exceeded our {{-XX:MaxDirectMemorySize}}, which caused OOMs and
> general chaos on the cluster.
>
> We've gotten it under control a little bit by setting
> {{-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory}} and
> {{-Dorg.apache.hbase.thirdparty.io.netty.tryReflectionSetAccessible}}.
> We've set netty's maxDirectMemory to be approximately equal to
> ({{-XX:MaxDirectMemorySize - BucketCacheSize - ReservoirSize}}). Now we
> are seeing netty's own OutOfDirectMemoryError, which is still causing pain
> for clients but at least insulates the other components of the regionserver.
>
> We're still digging into exactly why this is happening. The cluster clearly
> has a bad access pattern, but it doesn't seem like SSL should increase the
> memory footprint by 5-10x like we're seeing.
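
The sizing rule in the description (netty's limit is roughly MaxDirectMemorySize minus BucketCacheSize minus ReservoirSize) can be illustrated with a small calculation. The following is only a minimal sketch, not code from HBase: the class name, method names, and the sizes used in main are hypothetical, and it assumes the netty property takes a value in bytes.

```java
// Hypothetical helper, not part of HBase: derives a netty direct memory cap
// from the JVM direct memory limit, following the rule in the description.
final class NettyDirectMemoryBudget {
    private static final long MIB = 1024L * 1024L;

    // Returns the bytes left for netty after reserving the bucket cache and
    // the reservoir out of the JVM-wide direct memory limit.
    static long nettyMaxDirectMemory(long maxDirectMemorySize,
                                     long bucketCacheSize,
                                     long reservoirSize) {
        long budget = maxDirectMemorySize - bucketCacheSize - reservoirSize;
        if (budget <= 0) {
            throw new IllegalArgumentException(
                "No direct memory left for netty: " + budget + " bytes");
        }
        return budget;
    }

    // Formats the system property named in the description.
    // Assumption: the value is interpreted as a byte count.
    static String toJvmFlag(long bytes) {
        return "-Dorg.apache.hbase.thirdparty.io.netty.maxDirectMemory=" + bytes;
    }

    public static void main(String[] args) {
        // Illustrative sizes only; the real cluster's settings are not given.
        long maxDirect = 8192 * MIB;   // -XX:MaxDirectMemorySize
        long bucketCache = 4096 * MIB; // off-heap bucket cache
        long reservoir = 1024 * MIB;   // RPC buffer reservoir
        System.out.println(
            toJvmFlag(nettyMaxDirectMemory(maxDirect, bucketCache, reservoir)));
    }
}
```

With a cap like this, an allocation burst in netty fails inside netty (the OutOfDirectMemoryError mentioned above) instead of exhausting the direct memory shared with the other regionserver components.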