[ https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565965#comment-17565965 ]
Andrew Kyle Purtell commented on HBASE-27112:
---------------------------------------------

Thank you for writing in to follow up [~stack]. [~vjasani] I filed HBASE-27195 to follow up on your above comment.

> Investigate Netty resource usage limits
> ---------------------------------------
>
>                 Key: HBASE-27112
>                 URL: https://issues.apache.org/jira/browse/HBASE-27112
>             Project: HBase
>          Issue Type: Sub-task
>          Components: IPC/RPC
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: Image 7-11-22 at 10.12 PM.jpg
>
>
> We leave Netty-level resource limits unbounded. The number of threads to use
> for the event loop defaults to 0 (unbounded). The default for
> io.netty.eventLoop.maxPendingTasks is INT_MAX.
> We don't do that for our own RPC handlers. We have a notion of maximum
> handler pool size, with a default of 30, typically raised in production by
> the user. We constrain the depth of the request queue in multiple ways:
> limits on the number of queued calls, limits on the total size of call data
> that can be queued (to avoid memory overrun), CoDel conditioning of the call
> queues if it is enabled, and so on.
> Under load, can we pile up an excess of pending request state, such as
> direct buffers containing request bytes, at the Netty layer because of
> downstream resource limits? Those limits will act as a bottleneck, as
> intended, and before would have also applied backpressure through RPC,
> because SimpleRpcServer had thread limits
> ("hbase.ipc.server.read.threadpool.size", default 10), but Netty may be able
> to queue up a lot more in comparison, because Netty has been optimized to
> prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0
> (unbounded).
> I don't know what it can actually get up to in production, because we lack
> the metric, but there are diminishing returns when threads > cores, so a
> reasonable default here could be Runtime.getRuntime().availableProcessors()
> instead of unbounded? maxPendingTasks probably should not be INT_MAX, but
> that may matter less.
> The tasks here are:
> - Instrument Netty-level resources to better understand actual resource
> allocations under load. Investigate what we need to plug in where to gain
> visibility.
> - Where instrumentation designed for this issue can be implemented as
> low-overhead metrics, consider formally adding them as metrics.
> - Based on the findings from this instrumentation, consider and implement
> next steps. The goal would be to limit concurrency at the Netty layer in
> such a way that performance is still good, and under load we don't balloon
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are
> necessary, we can close this as Not A Problem or WontFix.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
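The backpressure contrast the issue describes (bounded HBase call queues vs Netty's effectively unbounded pending-task queue) can be sketched with plain java.util.concurrent primitives. This is an illustration only, not HBase or Netty code; the capacities are hypothetical:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class QueueBackpressure {
    public static void main(String[] args) {
        // Unbounded queue: analogous to io.netty.eventLoop.maxPendingTasks
        // left at INT_MAX. offer() always succeeds, so pending request
        // state can grow without limit under load.
        LinkedBlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1000; i++) {
            unbounded.offer(() -> { });
        }
        System.out.println("unbounded accepted: " + unbounded.size()); // 1000

        // Bounded queue: analogous to HBase's limits on queued calls.
        // Once full, offer() fails and the producer must back off, which
        // is the backpressure SimpleRpcServer's thread limits provided.
        LinkedBlockingQueue<Runnable> bounded = new LinkedBlockingQueue<>(100);
        int accepted = 0;
        for (int i = 0; i < 1000; i++) {
            if (bounded.offer(() -> { })) {
                accepted++;
            }
        }
        System.out.println("bounded accepted: " + accepted); // 100
    }
}
```

The failed offers are where a server would signal the client to slow down instead of silently accumulating direct buffers.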
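A minimal JDK-only sketch of the bounded defaults floated above: size the event loop from availableProcessors() rather than 0, and bound Netty's per-event-loop task queue via the io.netty.eventLoop.maxPendingTasks system property named in the issue. Netty reads that property at event-loop construction, so it must be set before the event loop group is created; the 65536 cap here is a hypothetical value for illustration, not a recommendation from the issue:

```java
public class BoundedNettyDefaults {
    public static void main(String[] args) {
        // Proposed default: cap event loop threads at the core count
        // instead of 0, since there are diminishing returns when
        // threads > cores.
        int eventLoopThreads = Runtime.getRuntime().availableProcessors();

        // Bound pending tasks per event loop instead of INT_MAX.
        // 65536 is a hypothetical cap chosen for this sketch.
        System.setProperty("io.netty.eventLoop.maxPendingTasks", "65536");

        System.out.println("event loop threads: " + eventLoopThreads);
        System.out.println("maxPendingTasks: "
            + System.getProperty("io.netty.eventLoop.maxPendingTasks"));
    }
}
```

Whether such caps hurt throughput is exactly what the instrumentation tasks above would establish before any default changes.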