[ https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565965#comment-17565965 ]
Andrew Kyle Purtell commented on HBASE-27112:
---------------------------------------------

Thank you for writing in to follow up [~stack]. [~vjasani] I filed HBASE-27195 to follow up on your above comment.

> Investigate Netty resource usage limits
> ---------------------------------------
>
>                 Key: HBASE-27112
>                 URL: https://issues.apache.org/jira/browse/HBASE-27112
>             Project: HBase
>          Issue Type: Sub-task
>          Components: IPC/RPC
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: Image 7-11-22 at 10.12 PM.jpg
>
>
> We leave Netty-level resource limits unbounded. The number of threads to use
> for the event loop defaults to 0 (unbounded). The default for
> io.netty.eventLoop.maxPendingTasks is INT_MAX.
> We don't do that for our own RPC handlers. We have a notion of maximum
> handler pool size, with a default of 30, typically raised in production by
> the user. We constrain the depth of the request queue in multiple ways:
> limits on the number of queued calls, limits on the total size of call data
> that can be queued (to avoid memory overrun), CoDel conditioning of the call
> queues if it is enabled, and so on.
> Under load, can we pile up an excess of pending request state, such as
> direct buffers containing request bytes, at the Netty layer because of
> downstream resource limits? Those limits will act as a bottleneck, as
> intended, and before would have also applied backpressure through RPC,
> because SimpleRpcServer had thread limits
> ("hbase.ipc.server.read.threadpool.size", default 10), but Netty may be able
> to queue up a lot more in comparison, because Netty has been optimized to
> prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0
> (unbounded).
> I don't know what it can actually get up to in production, because we lack
> the metric, but there are diminishing returns when threads > cores, so a
> reasonable default here could be Runtime.getRuntime().availableProcessors()
> instead of unbounded? maxPendingTasks probably should not be INT_MAX, but
> that may matter less.
> The tasks here are:
> - Instrument Netty-level resources to better understand actual resource
> allocations under load. Investigate what we need to plug in where to gain
> visibility.
> - Where instrumentation designed for this issue can be implemented as
> low-overhead metrics, consider formally adding them as metrics.
> - Based on the findings from this instrumentation, consider and implement
> next steps. The goal would be to limit concurrency at the Netty layer in
> such a way that performance is still good, and under load we don't balloon
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are
> necessary, we can close this as Not A Problem or WontFix.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
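The backpressure contrast the issue describes (bounded HBase call queues vs Netty's effectively unbounded pending-task queue) can be sketched with plain java.util.concurrent primitives. This is an illustration only, not HBase or Netty code; the capacities are hypothetical:

```java
import java.util.concurrent.LinkedBlockingQueue;

public class QueueBackpressure {
    public static void main(String[] args) {
        // Unbounded queue: analogous to io.netty.eventLoop.maxPendingTasks
        // left at INT_MAX. offer() always succeeds, so pending request
        // state can grow without limit under load.
        LinkedBlockingQueue<Runnable> unbounded = new LinkedBlockingQueue<>();
        for (int i = 0; i < 1000; i++) {
            unbounded.offer(() -> { });
        }
        System.out.println("unbounded accepted: " + unbounded.size()); // 1000

        // Bounded queue: analogous to HBase's limits on queued calls.
        // Once full, offer() fails and the producer must back off, which
        // is the backpressure SimpleRpcServer's thread limits provided.
        LinkedBlockingQueue<Runnable> bounded = new LinkedBlockingQueue<>(100);
        int accepted = 0;
        for (int i = 0; i < 1000; i++) {
            if (bounded.offer(() -> { })) {
                accepted++;
            }
        }
        System.out.println("bounded accepted: " + accepted); // 100
    }
}
```

The failed offers are where a server would signal the client to slow down instead of silently accumulating direct buffers.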
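A minimal JDK-only sketch of the bounded defaults floated above: size the event loop from availableProcessors() rather than 0, and bound Netty's per-event-loop task queue via the io.netty.eventLoop.maxPendingTasks system property named in the issue. Netty reads that property at event-loop construction, so it must be set before the event loop group is created; the 65536 cap here is a hypothetical value for illustration, not a recommendation from the issue:

```java
public class BoundedNettyDefaults {
    public static void main(String[] args) {
        // Proposed default: cap event loop threads at the core count
        // instead of 0, since there are diminishing returns when
        // threads > cores.
        int eventLoopThreads = Runtime.getRuntime().availableProcessors();

        // Bound pending tasks per event loop instead of INT_MAX.
        // 65536 is a hypothetical cap chosen for this sketch.
        System.setProperty("io.netty.eventLoop.maxPendingTasks", "65536");

        System.out.println("event loop threads: " + eventLoopThreads);
        System.out.println("maxPendingTasks: "
            + System.getProperty("io.netty.eventLoop.maxPendingTasks"));
    }
}
```

Whether such caps hurt throughput is exactly what the instrumentation tasks above would establish before any default changes.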