[ 
https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566713#comment-17566713
 ] 

Duo Zhang commented on HBASE-27112:
-----------------------------------

I think [~norman] means one EventLoopGroup, not one EventLoop...
We have already tried to do this in our code. In the code pasted above,
NettyRpcServer will try to use the EventLoopGroup created in HRegionServer.
And in NettyEventLoopGroupConfig.setup you can see that we also set the same
EventLoopGroup for NettyRpcClient and AsyncFSWAL. If there are still components
missing, we could look at how to make them use the same EventLoopGroup as well.
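
For reference, a minimal sketch of the sharing pattern (this is not the actual
HBase wiring; the class and variable names below are illustrative, only the
Netty types and calls are real):

import io.netty.bootstrap.Bootstrap;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

public class SharedEventLoopGroupSketch {
  public static void main(String[] args) {
    // One group for the whole process, created once (in HBase this would be
    // the group created in HRegionServer); every Netty subsystem reuses it
    // instead of allocating its own pool of event loop threads.
    // The no-arg constructor sizes the group to Netty's default (2 * cores).
    EventLoopGroup sharedGroup = new NioEventLoopGroup();

    // RPC server side: acceptor and workers both run on the shared group.
    ServerBootstrap server = new ServerBootstrap()
        .group(sharedGroup)
        .channel(NioServerSocketChannel.class);

    // RPC client side (and, in the same spirit, the async WAL writer):
    // the very same group again.
    Bootstrap client = new Bootstrap()
        .group(sharedGroup)
        .channel(NioSocketChannel.class);

    sharedGroup.shutdownGracefully();
  }
}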

The default value of 2 * cpu_number has been around for decades; at least when
I wrote NIO code back in the 200x years, the suggested number of threads for
doing Selector.select in Java was 2 * cpu_number.
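
Netty itself still encodes that rule of thumb. A minimal sketch, assuming
Netty 4.x: a group created with 0 threads falls back to 2 * available
processors (overridable with -Dio.netty.eventLoopThreads):

import io.netty.channel.nio.NioEventLoopGroup;

public class EventLoopThreadsSketch {
  public static void main(String[] args) {
    // 0 means "use Netty's default", which is 2 * available processors.
    NioEventLoopGroup defaultSized = new NioEventLoopGroup(0);

    // Equivalent to spelling the old NIO rule of thumb out explicitly.
    NioEventLoopGroup explicit =
        new NioEventLoopGroup(2 * Runtime.getRuntime().availableProcessors());

    System.out.println("default executors:  " + defaultSized.executorCount());
    System.out.println("explicit executors: " + explicit.executorCount());

    defaultSized.shutdownGracefully();
    explicit.shutdownGracefully();
  }
}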

Some modern frameworks, such as seastar, use a mechanism called TPC (thread
per core), where the number of threads equals the number of CPU cores, but it
also requires running everything on those threads. Cassandra opened an issue
about switching to TPC (CASSANDRA-10989), but it is still open...

Thanks.

> Investigate Netty resource usage limits
> ---------------------------------------
>
>                 Key: HBASE-27112
>                 URL: https://issues.apache.org/jira/browse/HBASE-27112
>             Project: HBase
>          Issue Type: Sub-task
>          Components: IPC/RPC
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>         Attachments: Image 7-11-22 at 10.12 PM.jpg, Image 7-12-22 at 10.45 
> PM.jpg
>
>
> We leave Netty level resource limits unbounded. The number of threads to use
> for the event loop defaults to 0 (unbounded). The default for
> io.netty.eventLoop.maxPendingTasks is INT_MAX.
> We don't do that for our own RPC handlers. We have a notion of maximum
> handler pool size, with a default of 30, typically raised in production by
> the user. We constrain the depth of the request queue in multiple ways:
> limits on the number of queued calls, limits on the total size of call data
> that can be queued (to avoid memory usage overrun), CoDel conditioning of
> the call queues if it is enabled, and so on.
> Under load, can we pile up an excess of pending request state, such as
> direct buffers containing request bytes, at the Netty layer because of
> downstream resource limits? Those limits will act as a bottleneck, as
> intended, and previously they would also have applied backpressure through
> RPC, because SimpleRpcServer had thread limits
> ("hbase.ipc.server.read.threadpool.size", default 10), but Netty may be able
> to queue up a lot more, in comparison, because Netty has been optimized to
> prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0
> (unbounded). I don't know what it can actually get up to in production,
> because we lack the metric, but there are diminishing returns when threads >
> cores, so a reasonable default here could be
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument Netty level resources to better understand actual resource
> allocations under load. Investigate what we need to plug in, and where, to
> gain visibility.
> - Where instrumentation designed for this issue can be implemented as low
> overhead metrics, consider formally adding them as metrics.
> - Based on the findings from this instrumentation, consider and implement
> next steps. The goal would be to limit concurrency at the Netty layer in
> such a way that performance remains good and resource usage at the Netty
> layer does not balloon under load.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 
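
As an illustration of what the first task in the quoted list could look like,
here is a minimal sketch that samples the pending task counts of a Netty event
loop group. The class name is a placeholder; only the Netty calls (iterating
the group, SingleThreadEventExecutor.pendingTasks(), executorCount()) are real:

import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.SingleThreadEventExecutor;

public class EventLoopInstrumentationSketch {
  public static void main(String[] args) {
    NioEventLoopGroup group = new NioEventLoopGroup();

    // An EventLoopGroup is iterable over its executors, and each NIO event
    // loop is a SingleThreadEventExecutor, so its queue depth can be sampled.
    int totalPending = 0;
    for (EventExecutor executor : group) {
      if (executor instanceof SingleThreadEventExecutor) {
        totalPending += ((SingleThreadEventExecutor) executor).pendingTasks();
      }
    }
    System.out.println("event loops: " + group.executorCount()
        + ", pending tasks: " + totalPending);

    group.shutdownGracefully();
  }
}

A periodic sample of these counts would show whether pending work actually
piles up at the Netty layer under load, before deciding on limits.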



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
