[ https://issues.apache.org/jira/browse/HBASE-27112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17560816#comment-17560816 ]

Michael Stack commented on HBASE-27112:
---------------------------------------

{quote}Instrument netty level resources to understand better actual resource 
allocations under load. Investigate what we need to plug in where to gain 
visibility.
{quote}
Netty doesn't seem to provide an amenable, native metrics export, and the core 
classes/interfaces are bare of anything but functionality, so it's tough getting 
counts w/o gymnastics. (It seems to be an old ask of netty: 
[https://github.com/netty/netty/issues/6523] or 
https://groups.google.com/g/netty/c/-ZNx-L75csc/m/z8u5rp6lCUQJ.)
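
For illustration, here is a minimal sketch of the kind of plumbing that seems to 
be needed. It assumes we can get a handle on the server's EventLoopGroup and that 
the channels use the default pooled allocator; it leans on netty's 
SingleThreadEventExecutor.pendingTasks(), which can itself be expensive to call, 
so this is polling material rather than a cheap always-on metric.

{code:java}
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;
import io.netty.channel.EventLoopGroup;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.SingleThreadEventExecutor;

/** Sketch only: a one-shot snapshot of netty-level resource usage. */
public final class NettyResourceSnapshot {

  /** Sum of tasks queued on each event loop in the group. */
  public static int pendingTasks(EventLoopGroup group) {
    int pending = 0;
    for (EventExecutor executor : group) {
      if (executor instanceof SingleThreadEventExecutor) {
        // pendingTasks() may traverse the queue; don't call it on a hot path.
        pending += ((SingleThreadEventExecutor) executor).pendingTasks();
      }
    }
    return pending;
  }

  /** Direct memory held by the default pooled allocator (assumes the server
   *  channels use PooledByteBufAllocator.DEFAULT). */
  public static long usedDirectMemory() {
    PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
    return metric.usedDirectMemory();
  }
}
{code}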

Thread-dumping while under load w/ the default config, I see 2*CPU_COUNT netty 
RS-EventLoopGroup-N threads, all RUNNABLE, seemingly doing nothing, waiting in 
epoll.Native.epollWait.

I am running w/ 2*CPU_COUNT handlers.

The Netty thread count does seem excessive.

I tried running w/ 8 threads in the RS-EventLoopGroup pool on a 24 "CPU" node 
under load. Throughput seemed a bit less, but pretty close; the 75th percentile 
seemed the same before as after, but the 99.9th percentile was somewhat elevated 
(~70ms vs ~90ms). With fewer threads I started to get a few 
CallQueueTooBigExceptions on a few servers, which I didn't notice when running 
2*CPU_COUNT.

I tried with CPU_COUNT netty RS-EventLoopGroup threads. Everything seems to be 
about the same as with 2*CPU_COUNT: perhaps slightly less throughput, though the 
99.9th percentile seemed lower (~70ms vs ~50ms). There were a few transient 
CallQueueTooBigExceptions. I think I'm going to leave the default 2*CPU_COUNT in 
place for now on this cluster, though it seems profligate for this workload (80% 
read/20% write), until someone does a deeper dig than the cursory one done here. 
Thanks.
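
For anyone who wants to repeat the experiment, here is a sketch of pinning the 
event loop pool to the core count. It assumes the 
hbase.netty.eventloop.rpcserver.thread.count property cited in the issue 
description below is the key the server actually reads in your HBase version, so 
verify that first; setting the same property in hbase-site.xml and restarting 
the regionservers accomplishes the same thing.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public final class BoundedEventLoopConfig {
  /**
   * Caps the netty RS-EventLoopGroup at CPU_COUNT threads instead of the
   * 2*CPU_COUNT default observed above. Sketch only; confirm the property
   * name against the HBase version in use.
   */
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.netty.eventloop.rpcserver.thread.count",
      Runtime.getRuntime().availableProcessors());
    return conf;
  }
}
{code}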

Hope this helps.

> Investigate Netty resource usage limits
> ---------------------------------------
>
>                 Key: HBASE-27112
>                 URL: https://issues.apache.org/jira/browse/HBASE-27112
>             Project: HBase
>          Issue Type: Sub-task
>          Components: IPC/RPC
>    Affects Versions: 2.5.0
>            Reporter: Andrew Kyle Purtell
>            Assignee: Andrew Kyle Purtell
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-4
>
>
> We leave Netty level resource limits unbounded. The number of threads to use 
> for the event loop defaults to 0 (unbounded). The default for 
> io.netty.eventLoop.maxPendingTasks is INT_MAX. 
> We don't do that for our own RPC handlers. We have a notion of maximum 
> handler pool size, with a default of 30, typically raised in production by 
> the user. We constrain the depth of the request queue in multiple ways... 
> limits on the number of queued calls, limits on the total size of call data 
> that can be queued (to avoid memory usage overrun), CoDel conditioning of the 
> call queues if it is enabled, and so on.
> Under load, can we pile up an excess of pending request state, such as direct 
> buffers containing request bytes, at the netty layer because of downstream 
> resource limits? Those limits will act as a bottleneck, as intended, and 
> previously they would also have applied backpressure through RPC, because 
> SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", 
> default 10), but Netty may be able to queue up a lot more in comparison, 
> because Netty has been optimized to prefer concurrency.
> Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
> (unbounded). I don't know what it can actually get up to in production, 
> because we lack the metric, but there are diminishing returns when threads > 
> cores, so a reasonable default here could be 
> Runtime.getRuntime().availableProcessors() instead of unbounded?
> maxPendingTasks probably should not be INT_MAX, but that may matter less.
> The tasks here are:
> - Instrument netty level resources to understand better actual resource 
> allocations under load. Investigate what we need to plug in where to gain 
> visibility. 
> - Where instrumentation designed for this issue can be implemented as low 
> overhead metrics, consider formally adding them as a metric. 
> - Based on the findings from this instrumentation, consider and implement 
> next steps. The goal would be to limit concurrency at the Netty layer in such 
> a way that performance is still good, and under load we don't balloon 
> resource usage at the Netty layer.
> If the instrumentation and experimental results indicate no changes are 
> necessary, we can close this as Not A Problem or WontFix. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
