[ https://issues.apache.org/jira/browse/HBASE-15756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-15756:
--------------------------
    Attachment: gc.png
                gets.png

Here are a few graphs [~aoxiang]. As is, we are slower... 120k vs ~105k with the patch as is. Looking at a thread dump, the nioEventLoopGroup for workers is spinning up lots of threads... 40 or 50? To compare apples to apples, I again put a bound on the threads created, making the netty worker count equal to the reader count. When I do this, I get closer... 120k vs 115k or so. I then set hbase.rpc.server.nativetransport to true so we use the alternative epoll transport, and then I get almost the same: 120k vs ~119k.

Looking at the thread stack, I see this:

"epollEventLoopGroup-3-5" #40 prio=10 os_prio=0 tid=0x000000000260f520 nid=0xe09b runnable [0x00007f5ac437e000]
   java.lang.Thread.State: RUNNABLE
        at sun.misc.Cleaner.add(Cleaner.java:79)
        - locked <0x00007f5bde303070> (a java.lang.Class for sun.misc.Cleaner)
        at sun.misc.Cleaner.create(Cleaner.java:133)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:139)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.<init>(UnpooledUnsafeDirectByteBuf.java:69)
        at io.netty.buffer.UnpooledByteBufAllocator.newDirectBuffer(UnpooledByteBufAllocator.java:50)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
        at io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
        at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:712)
        at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)

We seem to be doing a direct allocation on each read. That will slow us down (and also explains the slightly higher GC time). (@appy) and I messed around trying to use a buffer pool, enabling this...

bootstrap.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT);

...and messing about in the code, but our server hangs. I can mess with it more, but thought I'd ask you first, since you are probably coming online about now: why did you have the above commented out? Hopefully we can put our own allocator in here... one that does [~anoop.hbase]'s fixed-size pool of buffers... hmmm... or that might take some work... We'll see.

Anyways, any thoughts on the above, [~aoxiang], are appreciated. If we can make netty as fast -- or faster -- and make it play nicely with the offheaping of the write path, let's slot it in. Thanks.
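For reference, here is roughly the shape of the bootstrap wiring we have been poking at: a minimal sketch using stock netty 4 APIs that bounds the worker group, optionally uses the epoll transport, and sets the pooled allocator. The class name and the wiring here are an illustration of the idea, not the patch's actual NettyRpcServer code.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.epoll.EpollEventLoopGroup;
import io.netty.channel.epoll.EpollServerSocketChannel;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class NettyRpcBootstrapSketch {
  public static ServerBootstrap configure(int workerThreads, boolean useEpoll) {
    // Bound the worker group instead of letting netty default to 2 * cores,
    // so the netty worker count can be made equal to the old reader count.
    EventLoopGroup bossGroup = useEpoll ? new EpollEventLoopGroup(1) : new NioEventLoopGroup(1);
    EventLoopGroup workerGroup =
        useEpoll ? new EpollEventLoopGroup(workerThreads) : new NioEventLoopGroup(workerThreads);

    ServerBootstrap bootstrap = new ServerBootstrap().group(bossGroup, workerGroup);
    if (useEpoll) {
      bootstrap.channel(EpollServerSocketChannel.class);
    } else {
      bootstrap.channel(NioServerSocketChannel.class);
    }
    // Pooled allocator so reads draw from netty's arenas instead of doing a
    // fresh ByteBuffer.allocateDirect() per read (the hot frame in the dump above).
    bootstrap.option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
        .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
        .childHandler(new ChannelInitializer<SocketChannel>() {
          @Override
          protected void initChannel(SocketChannel ch) {
            // The RPC decode/encode handlers would be added to ch.pipeline() here.
          }
        });
    return bootstrap;
  }
}

The childOption on ALLOCATOR is the bit that should swap the per-read direct allocation seen in the stack above for pooled buffers; whether enabling it is also what makes our server hang is the open question.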
> Pluggable RpcServer
> -------------------
>
>                 Key: HBASE-15756
>                 URL: https://issues.apache.org/jira/browse/HBASE-15756
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, rpc
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>         Attachments: Netty4RpcServer_forperf.patch, NettyRpcServer.patch, NettyRpcServer_forperf.patch, gc.png, gets.png, gets.png, idle.png, queue.png
>
>
> Currently we use a simple RpcServer, and we cannot configure or use another implementation. This issue is to make the RpcServer pluggable, so we can provide other implementations, for example a netty rpc server. A patch will be uploaded later.
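For a concrete sense of what "pluggable" could mean here, a rough sketch of a configuration-driven factory follows; the config key name, the factory shape, and the no-arg-constructor assumption are illustrative only, not what the attached patches do.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Illustration only: pick the RpcServer implementation from configuration.
public final class RpcServerFactorySketch {
  // Hypothetical config key for illustration.
  public static final String RPC_SERVER_IMPL_KEY = "hbase.rpc.server.impl";

  public static <T> T create(Configuration conf, Class<T> rpcServerInterface,
      Class<? extends T> defaultImpl) {
    Class<? extends T> implClass =
        conf.getClass(RPC_SERVER_IMPL_KEY, defaultImpl, rpcServerInterface);
    // ReflectionUtils instantiates via the no-arg constructor and calls
    // setConf() if the implementation is Configurable.
    return ReflectionUtils.newInstance(implClass, conf);
  }

  private RpcServerFactorySketch() {
  }
}

With something along these lines, switching between the simple server and a netty-based one would just be a matter of setting the implementation class in hbase-site.xml.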