[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563499#comment-17563499
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/6/22 11:41 PM:
----------------------------------------------------------------------

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.


This will not be possible in 2.x. We need a fix. SimpleRpcServer is currently 
used as a fallback by Cloudera customers (and I presume others) with 2.2 when 
the Netty implementation has issues. I would also want it as a fallback option 
for our production. Anyway, this is the kind of major operational change that 
should have a deprecation before removal. Deprecation can be done now; that 
seems ok. Removal can be done in 3.0. 


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.


This will not be possible in 2.x. We need a fix. SimpleRpcServer is currently 
used as a fallback by Cloudera customers (and I presume others) with 2.2 when 
the Netty implementation has issues, and anyway this is the kind of major 
operational change that should have a deprecation before removal. Deprecation 
can be done now; that seems ok. Removal can be done in 3.0. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-26708
>                 URL: https://issues.apache.org/jira/browse/HBASE-26708
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>    Affects Versions: 2.5.0, 2.4.6
>            Reporter: Viraj Jasani
>            Assignee: Duo Zhang
>            Priority: Blocker
>             Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using the default Netty-based RpcServer and 
> RpcClient implementations results in OutOfDirectMemoryError, apparently 
> caused by the leaks reported by Netty's ResourceLeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> And finally handlers are removed from the pipeline due to 
> OutOfDirectMemoryError:
> {code:java}
> 2022-01-25 17:36:28,657 WARN  [S-EventLoopGroup-1-5] 
> channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it 
> reached at the tail of the pipeline. It usually means the last handler in the 
> pipeline did not handle the exception.
> org.apache.hbase.thirdparty.io.netty.channel.ChannelPipelineException: 
> org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.handlerAdded()
>  has thrown an exception; removed.
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:181)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:358)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:339)
>   at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection.saslNegotiate(NettyRpcConnection.java:229)
>   at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$600(NettyRpcConnection.java:79)
>   at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:312)
>   at 
> org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:300)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:653)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: 
> failed to allocate 16777216 byte(s) of direct memory (used: 33269220801, max: 
> 33285996544)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:802)
>   at 
> org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:731)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:632)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:607)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:202)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:172)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:134)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
>   at 
> org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
>   at 
> org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.writeResponse(NettyHBaseSaslRpcClientHandler.java:79)
>   at 
> org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.handlerAdded(NettyHBaseSaslRpcClientHandler.java:115)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:938)
>   at 
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
>   ... 24 more
>  {code}
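
To make the numbers in the OutOfDirectMemoryError message above easier to read, 
here is a minimal Java sketch of the arithmetic. The class and method names 
(DirectMemoryCheck, remaining, canReserve) are hypothetical and the limit check 
is a simplified assumption about what the counter does; it is not Netty's code. 
Only the three byte counts are taken from the log.

```java
public class DirectMemoryCheck {
    // Values copied from the OutOfDirectMemoryError message in the log.
    public static final long REQUESTED = 16_777_216L;     // bytes requested
    public static final long USED = 33_269_220_801L;      // bytes already in use
    public static final long MAX = 33_285_996_544L;       // configured limit

    /** Bytes still available under the limit. */
    public static long remaining(long used, long max) {
        return max - used;
    }

    /**
     * Simplified limit check (an assumption, not Netty's implementation):
     * a reservation succeeds only if it keeps usage at or below the limit.
     */
    public static boolean canReserve(long used, long requested, long max) {
        return used + requested <= max;
    }

    public static void main(String[] args) {
        long free = remaining(USED, MAX);
        System.out.println("remaining bytes: " + free);
        System.out.println("shortfall bytes: " + (REQUESTED - free));
        System.out.println("can reserve: " + canReserve(USED, REQUESTED, MAX));
        System.out.println("limit in GiB: " + (MAX / (1024.0 * 1024 * 1024)));
    }
}
```

With these inputs the limit works out to 31 GiB, about 16,775,743 bytes remain, 
and the 16 MiB request misses by 1,473 bytes. In other words, the direct memory 
pool was almost completely consumed before the failing allocation, which is 
consistent with buffers accumulating rather than with one oversized request.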



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
