[ 
https://issues.apache.org/jira/browse/FLINK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232482#comment-17232482
 ] 

Xintong Song commented on FLINK-20155:
--------------------------------------

Hi [~roeehersh],

The problem you described sounds like a direct memory leak to me. A previously 
finished job may have failed to properly release its direct memory. 

While the exception is thrown from the Pulsar client, the leak may also come 
from other places.

I would suggest taking a heap dump and looking for resources left unreleased 
by previously finished jobs.
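
As a lighter-weight check before taking a full dump, the JVM's own accounting 
of direct buffers can be read through the standard BufferPoolMXBean. The 
sketch below is only an illustration, not part of Flink or the Pulsar client; 
the class name DirectMemoryProbe is made up here. Since the stack trace shows 
the allocation going through ByteBuffer.allocateDirect, it should be counted 
in the pool named "direct". If that number stays high after a job has 
finished, something is probably still holding on to its buffers.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper class, named here for illustration only.
public class DirectMemoryProbe {

    /** Bytes currently reserved in the JVM's "direct" buffer pool, or -1 if the pool is not found. */
    static long directMemoryUsed() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Allocate 1 MiB of direct memory so the change is visible.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directMemoryUsed();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
        System.out.println("buffer capacity:    " + buffer.capacity() + " bytes");
    }
}
```

The same value is exposed over JMX, so it can also be watched on a running 
task manager without adding code to the job.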
{quote}I also noticed that even though I configured 10 GB of memory, my Flink 
managed memory is much smaller:
{quote}
That's true. Currently, the metrics on this page do not cover all of Flink's 
memory use cases. The most important missing part is native memory. This page 
will be improved in the upcoming 1.12 release.
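
For a rough sense of how much managed memory to expect, here is a 
back-of-the-envelope sketch. It assumes the 10 GB was set via 
taskmanager.memory.process.size and that everything else is left at what I 
believe are the 1.11 defaults (JVM overhead fraction 0.1 clamped to 
192 MB..1 GB, JVM metaspace 256 MB, managed fraction 0.4). If your setup 
differs, the numbers will differ too. The class name is made up for 
illustration.

```java
// Hypothetical helper, for illustration only; not part of Flink.
public class ManagedMemoryEstimate {

    static long clamp(long value, long min, long max) {
        return Math.max(min, Math.min(max, value));
    }

    /** Estimated managed memory in MB for a given total process size in MB. */
    static long managedMb(long processMb) {
        // Assumed default: JVM overhead is 10% of process size, within [192 MB, 1024 MB].
        long jvmOverheadMb = clamp(processMb / 10, 192, 1024);
        // Assumed default: JVM metaspace size.
        long metaspaceMb = 256;
        // What remains is the "total Flink memory".
        long flinkTotalMb = processMb - jvmOverheadMb - metaspaceMb;
        // Assumed default: managed memory is 40% of total Flink memory.
        return flinkTotalMb * 4 / 10;
    }

    public static void main(String[] args) {
        System.out.println("managed memory (MB): " + managedMb(10 * 1024));
    }
}
```

Under these assumptions a 10 GB process gets about 3.5 GB of managed memory; 
the rest goes to heap, network buffers, metaspace and JVM overhead.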

> java.lang.OutOfMemoryError: Direct buffer memory
> ------------------------------------------------
>
>                 Key: FLINK-20155
>                 URL: https://issues.apache.org/jira/browse/FLINK-20155
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission
>    Affects Versions: 1.11.1
>            Reporter: roee hershko
>            Priority: Major
>             Fix For: 1.11.1
>
>         Attachments: image-2020-11-13-17-52-54-217.png
>
>
> update:
> This issue occurs every time a job fails; the only way to fix it is to 
> manually re-create the task manager pods (I am using the Flink operator).
>  
> After submitting a job, it runs for a few hours and then the job manager 
> crashes. When trying to re-create the job, I get the following error:
>  
> {code:java}
> 2020-11-13 17:44:58
> org.apache.pulsar.client.admin.PulsarAdminException: 
> org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Direct buffer memory
>     at 
> org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:228)
>     at 
> org.apache.pulsar.client.admin.internal.TopicsImpl$7.failed(TopicsImpl.java:324)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.client.JerseyInvocation$4.failed(JerseyInvocation.java:1030)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.processFailure(ClientRuntime.java:231)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime.access$100(ClientRuntime.java:85)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.lambda$failure$1(ClientRuntime.java:183)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:316)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:298)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.internal.Errors.process(Errors.java:268)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:312)
>     at 
> org.apache.pulsar.shade.org.glassfish.jersey.client.ClientRuntime$2.failure(ClientRuntime.java:183)
>     at 
> org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$3.onThrowable(AsyncHttpConnector.java:279)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:277)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteListener.abortOnThrowable(WriteListener.java:50)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteListener.operationComplete(WriteListener.java:61)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteCompleteListener.operationComplete(WriteCompleteListener.java:28)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.WriteCompleteListener.operationComplete(WriteCompleteListener.java:20)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.addListener(DefaultPromise.java:183)
>     at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:95)
>     at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPromise.addListener(DefaultChannelPromise.java:30)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:421)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.channel.NettyConnectListener.writeRequest(NettyConnectListener.java:80)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.channel.NettyConnectListener.onSuccess(NettyConnectListener.java:156)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onSuccess(NettyChannelConnector.java:92)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:26)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:570)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:549)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
>     at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:702)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
>     at 
> org.apache.pulsar.shade.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at 
> org.apache.pulsar.shade.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at 
> org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: 
> org.apache.pulsar.shade.io.netty.handler.codec.EncoderException: 
> java.lang.OutOfMemoryError: Direct buffer memory
>     at 
> org.apache.pulsar.shade.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:107)
>     at 
> org.apache.pulsar.shade.io.netty.channel.CombinedChannelDuplexHandler.write(CombinedChannelDuplexHandler.java:346)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
>     at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.doFlush(ChunkedWriteHandler.java:300)
>     at 
> org.apache.pulsar.shade.io.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:132)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
>     at 
> org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
>     at 
> org.apache.pulsar.shade.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:299)
>     at 
> org.apache.pulsar.shade.org.asynchttpclient.netty.request.NettyRequestSender.writeRequest(NettyRequestSender.java:420)
>     ... 23 more
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>     at java.base/java.nio.Bits.reserveMemory(Unknown Source)
>     at java.base/java.nio.DirectByteBuffer.<init>(Unknown Source)
>     at java.base/java.nio.ByteBuffer.allocateDirect(Unknown Source)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:758)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:734)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:245)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
>     at 
> org.apache.pulsar.shade.io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:115)
>     at 
> org.apache.pulsar.shade.io.netty.handler.codec.http.HttpObjectEncoder.encode(HttpObjectEncoder.java:93)
>     at 
> org.apache.pulsar.shade.io.netty.handler.codec.http.HttpClientCodec$Encoder.encode(HttpClientCodec.java:167)
>     at 
> org.apache.pulsar.shade.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:89)
>     ... 37 more
> {code}
> The only way to fix it is to restart all the task managers.
> I also noticed that even though I configured 10 GB of memory, my Flink managed 
> memory is much smaller:
>  !image-2020-11-13-17-52-54-217.png!
>   


