[ https://issues.apache.org/jira/browse/FLINK-36348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuannan Su reassigned FLINK-36348:
----------------------------------

    Assignee: Xuannan Su

> Netty shuffle direct memory consumption end-to-end test failed due to direct memory OOM
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-36348
>                 URL: https://issues.apache.org/jira/browse/FLINK-36348
>             Project: Flink
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.0-preview
>            Reporter: Weijie Guo
>            Assignee: Xuannan Su
>            Priority: Major
>
> The root cause was found in the downloaded build artifacts:
> {code:java}
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Direct buffer memory (connection to 'localhost/127.0.0.1:45889 [localhost:42633-cbcb9d]')
>       at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:175) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:265) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:231) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1398) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:258) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:238) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelActive(DefaultChannelPipeline.java:895) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.fulfillConnectPromise(AbstractEpollChannel.java:658) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:691) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at java.lang.Thread.run(Thread.java:829) ~[?:?]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory. The direct out-of-memory error has occurred. This can mean two things: either job(s) require(s) a larger size of JVM direct memory or there is a direct memory leak. The direct memory can be allocated by user code or some of its dependencies. In this case 'taskmanager.memory.task.off-heap.size' configuration option should be increased. Flink framework and its dependencies also consume the direct memory, mostly for network communication. The most of network memory is managed by Flink and should not result in out-of-memory error. In certain special cases, in particular for jobs with high parallelism, the framework may require more direct memory which is not managed by Flink. In this case 'taskmanager.memory.framework.off-heap.size' configuration option should be increased. If the error persists then there is probably a direct memory leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
>       at java.nio.Bits.reserveMemory(Bits.java:175) ~[?:?]
>       at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118) ~[?:?]
>       at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317) ~[?:?]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:717) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:692) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:215) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.tcacheAllocateSmall(PoolArena.java:180) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:137) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:129) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.runtime.io.network.netty.BufferResponseDecoder.onChannelActive(BufferResponseDecoder.java:54) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.runtime.io.network.netty.NettyMessageClientDecoderDelegate.channelActive(NettyMessageClientDecoderDelegate.java:74) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:262) ~[flink-dist-2.0-SNAPSHOT.jar:2.0-SNAPSHOT]
>       ... 14 more
> {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=62343&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=10d6732b-d79a-5c68-62a5-668516de5313&l=13005
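
Editorial note, not part of the original report: the trace above shows the allocation failing in java.nio.Bits.reserveMemory, so the exhausted limit is the JVM's own direct-memory accounting. As a rough sketch of how that counter can be observed from inside a JVM, the snippet below reads the "direct" buffer pool through the standard BufferPoolMXBean. The class name DirectMemoryProbe and the helper directMemoryUsed are hypothetical names chosen for illustration; this is not taken from Flink or from the failing test.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of Flink.
final class DirectMemoryProbe {

    /** Returns the bytes currently used by the JVM's "direct" buffer pool, or -1 if the pool is not exposed. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Allocate 1 MiB off-heap; this goes through the same JDK accounting
        // (Bits.reserveMemory) that appears in the stack trace.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long after = directMemoryUsed();
        System.out.println("direct memory used before: " + before + " bytes");
        System.out.println("direct memory used after:  " + after + " bytes");
        System.out.println("buffer capacity: " + buffer.capacity() + " bytes");
    }
}
```

Logging this value periodically in a TaskManager-like process is one possible way to tell whether usage grows steadily (suggesting a leak) or simply peaks above the configured limit; whether that applies to this failure is not established by the report.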



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
