[ https://issues.apache.org/jira/browse/FLINK-18427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217428#comment-17217428 ]
Xintong Song commented on FLINK-18427:
--------------------------------------

Thanks for pulling me in, [~zjwang]. Sorry for the late response, [~simahao].

bq. 1. What's the difference between 'taskmanager.memory.task.off-heap.size' and 'taskmanager.memory.framework.off-heap.size'? I found that either 'task' or 'framework' setting is ok for my job, which one should I use?

In the current version, there is practically no difference between task and framework memory. The two are kept separate in preparation for a future optimization, where we plan to allow dynamically slicing a task manager's memory into slots (currently it is sliced into a fixed number of fixed-size slots). Task memory can then be sliced for task execution, while framework memory will be reserved for the task manager's framework.

Usually we recommend that users set only the task memory, because framework memory consumption is quite stable and Flink already comes with good default values for it. However, in your case the memory consumption comes from Netty, which is definitely part of Flink's framework. That's why Stephan and Zhijiang suggested that you increase the framework memory.

Again, there is practically no difference between task and framework memory in the current version. However, choosing the right option now will reduce the effort needed when upgrading to future versions.

bq. 2. According to webUI->Task Manager->Metrics, there is some memory metrics information. I want to know: is Outside JVM (Type=Direct) the task's off-heap? If so, how should I understand the Capacity field? If not, where could I monitor the off-heap usage rate?

Please ignore these metrics. They are retrieved directly from MXBeans and do not correspond well to Flink's memory configuration options. The community is already aware of how confusing and misleading these metrics can be, and is working on an improvement to this web UI page. Sorry for the inconvenience.
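As a rough illustration of the answer to question 1, the suggestion could be written in flink-conf.yaml as sketched below. The two option keys are the ones quoted in the question; the 256m value is only an assumed placeholder and would need to be tuned for the actual job. As far as I understand the 1.10 memory model, both options are added into the JVM direct memory limit, which would explain why raising either one makes the "Direct buffer memory" error go away.

```yaml
# flink-conf.yaml (sketch; the value below is an assumption, not a recommendation)

# Off-heap memory reserved for the task manager's framework, which includes Netty.
# Raised here because the OutOfMemoryError originates in Netty's direct buffers.
taskmanager.memory.framework.off-heap.size: 256m

# Off-heap memory for user code running in tasks. Left unset so that the
# default applies, since the job itself is not the consumer in this case.
# taskmanager.memory.task.off-heap.size: 0m
```

Setting the framework option rather than the task option has no practical effect in the current version, but it keeps the configuration aligned with where the memory is actually consumed.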
> Job failed under java 11
> ------------------------
>
>                 Key: FLINK-18427
>                 URL: https://issues.apache.org/jira/browse/FLINK-18427
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration, Runtime / Network
>    Affects Versions: 1.10.0
>            Reporter: Zhang Hao
>            Priority: Critical
>         Attachments: image-2020-06-29-13-49-17-756.png
>
>
> flink version: 1.10.0
> deployment mode: cluster
> os: linux redhat7.5
> Job parallelism: greater than 1
>
> My job runs normally under Java 8 but fails under Java 11. The exception info is below: Netty failed to send a message. In addition, I found that the job fails when tasks are distributed across multiple nodes; if I set the job's parallelism to 1, the job runs normally under Java 11 too.
>
> 2020-06-24 09:52:16
> org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Sending the partition request to '/170.0.50.19:33320' failed.
>     at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:124)
>     at org.apache.flink.runtime.io.network.netty.NettyPartitionRequestClient$1.operationComplete(NettyPartitionRequestClient.java:115)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:500)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:474)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:413)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:538)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:531)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:111)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.notifyOutboundHandlerException(AbstractChannelHandlerContext.java:818)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:718)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:708)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.access$1700(AbstractChannelHandlerContext.java:56)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1102)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1149)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1073)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416)
>     at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:515)
>     at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
>     at org.apache.flink.shaded.netty4.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.io.IOException: Error while serializing message: PartitionRequest(8059a0b47f7ba0ff814ea52427c584e7@6750c1170c861176ad3ceefe9b02f36e:0:2)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:177)
>     at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:716)
>     ... 11 more
> Caused by: java.io.IOException: java.lang.OutOfMemoryError: Direct buffer memory
>     at org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:497)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage$NettyMessageEncoder.write(NettyMessage.java:174)
>     ... 12 more
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>     at java.base/java.nio.Bits.reserveMemory(Bits.java:175)
>     at java.base/java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:118)
>     at java.base/java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:317)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:772)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:748)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:245)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:342)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
>     at org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage.allocateBuffer(NettyMessage.java:148)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage.allocateBuffer(NettyMessage.java:111)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage.access$200(NettyMessage.java:59)
>     at org.apache.flink.runtime.io.network.netty.NettyMessage$PartitionRequest.write(NettyMessage.java:482)
>     ... 13 more