[
https://issues.apache.org/jira/browse/FLINK-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haibo Suen updated FLINK-11379:
-------------------------------
Description:
When TM loads a offloaded TDD with large size, it may throw a
"java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
_Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
is equal the length of the file. This will cause OutOfMemoryErro error when
direct memory is not enough.
If the length of a file is large than a maximum buffer size, the maximum size
direct-buffer should be used to read bytes of the file to avoid direct memory
OutOfMemoryError. The maximum buffer size can be 8K or others.
The exception stack is as follows (this exception stack is from an old Flink
version, but the master branch has the same problem).
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:706)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.nio.file.Files.read(Files.java:3105)
at java.nio.file.Files.readAllBytes(Files.java:3158)
at
org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
... 9 more
was:
When TM loads a offloaded TDD with large size, it may throw a
"java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
_Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
is equal the length of the file. This will cause OutOfMemoryErro error when
direct memory is not enough.
A fixed size direct buffer should be used to read all bytes of a file to avoid
direct memory OutOfMemoryError, such as a 8K buffer.
The exception stack is as follows (this exception stack is from an old Flink
version, but the master branch has the same problem).
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:706)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.nio.file.Files.read(Files.java:3105)
at java.nio.file.Files.readAllBytes(Files.java:3158)
at
org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
... 9 more
> "java.lang.OutOfMemoryError: Direct buffer memory" when TM loads a large size
> TDD
> ---------------------------------------------------------------------------------
>
> Key: FLINK-11379
> URL: https://issues.apache.org/jira/browse/FLINK-11379
> Project: Flink
> Issue Type: Bug
> Components: TaskManager
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Haibo Suen
> Assignee: Haibo Suen
> Priority: Major
>
> When TM loads a offloaded TDD with large size, it may throw a
> "java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
> nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
> _Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
> is equal the length of the file. This will cause OutOfMemoryErro error when
> direct memory is not enough.
> If the length of a file is large than a maximum buffer size, the maximum
> size direct-buffer should be used to read bytes of the file to avoid direct
> memory OutOfMemoryError. The maximum buffer size can be 8K or others.
> The exception stack is as follows (this exception stack is from an old Flink
> version, but the master branch has the same problem).
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:706)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
> at java.nio.file.Files.read(Files.java:3105)
> at java.nio.file.Files.readAllBytes(Files.java:3158)
> at
> org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
> at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> ... 9 more
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)