[
https://issues.apache.org/jira/browse/FLINK-11379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haibo Suen updated FLINK-11379:
-------------------------------
Description:
When TM loads a offloaded TDD with large size, it may throw a
"java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
_Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
is equal the length of the file. This will cause OutOfMemoryErro error when
direct memory is not enough.
A fixed size direct buffer should be used to read all bytes of a file to avoid
direct memory OutOfMemoryError, such as a 8K buffer.
The exception stack is as follows (this exception stack is from an old Flink
version, but the master branch has the same problem).
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:706)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.nio.file.Files.read(Files.java:3105)
at java.nio.file.Files.readAllBytes(Files.java:3158)
at
org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
... 9 more
was:
When TM loads a offloaded TDD with large size, it may throw a
"java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
_Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
is equal the length of the file. This will cause OutOfMemoryErro error when
direct memory is not enough.
A fixed size direct buffer should be used to read a file to avoid
OutOfMemoryErro error, such as a 8K buffer.
The exception stack is as follows (this exception stack is from an old Flink
version, but the master branch has the same problem).
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:706)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
at sun.nio.ch.IOUtil.read(IOUtil.java:195)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
at java.nio.file.Files.read(Files.java:3105)
at java.nio.file.Files.readAllBytes(Files.java:3158)
at
org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
at
org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
at
org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
... 9 more
> "java.lang.OutOfMemoryError: Direct buffer memory" when TM loads a large size
> TDD
> ---------------------------------------------------------------------------------
>
> Key: FLINK-11379
> URL: https://issues.apache.org/jira/browse/FLINK-11379
> Project: Flink
> Issue Type: Bug
> Components: TaskManager
> Affects Versions: 1.7.0, 1.7.1
> Reporter: Haibo Suen
> Assignee: Haibo Suen
> Priority: Major
>
> When TM loads a offloaded TDD with large size, it may throw a
> "java.lang.OutOfMemoryError: Direct Buffer Memory" error. The loading uses
> nio's _Files.readAllBytes()_ to read serialized TDD. In the call stack of
> _Files.readAllBytes()_ , it will allocate a direct memory buffer which's size
> is equal the length of the file. This will cause OutOfMemoryErro error when
> direct memory is not enough.
> A fixed size direct buffer should be used to read all bytes of a file to
> avoid direct memory OutOfMemoryError, such as a 8K buffer.
> The exception stack is as follows (this exception stack is from an old Flink
> version, but the master branch has the same problem).
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> at java.nio.Bits.reserveMemory(Bits.java:706)
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:241)
> at sun.nio.ch.IOUtil.read(IOUtil.java:195)
> at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:182)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
> at java.nio.file.Files.read(Files.java:3105)
> at java.nio.file.Files.readAllBytes(Files.java:3158)
> at
> org.apache.flink.runtime.deployment.TaskDeploymentDescriptor.loadBigData(TaskDeploymentDescriptor.java:338)
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutor.submitTask(TaskExecutor.java:397)
> at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:211)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:155)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$onReceive$1(AkkaRpcActor.java:133)
> at akka.actor.ActorCell$$anonfun$become$1.applyOrElse(ActorCell.scala:544)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
> ... 9 more
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)