[ https://issues.apache.org/jira/browse/FLINK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238754#comment-17238754 ]
Flavio Pompermaier commented on FLINK-20333:
--------------------------------------------
Could this problem also affect plain Java jobs? I see the same leak in my
Flink session cluster. This is the leak suspect message that Eclipse MAT
reports:
{code:java}
5,416 instances of "java.lang.Class", loaded by "<system class loader>" occupy 2,706,048 (11.04%) bytes.
Biggest instances:
class java.io.ObjectStreamClass$Caches @ 0xe0f52f98 - 402,896 (1.64%) bytes.
{code}
After a few job resubmissions I get the following exception:
{code:java}
java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has
occurred. This can mean two things: either the job requires a larger size of
JVM metaspace to load classes or there is a class loading leak. In the first
case 'taskmanager.memory.jvm-metaspace.size' configuration option should be
increased. If the error persists (usually in cluster after several job
(re-)submissions) then there is probably a class loading leak in user code or
some of its dependencies which has to be investigated and fixed. The task
executor has to be shutdown...
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:550) ~[?:?]
at java.net.URLClassLoader$1.run(URLClassLoader.java:458) ~[?:?]
at java.net.URLClassLoader$1.run(URLClassLoader.java:452) ~[?:?]
at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
at java.net.URLClassLoader.findClass(URLClassLoader.java:451) ~[?:?]
at org.apache.flink.util.ChildFirstClassLoader.loadClassWithoutExceptionHandling(ChildFirstClassLoader.java:71) ~[flink-dist_2.12-1.11.0.jar:1.11.0]
at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:48) [flink-dist_2.12-1.11.0.jar:1.11.0]
at java.lang.ClassLoader.loadClass(ClassLoader.java:522) [?:?]
{code}
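To tell the two cases in the error message apart (metaspace simply too small versus a class loading leak), it can help to watch metaspace usage and the loaded/unloaded class counts across resubmissions. The following is only a minimal sketch using the standard JMX beans. It assumes a HotSpot JVM, where the memory pool is named "Metaspace", and the class name MetaspaceProbe is made up for illustration. To say anything about a TaskManager, the methods would have to be called from code running inside that JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of Flink.
public class MetaspaceProbe {

    // Used bytes of the pool named "Metaspace" (HotSpot naming, an assumption),
    // or -1 if the JVM exposes no pool with that name.
    static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    // Number of classes currently loaded in this JVM.
    static long loadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();
    }

    // Number of classes unloaded since this JVM started.
    static long unloadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getUnloadedClassCount();
    }

    public static void main(String[] args) {
        System.out.printf("metaspace used: %d bytes, loaded: %d, unloaded: %d%n",
                metaspaceUsedBytes(), loadedClassCount(), unloadedClassCount());
    }
}
```

If the loaded class count keeps growing after every resubmission while the unloaded count stays flat, that points towards a leak rather than an undersized 'taskmanager.memory.jvm-metaspace.size'.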
> Flink standalone cluster throws metaspace OOM after submitting multiple
> PyFlink UDF jobs.
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-20333
> URL: https://issues.apache.org/jira/browse/FLINK-20333
> Project: Flink
> Issue Type: Bug
> Components: API / Python
> Reporter: Wei Zhong
> Assignee: Wei Zhong
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.12.0, 1.11.3
>
>
> Currently, a Flink standalone cluster throws a metaspace OOM after multiple
> PyFlink UDF jobs have been submitted. The root cause is that the PyFlink
> classes are loaded by the user class loader, so each job creates a separate
> user class loader to load the PyFlink-related classes. Many soft references
> and Finalizers (introduced by the underlying Netty) remain in memory and
> prevent the user class loader of an already finished PyFlink job from being
> garbage collected.
> Because of them, multiple full GCs are needed to reclaim the class loader
> of a completed job. If only one full GC runs before the metaspace is
> exhausted, the OOM occurs.
>
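The "multiple full GCs" point in the description can be checked with a small experiment. The sketch below is not taken from the Flink code base. It assumes a hypothetical helper class ClassLoaderReclaimCheck that holds a class loader only through a WeakReference and counts how many requested GC cycles pass before the reference is cleared. System.gc() is only a request to the JVM, so the count is indicative at best.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper, not part of Flink.
public class ClassLoaderReclaimCheck {

    // Requests up to maxCycles GCs and returns the cycle in which the
    // referent was cleared, or -1 if it was still reachable afterwards.
    static int gcCyclesUntilCleared(WeakReference<?> ref, int maxCycles) {
        for (int cycle = 1; cycle <= maxCycles; cycle++) {
            System.gc(); // a request only; the JVM may ignore it
            try {
                Thread.sleep(50); // give reference processing some time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
            if (ref.get() == null) {
                return cycle;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a per-job user class loader (empty classpath).
        ClassLoader loader = new URLClassLoader(new URL[0]);
        WeakReference<ClassLoader> ref = new WeakReference<>(loader);
        loader = null; // drop the only strong reference

        int cycles = gcCyclesUntilCleared(ref, 10);
        System.out.println(cycles < 0
                ? "class loader still reachable after 10 GC requests"
                : "class loader cleared after " + cycles + " GC request(s)");
    }
}
```

A class loader that is additionally kept alive through soft references or objects awaiting finalization, as described above, would be expected to need more cycles than this bare one, or not be cleared at all within the limit.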