[jira] [Commented] (FLINK-20333) Flink standalone cluster throws metaspace OOM after submitting multiple PyFlink UDF jobs.

Dian Fu (Jira) Mon, 13 Sep 2021 22:47:04 -0700


    [ https://issues.apache.org/jira/browse/FLINK-20333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17414734#comment-17414734 ]


Dian Fu commented on FLINK-20333:
---------------------------------

I think it also depends on how you handle the jars (not only the UDF jars, but 
also connector jars, etc.), i.e. whether you place them in the lib directory, 
where they are loaded by the system class loader, or submit them via 
pipeline.jars/pipeline.classpaths, where they are loaded by the user class 
loader. Could you try placing the jars in the lib directory and see whether the 
issue still exists?
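For reference, a minimal sketch of the two options, assuming jars are passed to PyFlink as a `pipeline.jars` value of `file://` URLs joined by `;`. The helper `pipeline_jar_options` and the jar paths are hypothetical. Jars submitted this way end up in the per-job user class loader. The alternative suggested above is to copy the same jars into `$FLINK_HOME/lib` and restart the standalone cluster, so no per-job class loader has to load them.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def pipeline_jar_options(jar_paths: Iterable[str]) -> Dict[str, str]:
    """Build a 'pipeline.jars' entry from absolute POSIX jar paths (hypothetical helper).

    Each path becomes a file:// URL; URLs are joined with ';'.
    as_uri() raises ValueError for relative paths, since they have no file URI.
    """
    urls = [PurePosixPath(p).as_uri() for p in jar_paths]
    return {"pipeline.jars": ";".join(urls)}


# Usage with PyFlink (not executed here; assumes an existing TableEnvironment `t_env`):
#   opts = pipeline_jar_options(["/opt/jars/connector.jar", "/opt/jars/udf.jar"])
#   for key, value in opts.items():
#       t_env.get_config().get_configuration().set_string(key, value)
```

Whichever jars are listed here are reloaded for every job, which is exactly the per-job class loader churn the issue below describes.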

> Flink standalone cluster throws metaspace OOM after submitting multiple 
> PyFlink UDF jobs.
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-20333
>                 URL: https://issues.apache.org/jira/browse/FLINK-20333
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Python
>            Reporter: Wei Zhong
>            Assignee: Wei Zhong
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.11.3, 1.12.0
>
>
> Currently the Flink standalone cluster throws a metaspace OOM after 
> multiple PyFlink UDF jobs have been submitted. The root cause is that the 
> PyFlink classes are loaded by the user class loader, so each job creates a 
> separate user class loader to load the PyFlink-related classes. Many soft 
> references and finalizers in memory (introduced by the underlying Netty) 
> prevent the garbage collection of the user class loaders of already 
> finished PyFlink jobs. 
> Because of them, multiple full GCs are needed to reclaim the class loader 
> of a completed job. If only one full GC runs before metaspace is 
> exhausted, an OOM occurs.
>  
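The failure mode in the last paragraph can be illustrated with a toy model. This is not the JVM's garbage collector: capacity, loader size, and GC counts are hypothetical units. The assumption is that each finished job leaves a class loader that is freed only after `gcs_to_reclaim` full GCs, while the JVM runs at most `full_gcs_before_oom` full GCs before giving up on a metaspace allocation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyMetaspace:
    capacity: int        # hypothetical metaspace budget (arbitrary units)
    loader_size: int     # units held by one job's user class loader
    gcs_to_reclaim: int  # full GCs a finished job's loader survives before it is freed
    pending: List[int] = field(default_factory=list)  # remaining GCs per dead loader

    def used(self) -> int:
        return len(self.pending) * self.loader_size

    def full_gc(self) -> None:
        # Each full GC peels one layer (e.g. soft refs, then finalizers) off every dead loader.
        self.pending = [n - 1 for n in self.pending if n - 1 > 0]

    def submit_job(self, full_gcs_before_oom: int) -> None:
        """Load a job's classes; if space is short, run up to the allowed number of full GCs."""
        gcs = 0
        while self.used() + self.loader_size > self.capacity:
            if gcs == full_gcs_before_oom:
                raise MemoryError("Metaspace")
            self.full_gc()
            gcs += 1
        # The job finishes; its loader is now garbage that still needs several GCs.
        self.pending.append(self.gcs_to_reclaim)


def jobs_until_oom(gcs_to_reclaim: int, full_gcs_before_oom: int,
                   capacity: int = 3, max_jobs: int = 100) -> Optional[int]:
    """Return how many jobs succeed before the toy OOM, or None if none occurs."""
    ms = ToyMetaspace(capacity=capacity, loader_size=1, gcs_to_reclaim=gcs_to_reclaim)
    for i in range(max_jobs):
        try:
            ms.submit_job(full_gcs_before_oom)
        except MemoryError:
            return i
    return None


print(jobs_until_oom(gcs_to_reclaim=2, full_gcs_before_oom=1))  # 3
print(jobs_until_oom(gcs_to_reclaim=1, full_gcs_before_oom=1))  # None
print(jobs_until_oom(gcs_to_reclaim=2, full_gcs_before_oom=2))  # None
```

In the model, loaders that need two full GCs cause an OOM as soon as metaspace fills up if only one full GC is attempted. Either making loaders collectable in one GC or allowing a second GC avoids it.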



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
