[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186221#comment-17186221 ]

ShenDa commented on FLINK-19005:
--------------------------------

[~chesnay] 
Thanks for your detailed instructions.
But I still think something may be wrong in Flink. I found that JdbcInputFormat and JdbcOutputFormat are the key reason for the Metaspace OOM, because java.sql.DriverManager doesn't release its reference to the Driver.
The DriverManager is loaded by java.internal.ClassLoader, but the driver is loaded by ChildFirstClassLoader. According to my analysis of the heap dump, this means the ChildFirstClassLoader can't be garbage collected: the driver instance registered in DriverManager references its class, that class references the ChildFirstClassLoader that loaded it, and the loader in turn keeps every class it loaded in Metaspace.
I used the following code to reproduce the issue, with org.postgresql.Driver as the JDBC driver.
{code:java}
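import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;
import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

public class JdbcMetaspaceRepro {

        // Placeholder connection settings: hypothetical values, replace with your own.
        private static final String INPUT_TABLE = "jdbc_input_table";
        private static final String DB_URL = "jdbc:postgresql://localhost:5432/test";
        private static final String USERNAME = "postgres";
        private static final String PASSWORD = "postgres";

        public static void main(String[] args) throws Exception {
                EnvironmentSettings envSettings = EnvironmentSettings.newInstance()
                                .useBlinkPlanner()
                                .inBatchMode()
                                .build();
                TableEnvironment tEnv = TableEnvironment.create(envSettings);

                // Register a table backed by the legacy JDBC connector (PostgreSQL driver).
                tEnv.executeSql(
                                "CREATE TABLE " + INPUT_TABLE + "(" +
                                                "id BIGINT," +
                                                "timestamp6_col TIMESTAMP(6)," +
                                                "timestamp9_col TIMESTAMP(6)," +
                                                "time_col TIME," +
                                                "real_col FLOAT," +
                                                "decimal_col DECIMAL(10, 4)" +
                                                ") WITH (" +
                                                "  'connector.type'='jdbc'," +
                                                "  'connector.url'='" + DB_URL + "'," +
                                                "  'connector.table'='" + INPUT_TABLE + "'," +
                                                "  'connector.USERNAME'='" + USERNAME + "'," +
                                                "  'connector.PASSWORD'='" + PASSWORD + "'" +
                                                ")"
                );

                // Each submission reads through JdbcInputFormat, which loads org.postgresql.Driver;
                // the driver registers itself with java.sql.DriverManager.
                TableResult tableResult = tEnv.executeSql(
                                "SELECT timestamp6_col, decimal_col FROM " + INPUT_TABLE);
                // Drain and close the result iterator so the job can finish.
                try (CloseableIterator<Row> it = tableResult.collect()) {
                        while (it.hasNext()) {
                                it.next();
                        }
                }
        }
}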
{code}
The diagram below shows Metaspace usage growing steadily until the TaskManager finally goes offline.
 !origin-jdbc-inputformat.png! 
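The retention chain described at the top of this comment can be illustrated without Flink or PostgreSQL. In this minimal sketch, a static list stands in for DriverManager's driver registry and a fresh ClassLoader per "job" stands in for ChildFirstClassLoader; both are hypothetical stand-ins, as are the names LoaderLeakDemo and runJob. A dynamic proxy is used because its class is defined by whichever loader is passed to Proxy.newProxyInstance, much like a driver class from the user jar is defined by the job's class loader.

{code:java}
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class LoaderLeakDemo {

    // Hypothetical stand-in for DriverManager's static list of registered drivers.
    // It lives in a class that is never unloaded, so everything it references stays reachable.
    static final List<Object> REGISTRY = new ArrayList<>();

    /** Simulates one job: a fresh user-code loader defines a class and "registers" an instance. */
    static ClassLoader runJob() {
        // Hypothetical stand-in for Flink's per-job ChildFirstClassLoader.
        ClassLoader jobLoader = new ClassLoader(LoaderLeakDemo.class.getClassLoader()) {};
        // The proxy class is defined by jobLoader, like a driver class loaded from the user jar.
        Object driverLike = Proxy.newProxyInstance(
                jobLoader, new Class<?>[] {Runnable.class}, (proxy, method, methodArgs) -> null);
        REGISTRY.add(driverLike); // analogue of DriverManager.registerDriver(driver)
        return jobLoader;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            runJob();
        }
        // Each registered instance pins its Class, and each Class pins its defining loader,
        // so none of these loaders (nor the classes they defined) can be unloaded.
        for (Object o : REGISTRY) {
            System.out.println(o.getClass().getClassLoader());
        }
    }
}
{code}

As long as an entry stays in the static list, its loader stays strongly reachable, which is consistent with the per-execution Metaspace growth in the diagram above; removing the entry (the analogue of DriverManager.deregisterDriver) is what lets the loader become collectable.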

----
Additionally, I tried to fix this issue by appending the following code to closeInputFormat(), which finally allows Metaspace to be garbage collected.

{code:java}
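// Appended to JdbcInputFormat#closeInputFormat().
// Needs: java.sql.Driver, java.sql.DriverManager, java.sql.SQLException, java.util.Enumeration
try {
        // Deregister every driver visible to this class loader so that DriverManager
        // no longer holds a reference into the job's ChildFirstClassLoader.
        final Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
                DriverManager.deregisterDriver(drivers.nextElement());
        }
} catch (SQLException se) {
        LOG.info("JDBC driver couldn't be deregistered - " + se.getMessage());
}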
{code}
The following diagram shows that Metaspace usage decreases again with this change.
 !modified-jdbc-inputformat.png! 
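A more conservative variant of the fix above would deregister only the drivers defined by the job's own class loader, leaving drivers that other code registered in the same JVM untouched. This is a sketch under that assumption, not the change proposed in this comment: deregisterDriversLoadedBy and FakeDriver are hypothetical names, and inside an input format the loader argument would presumably be the user-code class loader (for example the one that loaded the input format class).

{code:java}
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.logging.Logger;

public class DriverCleanup {

    /** Deregisters only drivers whose class was defined by {@code loader}; returns how many were removed. */
    static int deregisterDriversLoadedBy(ClassLoader loader) {
        int removed = 0;
        // getDrivers() returns a snapshot, so deregistering while iterating over it is safe.
        List<Driver> drivers = Collections.list(DriverManager.getDrivers());
        for (Driver driver : drivers) {
            if (driver.getClass().getClassLoader() == loader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed++;
                } catch (SQLException e) {
                    System.err.println("Couldn't deregister " + driver + ": " + e.getMessage());
                }
            }
        }
        return removed;
    }

    /** Hypothetical stand-in for a real driver such as org.postgresql.Driver. */
    static final class FakeDriver implements Driver {
        public Connection connect(String url, Properties info) { return null; }
        public boolean acceptsURL(String url) { return url.startsWith("jdbc:fake:"); }
        public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) { return new DriverPropertyInfo[0]; }
        public int getMajorVersion() { return 1; }
        public int getMinorVersion() { return 0; }
        public boolean jdbcCompliant() { return false; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    public static void main(String[] args) throws SQLException {
        FakeDriver fake = new FakeDriver();
        DriverManager.registerDriver(fake);
        int removed = deregisterDriversLoadedBy(FakeDriver.class.getClassLoader());
        System.out.println("removed=" + removed);
        System.out.println("still registered=" + Collections.list(DriverManager.getDrivers()).contains(fake));
    }
}
{code}

Comparing class loaders by identity keeps the cleanup scoped to one job, which matters when several jobs share a TaskManager JVM and some drivers are loaded from the Flink lib directory rather than from a user jar.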

> used metaspace grow on every execution
> --------------------------------------
>
>                 Key: FLINK-19005
>                 URL: https://issues.apache.org/jira/browse/FLINK-19005
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Runtime / Configuration, Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: Guillermo Sánchez
>            Assignee: Chesnay Schepler
>            Priority: Major
>         Attachments: heap_dump_after_10_executions.zip, heap_dump_after_1_execution.zip, heap_dump_echo_lee.tar.xz, modified-jdbc-inputformat.png, origin-jdbc-inputformat.png
>
>
> Hi!
> I'm running a Flink 1.11.1 cluster, where I execute batch jobs written with the DataSet API.
> I submit these jobs every day to calculate daily data.
> With every execution, the cluster's used Metaspace increases by 7MB and it is never released.
> This ends in an OutOfMemoryError caused by Metaspace every 15 days, and I need to restart the cluster to clear the Metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb
> Any idea what could be causing this Metaspace growth and why it is not released?
>  
> ================================================
> === Summary ======================================
> ================================================
> Case 1, reported by [~gestevez]:
> * Flink 1.11.1
> * Java 11
> * Maximum Metaspace size set to 512mb
> * Custom Batch job, submitted daily
> * Requires restart every 15 days after an OOM
> Case 2, reported by [~Echo Lee]:
> * Flink 1.11.0
> * Java 11
> * G1GC
> * WordCount Batch job, submitted every second / every 5 minutes
> * eventually fails TaskExecutor with OOM
> Case 3, reported by [~DaDaShen]:
> * Flink 1.11.0
> * Java 11
> * WordCount Batch job, submitted every 5 seconds
> * growing Metaspace, eventually OOM
>  
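All three reports describe used Metaspace growing with every job execution. One way to confirm this on a given JVM is to sample the memory pool named "Metaspace" through the standard java.lang.management API, for example before and after each job submission. This is a sketch assuming a HotSpot JVM (pool names are JVM-specific); MetaspaceProbe and its methods are illustrative names. The reported maximum reflects -XX:MaxMetaspaceSize, which Flink derives from taskmanager.memory.jvm-metaspace.size.

{code:java}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.Optional;

public class MetaspaceProbe {

    private static final long MIB = 1024L * 1024L;

    /** Returns the current usage of the "Metaspace" memory pool, if this JVM exposes one. */
    static Optional<MemoryUsage> metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return Optional.of(pool.getUsage());
            }
        }
        return Optional.empty(); // pool names differ between JVM implementations
    }

    static Optional<Long> metaspaceUsedBytes() {
        return metaspaceUsage().map(MemoryUsage::getUsed);
    }

    public static void main(String[] args) {
        Optional<MemoryUsage> usage = metaspaceUsage();
        if (!usage.isPresent()) {
            System.out.println("No pool named \"Metaspace\" on this JVM");
            return;
        }
        long used = usage.get().getUsed();
        long max = usage.get().getMax(); // -1 when no MaxMetaspaceSize limit is set
        System.out.println("Metaspace used: " + used / MIB + " MiB, max: "
                + (max < 0 ? "unlimited" : (max / MIB) + " MiB"));
    }
}
{code}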



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
