[ https://issues.apache.org/jira/browse/FLINK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493019#comment-15493019 ]

ASF GitHub Bot commented on FLINK-4485:
---------------------------------------

GitHub user mxm opened a pull request:

    https://github.com/apache/flink/pull/2499

    [FLINK-4485] close and remove user class loader after job completion

    Keeping the user class loader around after job completion may lead to
    excessive temp space usage because all user jars are kept until the
    class loader is garbage collected. Tests showed that garbage collection
    can be delayed for a long time after the class loader is no longer
    referenced. Note that the class loader only becomes unreferenced once
    its job has been removed from the archive.
    
    The fastest way to minimize temp space usage is to close and remove the
    URLClassLoader after job completion. This requires us to keep a
    serializable copy of all data that would otherwise need the user class
    loader after job completion, e.g. to display data on the web interface.
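
As a rough illustration of the idea described above (this is not the code from the pull request), the sketch below closes a job's URLClassLoader as soon as the job completes and keeps only a plain, serializable summary for later display. The names `UserClassLoaderCleanup`, `JobSummary`, `register`, and `complete` are hypothetical and chosen for this example only; the only real API relied on is `java.net.URLClassLoader#close()`, which releases the jar files the loader has opened.

```java
import java.io.IOException;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: tracks one class loader per job.
class UserClassLoaderCleanup {

    /** Hypothetical summary that holds no references to user classes. */
    static final class JobSummary implements Serializable {
        private static final long serialVersionUID = 1L;
        final String jobName;
        final String resultDescription;

        JobSummary(String jobName, String resultDescription) {
            this.jobName = jobName;
            this.resultDescription = resultDescription;
        }
    }

    private final Map<String, URLClassLoader> loaders = new HashMap<>();

    /** Creates and remembers a class loader for the job's user jars. */
    void register(String jobId, URL[] userJars) {
        loaders.put(jobId, new URLClassLoader(userJars, getClass().getClassLoader()));
    }

    /** Returns true while a class loader is still held for the job. */
    boolean isRegistered(String jobId) {
        return loaders.containsKey(jobId);
    }

    /**
     * Converts the user-provided result to a String while the class loader
     * is still usable, then removes and closes the loader so that its jar
     * file handles are released without waiting for garbage collection.
     */
    JobSummary complete(String jobId, String jobName, Object userResult) {
        JobSummary summary = new JobSummary(jobName, String.valueOf(userResult));
        URLClassLoader loader = loaders.remove(jobId);
        if (loader != null) {
            try {
                loader.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return summary;
    }
}
```

The design point is the ordering: anything that needs user classes is turned into class-loader-independent data first, and only then is the loader closed and dropped.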

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mxm/flink FLINK-4485

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2499.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2499
    
----
commit 6ed17b9f5b9c13c80200ccf3db82bbfe727830bb
Author: Maximilian Michels <m...@apache.org>
Date:   2016-09-15T09:00:58Z

    [FLINK-4485] close and remove user class loader after job completion

----


> Finished jobs in yarn session fill /tmp filesystem
> --------------------------------------------------
>
>                 Key: FLINK-4485
>                 URL: https://issues.apache.org/jira/browse/FLINK-4485
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.1.0
>            Reporter: Niels Basjes
>            Assignee: Maximilian Michels
>            Priority: Blocker
>
> On a Yarn cluster I start a yarn-session with a few containers and task slots.
> Then I fire a 'large' number of Flink batch jobs in sequence against this 
> yarn session. It is always the exact same job (the same Java code), but each 
> run gets different parameters.
> In this scenario it is exporting HBase tables to files in HDFS, and the 
> parameters specify which data to export from which tables and the name of 
> the target directory.
> After running several dozen jobs, job submission started to fail and we 
> investigated.
> We found that the cause was that on the Yarn node which was hosting the 
> jobmanager the /tmp file system was full (4GB was 100% full).
> However, the output of {{du -hcs /tmp}} showed only 200MB in use.
> We found that a very large file (we guess it is the jar of the job) was put 
> in /tmp, used, and deleted, yet the file handle was not closed by the 
> jobmanager.
> As soon as we killed the jobmanager the disk space was freed.
> In summary, the impact is that a yarn-session that receives enough jobs 
> brings down the Yarn node for all users.
> See parts of the output we got from {{lsof}} below.
> {code}
> COMMAND     PID      USER   FD      TYPE             DEVICE      SIZE       NODE NAME
> java      15034   nbasjes  550r      REG             253,17  66219695        245 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000003 (deleted)
> java      15034   nbasjes  551r      REG             253,17  66219695        252 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000007 (deleted)
> java      15034   nbasjes  552r      REG             253,17  66219695        267 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000012 (deleted)
> java      15034   nbasjes  553r      REG             253,17  66219695        250 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000005 (deleted)
> java      15034   nbasjes  554r      REG             253,17  66219695        288 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000018 (deleted)
> java      15034   nbasjes  555r      REG             253,17  66219695        298 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000025 (deleted)
> java      15034   nbasjes  557r      REG             253,17  66219695        254 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000008 (deleted)
> java      15034   nbasjes  558r      REG             253,17  66219695        292 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000019 (deleted)
> java      15034   nbasjes  559r      REG             253,17  66219695        275 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000013 (deleted)
> java      15034   nbasjes  560r      REG             253,17  66219695        159 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000002 (deleted)
> java      15034   nbasjes  562r      REG             253,17  66219695        238 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000001 (deleted)
> java      15034   nbasjes  568r      REG             253,17  66219695        246 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000004 (deleted)
> java      15034   nbasjes  569r      REG             253,17  66219695        255 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000009 (deleted)
> java      15034   nbasjes  571r      REG             253,17  66219695        299 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000026 (deleted)
> java      15034   nbasjes  572r      REG             253,17  66219695        293 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000020 (deleted)
> java      15034   nbasjes  574r      REG             253,17  66219695        256 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000010 (deleted)
> java      15034   nbasjes  575r      REG             253,17  66219695        302 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000029 (deleted)
> java      15034   nbasjes  576r      REG             253,17  66219695        294 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000021 (deleted)
> java      15034   nbasjes  577r      REG             253,17  66219695        262 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000011 (deleted)
> java      15034   nbasjes  578r      REG             253,17  66219695        251 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000006 (deleted)
> java      15034   nbasjes  580r      REG             253,17  66219695        295 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000022 (deleted)
> java      15034   nbasjes  581r      REG             253,17  66219695        300 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000027 (deleted)
> java      15034   nbasjes  582r      REG             253,17  66219695        188 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/cache/blob_e318d1698aa6e7dc91e5f4a9f8ba29781aebd8c4 (deleted)
> java      15034   nbasjes  585r      REG             253,17  66219695        279 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000014 (deleted)
> java      15034   nbasjes  586r      REG             253,17  66219695        296 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000023 (deleted)
> java      15034   nbasjes  588r      REG             253,17  66219695        301 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000028 (deleted)
> java      15034   nbasjes  589r      REG             253,17  66219695        297 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000024 (deleted)
> java      15034   nbasjes  598r      REG             253,17  66219695        280 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000015 (deleted)
> java      15034   nbasjes  601r      REG             253,17  66219695        289 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000016 (deleted)
> java      15034   nbasjes  604r      REG             253,17  66219695        284 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000017 (deleted)
> {code}
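
The mismatch between the full file system and the small {{du}} total reported above is typical of files that were deleted while a process still holds them open: the directory entry is gone, but the data stays allocated until the handle is closed. The following is a minimal Java sketch of that effect, assuming a POSIX file system (on other platforms, deleting an open file may fail instead). The class and method names are made up for this illustration and are unrelated to Flink's blob store code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class: shows that a deleted file stays readable
// (and therefore still occupies disk space) while a handle is open.
class DeletedButOpenFile {

    /**
     * Writes a temp file of the given size, opens it, deletes it, and then
     * counts the bytes that can still be read through the open handle.
     */
    static long readableAfterDelete(int size) {
        try {
            Path file = Files.createTempFile("blob-demo-", ".tmp");
            Files.write(file, new byte[size]);
            try (InputStream in = Files.newInputStream(file)) {
                // Remove the directory entry while the stream is still open.
                Files.delete(file);
                if (Files.exists(file)) {
                    throw new IllegalStateException("file should no longer be listed");
                }
                long count = 0;
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    count += n;
                }
                return count;
            } // Closing the stream is what finally lets the OS free the space.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under the POSIX assumption, `readableAfterDelete(size)` should return `size` even though the file is no longer visible in the directory, which mirrors the "(deleted)" entries still listed by {{lsof}}.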



