[ 
https://issues.apache.org/jira/browse/FLINK-9381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478670#comment-16478670
 ] 

ASF GitHub Bot commented on FLINK-9381:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/6030

    [FLINK-9381] Release blobs after job termination

    ## What is the purpose of the change
    
    Properly remove job blobs from the BlobServer after the job terminates. If the job reaches a globally terminal
    state, the HA blob store files are also cleared. If the job is suspended or has not
    finished (e.g. another process finishes the job concurrently), we only remove the local blob server files.
    
    Additionally, we properly release the user code class loader registered in the JobManagerRunner when it closes.
    
    Moreover, this commit extends the `BlobServer#cleanupJob` method to take a second argument which specifies
    whether the `BlobStore` files shall be cleaned up or not.
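
The two-argument cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `SketchBlobServer`, its in-memory sets, and the `String` job IDs are hypothetical stand-ins for the local blob directory and the HA `BlobStore`. Only the idea of a boolean second argument to `cleanupJob` comes from the description.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a blob server with a local tier and an HA tier.
// It is NOT Flink's BlobServer; it only models which tier gets cleaned.
final class SketchBlobServer {
    private final Set<String> localJobFiles = new HashSet<>();
    private final Set<String> haStoreJobFiles = new HashSet<>();

    // Registers blobs for a job in both tiers (assumed behavior for the sketch).
    void putJobBlobs(String jobId) {
        localJobFiles.add(jobId);
        haStoreJobFiles.add(jobId);
    }

    // Always removes the local files; removes the HA store files only
    // when cleanupBlobStoreFiles is true.
    void cleanupJob(String jobId, boolean cleanupBlobStoreFiles) {
        localJobFiles.remove(jobId);
        if (cleanupBlobStoreFiles) {
            haStoreJobFiles.remove(jobId);
        }
    }

    boolean hasLocalFiles(String jobId) {
        return localJobFiles.contains(jobId);
    }

    boolean hasHaStoreFiles(String jobId) {
        return haStoreJobFiles.contains(jobId);
    }
}
```

Passing `false` keeps the HA copy available so that another process can still recover the job, while the local disk space is reclaimed either way.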
    
    ## Brief change log
    
    - Properly deregister user code class loader from `LibraryCacheManager` in `JobManagerRunner`
    - Remove BlobServer files if the job is removed from the `Dispatcher` in the `removeJob` method
    - Remove HA `BlobStore` files if the job reached a globally terminal state
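
The decision in the change log above (local files are always removed, HA files only after a globally terminal state) might look like the following sketch. `SketchJobStatus` and `SketchCleanupPolicy` are hypothetical names introduced for illustration. The split into globally terminal states (finished, canceled, failed) and a suspended state that is not globally terminal mirrors Flink's job status model, but the code is an assumption, not the Dispatcher implementation.

```java
// Hypothetical job status enum for the sketch; not Flink's JobStatus class.
enum SketchJobStatus {
    RUNNING(false),
    SUSPENDED(false), // terminal only for this process; the job may be recovered elsewhere
    FINISHED(true),
    CANCELED(true),
    FAILED(true);

    private final boolean globallyTerminal;

    SketchJobStatus(boolean globallyTerminal) {
        this.globallyTerminal = globallyTerminal;
    }

    boolean isGloballyTerminal() {
        return globallyTerminal;
    }
}

// Hypothetical helper that derives the cleanup flag from the final job status.
final class SketchCleanupPolicy {
    private SketchCleanupPolicy() {}

    // Returns true when the HA blob store files may be deleted as well.
    static boolean cleanupHaBlobStore(SketchJobStatus finalStatus) {
        return finalStatus.isGloballyTerminal();
    }
}
```

A caller would pass the result as the second argument of the cleanup call, so a suspended job keeps its HA blobs for a later recovery.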
    
    ## Verifying this change
    
    - Added `JobManagerRunnerTest#testLibraryCacheManagerRegistration`
    - Added `DispatcherResourceCleanupTest`
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)
      - If yes, how is the feature documented? (not applicable)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixBlobRelease

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/6030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6030
    
----
commit 7b77fc85010c8831ecc3704a773f0f944da838a5
Author: Till Rohrmann <trohrmann@...>
Date:   2018-05-17T06:58:07Z

    [FLINK-9381] Release blobs after job termination
    
    Properly remove job blobs from the BlobServer after the job terminates. If the job reaches a globally terminal
    state, the HA blob store files are also cleared. If the job is suspended or has not
    finished (e.g. another process finishes the job concurrently), we only remove the local blob server files.
    
    Additionally, we properly release the user code class loader registered in the JobManagerRunner when it
    closes.

----


> BlobServer data for a job is not getting cleaned up at JM
> ---------------------------------------------------------
>
>                 Key: FLINK-9381
>                 URL: https://issues.apache.org/jira/browse/FLINK-9381
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.5.0
>         Environment: Flink 1.5.0 RC3 Commit e725269
>            Reporter: Amit Jain
>            Assignee: Till Rohrmann
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> We are running Flink 1.5.0 RC3 with YARN as the cluster manager and found that
>  the JobManager is being killed due to an out-of-disk error.
>  
>  Upon further analysis, we found that the blob server data for a job is not
>  being cleaned up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
