[ 
https://issues.apache.org/jira/browse/FLINK-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016346#comment-16016346
 ] 

Andrey commented on FLINK-6627:
-------------------------------

That's the problem. How will the cleanup script know whether a "container" dir 
is in use by the current TM? Also, if we run 2 TMs, their "container" dirs must 
have different names (which is currently solved by using UUID.randomUUID()).

> Expose tmp directories via API
> ------------------------------
>
>                 Key: FLINK-6627
>                 URL: https://issues.apache.org/jira/browse/FLINK-6627
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Andrey
>
> Currently, tmp/blob directories are created from a fixed baseDir and a random 
> postfix. For example, the blob directory:
> {code}
> new File(baseDir, String.format("blobStore-%s", UUID.randomUUID().toString()))
> {code}
> This directory name is not exposed externally, which causes several issues 
> in the following scenario:
> 1) Start task manager 1.
> 2) A random blob directory is created, for example "blob-1".
> 3) Start task manager 2.
> 4) A random blob directory is created, for example "blob-2".
> 5) Task manager 1 dies unexpectedly (kill -9 or OOM).
> 6) The directory "blob-1" is not deleted.
> 7) Task manager 1 is restarted automatically.
> 8) A new random blob directory is created, for example "blob-3".
> The issues:
> * The directory "blob-1" will never be deleted. 
> * An external cleanup script cannot find out which directories are currently 
> in use, because that information is not exposed externally. So it cannot 
> delete unused directories.
> * Sorting directories by "created time" and keeping the last X won't help, 
> because 1 faulty task manager could generate X+1 new directories.
> * Giving each task manager a different "blob.storage.directory" is not a 
> scalable solution for cloud/docker deployments, because it would require 
> central storage that tracks the currently running task managers.
> Proposed solution:
> * Expose the current working directory for blob/tmp via the REST API. In 
> that case, a cleanup script could: 
> ** get all blob/tmp directories in use from the cluster
> ** list all blob/tmp directories on disk ("ls")
> ** find the blob/tmp directories that are not in use
> ** delete them
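
The cleanup steps proposed above could be sketched roughly as follows. This is 
only an illustration, not Flink code: the class name BlobDirCleanup and its 
methods are hypothetical, and the set of in-use directory names is assumed to 
come from the proposed REST endpoint, which does not exist yet. The 
"blobStore-" prefix is taken from the snippet in the description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for an external cleanup script; not part of Flink.
public class BlobDirCleanup {

    /**
     * Returns the "blobStore-*" directories under baseDir whose names are not
     * in inUse. The inUse set is assumed to have been fetched from the cluster
     * (e.g. via the REST API proposed in this issue).
     */
    static List<Path> findUnused(Path baseDir, Set<String> inUse) throws IOException {
        List<Path> unused = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(baseDir, "blobStore-*")) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                if (Files.isDirectory(dir) && !inUse.contains(name)) {
                    unused.add(dir);
                }
            }
        }
        return unused;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts nested entries before their parent.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

A script would call findUnused(baseDir, inUse) and pass each result to 
deleteRecursively. Note that a task manager starting between the cluster query 
and the directory listing could have its fresh directory misclassified as 
unused, so a real script would probably also need a grace period or similar 
guard.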



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
