[ https://issues.apache.org/jira/browse/FLINK-6627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016346#comment-16016346 ]
Andrey commented on FLINK-6627:
-------------------------------

That's the problem. How will the cleanup script know whether a "container" dir is in use by the current TM? Also, if we run 2 TMs, the "container" dirs must have different names (which is currently solved by using UUID.randomUUID()).

> Expose tmp directories via API
> ------------------------------
>
>                 Key: FLINK-6627
>                 URL: https://issues.apache.org/jira/browse/FLINK-6627
>             Project: Flink
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Andrey
>
> Currently tmp/blob directories are created based on a fixed baseDir and a
> random postfix. For example, the blob directory:
> {code}
> new File(baseDir, String.format("blobStore-%s", UUID.randomUUID().toString()))
> {code}
> This directory name is not exposed externally. This causes several issues
> in the following scenario:
> 1) Start task manager 1.
> 2) A random blob directory is created. For example: "blob-1"
> 3) Start task manager 2.
> 4) A random blob directory is created. For example: "blob-2"
> 5) Task manager 1 dies unexpectedly (kill -9 or OOM).
> 6) Directory "blob-1" is not deleted.
> 7) Task manager 1 is automatically restarted.
> 8) A random blob directory is created. For example: "blob-3"
> The issues:
> * The directory "blob-1" will never be deleted.
> * An external cleanup script cannot find out which directories are
> currently in use, because that information is not exposed externally. So
> it cannot delete unused directories.
> * Sorting directories by "created time" and keeping the last X won't help,
> because one faulty task manager could generate X+1 new directories.
> * Giving a different "blob.storage.directory" to each task manager is
> not a scalable solution for cloud/docker deployments, because there would
> have to be central storage for the current number of running task managers.
> Proposed solution:
> * Expose the current working directory for blob/tmp via the REST API.
> In that case, a cleanup script could:
> ** get all blob/tmp directories in use from the cluster
> ** get all blob/tmp directories ("ls")
> ** find the blob/tmp directories not in use
> ** delete them
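For illustration, the cleanup step in the proposed solution could look roughly like the sketch below. It assumes the set of in-use directories has already been fetched from the cluster; the REST call that would return it is only proposed in this issue and does not exist, so it is left out. The class and method names (BlobDirCleanup, findUnused, cleanUp) are hypothetical and not part of Flink. Only the "blobStore-" prefix is taken from the snippet in the description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical external cleanup helper; a sketch, not Flink code. */
final class BlobDirCleanup {

    /** Directory name prefix, as in the snippet from the issue description. */
    static final String PREFIX = "blobStore-";

    /**
     * Lists the "blobStore-*" directories under baseDir that are not in inUse.
     * inUse is assumed to come from the proposed (not yet existing) REST call.
     */
    static List<Path> findUnused(Path baseDir, Set<Path> inUse) throws IOException {
        // Normalize so that differently spelled but equal paths compare equal.
        Set<Path> active = new HashSet<>();
        for (Path p : inUse) {
            active.add(p.toAbsolutePath().normalize());
        }
        List<Path> unused = new ArrayList<>();
        // The "ls" step, restricted to entries matching the blob store prefix.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(baseDir, PREFIX + "*")) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)
                        && !active.contains(entry.toAbsolutePath().normalize())) {
                    unused.add(entry);
                }
            }
        }
        return unused;
    }

    /** Deletes a directory tree, visiting children before their parents. */
    static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Finds and deletes the unused blob directories; returns what was removed. */
    static List<Path> cleanUp(Path baseDir, Set<Path> inUse) throws IOException {
        List<Path> unused = findUnused(baseDir, inUse);
        for (Path dir : unused) {
            deleteRecursively(dir);
        }
        return unused;
    }
}
```

Note that a directory created by a task manager that starts between the query and the delete would look unused to this sketch, so a real script would need some guard against that, for example skipping very recently created directories.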