[https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563183#comment-14563183]
Zach Fry commented on SPARK-4834:
---------------------------------
[~joshrosen],
We are seeing behavior on Spark 1.3.0 where the _files_ in the
{{spark.local.dir}} directories are getting cleaned up, but not the
_directories_ themselves.
It's a pretty simple repro:
Run a job that does some shuffling, wait for the shuffle files to be cleaned
up, then look on disk at {{spark.local.dir}}: the directories are still
there, but there are no files in them.
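The last step of the repro can be scripted. The following is a minimal Python sketch, not part of Spark: the helper name `find_empty_dirs` is hypothetical, and the path you pass in is assumed to be one of the directories that {{spark.local.dir}} (or SPARK_LOCAL_DIRS) points to on the worker. It lists every subdirectory that has no files anywhere beneath it, which is the state described above.

```python
import os
import sys


def find_empty_dirs(root):
    """Return the directories under `root` (excluding `root` itself) that
    contain no files anywhere beneath them. Hypothetical helper."""
    empty = []
    has_files = {}
    # Walk bottom-up so each directory's verdict can reuse its children's.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        contains = bool(filenames) or any(
            has_files.get(os.path.join(dirpath, d), False) for d in dirnames
        )
        has_files[dirpath] = contains
        if not contains and dirpath != root:
            empty.append(dirpath)
    return sorted(empty)


if __name__ == "__main__":
    # Usage: python find_empty_dirs.py /path/to/spark.local.dir
    for path in find_empty_dirs(sys.argv[1]):
        print(path)
```

Running it after the shuffle files have been cleaned up should print the leftover, now-empty directories.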
Should we open a new ticket for this, or can we reopen this one?
> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
> Key: SPARK-4834
> URL: https://issues.apache.org/jira/browse/SPARK-4834
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 1.2.1, 1.3.0
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jar / files among multiple executors running on
> the same host by using a lock file and a cache file for each file the
> executor needs to download. The problem is that these lock and cache files
> are never deleted.
> On Yarn, the app's dir is automatically deleted when the app ends, so no
> files are left behind. But on standalone, there's no such thing as "the app's
> dir"; files will end up in "/tmp" or in whatever place the user configures in
> "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as "hey, just
> call File.deleteOnExit()!" because we're talking about multiple processes
> accessing these files, so to maintain the efficiency gains of the original
> change, the files should only be deleted when the application is finished.
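The lock-file / cache-file scheme and the "delete only when the application is finished" requirement from the issue description can be illustrated roughly as follows. This is a minimal Python sketch, not Spark's implementation: it assumes a POSIX host (it uses `fcntl.flock`), and the names `fetch_shared`, `cleanup_app_dir`, `app_dir` and the `download` callback are all hypothetical. Each process takes an exclusive lock before checking the cache, so only the first one downloads; all cache and lock files live under one per-application directory, and cleanup removes that whole directory, including the directory itself, once the application has ended.

```python
import fcntl
import os
import shutil


def fetch_shared(app_dir, name, download):
    """Return the path of a cached copy of `name`, downloading it at most
    once even when several processes on the host call this concurrently.

    `download(dest_path)` is a hypothetical callback that writes the file.
    """
    os.makedirs(app_dir, exist_ok=True)
    cache_path = os.path.join(app_dir, name + "_cache")
    lock_path = os.path.join(app_dir, name + "_lock")
    with open(lock_path, "a") as lock_file:
        # Block until no other process holds the lock for this file.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            if not os.path.exists(cache_path):
                # Write to a temporary name, then rename, so other
                # processes never see a partially written cache file.
                tmp_path = cache_path + ".tmp"
                download(tmp_path)
                os.replace(tmp_path, cache_path)
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
    return cache_path


def cleanup_app_dir(app_dir):
    """Delete the per-application directory and everything in it.

    Intended to run once, after the whole application has finished, not
    when any single process exits (which is when a per-process hook such
    as Java's File.deleteOnExit() would fire).
    """
    shutil.rmtree(app_dir, ignore_errors=True)
```

Grouping every file for one application under a single directory is what turns the final cleanup into one recursive delete, which also avoids leaving behind the empty directories reported in the comment above.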
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)