[
https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563187#comment-14563187
]
Josh Rosen commented on SPARK-4834:
-----------------------------------
Hey [~zfry],
Could you open a separate issue for this? That will make it easier for us to
track where the fix is applied, whether it introduces any new regressions, etc.
Thanks!
> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
> Key: SPARK-4834
> URL: https://issues.apache.org/jira/browse/SPARK-4834
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Fix For: 1.2.1, 1.3.0
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jar / files among multiple executors running on
> the same host by using a lock file and a cache file for each file the
> executor needs to download. The problem is that these lock and cache files
> are never deleted.
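A minimal sketch of the sharing scheme the quoted description refers to, assuming a per-file lock and cache file in the local dir (names `SharedFetch`, `fetch`, and the placeholder download are hypothetical, not Spark's actual code): each executor takes an exclusive lock before checking the cache, so only the first one downloads, but neither the lock file nor the cache file is ever removed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;

public class SharedFetch {
    // Hypothetical sketch: serialize downloads of the same remote file among
    // executors on one host with a per-file lock, caching the result on disk.
    public static Path fetch(Path localDir, String fileName) throws IOException {
        Path lockFile = localDir.resolve(fileName + ".lock");
        Path cacheFile = localDir.resolve(fileName + ".cache");
        try (RandomAccessFile raf = new RandomAccessFile(lockFile.toFile(), "rw");
             FileLock lock = raf.getChannel().lock()) { // blocks until the lock is held
            if (!Files.exists(cacheFile)) {
                // Stand-in for the real download of the jar/file.
                Files.write(cacheFile, "downloaded".getBytes());
            }
        }
        // Note: lockFile and cacheFile are never deleted -- the bug described above.
        return cacheFile;
    }
}
```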
> On Yarn, the app's dir is automatically deleted when the app ends, so no
> files are left behind. But on standalone, there's no such thing as "the app's
> dir"; files will end up in "/tmp" or in whatever place the user configures in
> "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as "hey, just
> call File.deleteOnExit()!" because we're talking about multiple processes
> accessing these files, so to maintain the efficiency gains of the original
> change, the files should only be deleted when the application is finished.
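The end-of-application cleanup proposed above could be sketched roughly as follows (this is an illustrative guess at the fix, not the patch that shipped in 1.2.1/1.3.0; the class and method names are hypothetical): once the application has finished, sweep the local dir and delete any leftover lock and cache files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class CacheCleanup {
    // Hypothetical sketch: run only after the application finishes, so that
    // the cross-executor cache sharing stays effective while the app is alive.
    public static int cleanLeftovers(Path localDir) throws IOException {
        int deleted = 0;
        try (Stream<Path> files = Files.list(localDir)) {
            for (Path p : (Iterable<Path>) files::iterator) {
                String name = p.getFileName().toString();
                if (name.endsWith(".lock") || name.endsWith(".cache")) {
                    Files.deleteIfExists(p);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```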
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)