[ https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-4834:
------------------------------
    Assignee: Marcelo Vanzin

> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
>                 Key: SPARK-4834
>                 URL: https://issues.apache.org/jira/browse/SPARK-4834
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.3.0, 1.2.1
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jars / files among the executors running on 
> the same host by using a lock file and a cache file for each file an 
> executor needs to download. The problem is that these lock and cache files 
> are never deleted.
> On YARN, the app's dir is automatically deleted when the app ends, so no 
> files are left behind. But on standalone, there's no such thing as "the app's 
> dir"; files will end up in "/tmp" or wherever the user configures 
> "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
> We should add a way to clean up these files. It's not as simple as calling 
> File.deleteOnExit(), because multiple processes access these files; to 
> preserve the efficiency gains of the original change, the files should only 
> be deleted when the application finishes.
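The scheme described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the lock-file/cache-file pattern, not Spark's actual code from commit 7aacb7bfa: the object name, file-name suffixes, and `download` callback are all assumptions. Each executor process takes an exclusive `FileLock` on the lock file; only the first process to win the lock populates the cache file, and later executors just copy it. Note that neither the lock file nor the cache file is ever deleted, which is exactly the bug this issue reports.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.channels.FileLock
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical sketch of the shared-download scheme (names are illustrative).
object SharedDownload {
  def fetchCached(name: String, localDir: File, download: File => Unit): File = {
    val lockFile  = new File(localDir, s"$name.lock")   // per-file lock
    val cacheFile = new File(localDir, s"$name.cache")  // shared cached copy
    val target    = new File(localDir, name)

    val raf = new RandomAccessFile(lockFile, "rw")
    val lock: FileLock = raf.getChannel.lock()          // blocks until exclusive
    try {
      if (!cacheFile.exists()) {
        download(cacheFile)                             // only one process downloads
      }
      // Every executor copies the cached file into its own working copy.
      Files.copy(cacheFile.toPath, target.toPath,
        StandardCopyOption.REPLACE_EXISTING)
    } finally {
      lock.release()
      raf.close()
    }
    // lockFile and cacheFile are never deleted here -- on standalone they
    // accumulate in SPARK_LOCAL_DIRS until the volume fills up.
    target
  }
}
```

Because several independent executor JVMs may hold references to the same cache file, a per-process `File.deleteOnExit()` would delete it out from under the others; cleanup has to be coordinated at application end.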



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
