[ https://issues.apache.org/jira/browse/SPARK-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-4834:
------------------------------
    Assignee: Marcelo Vanzin

> Spark fails to clean up cache / lock files in local dirs
> --------------------------------------------------------
>
>                 Key: SPARK-4834
>                 URL: https://issues.apache.org/jira/browse/SPARK-4834
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.3.0, 1.2.1
>
>
> This issue was caused by https://github.com/apache/spark/commit/7aacb7bfa.
> That change shares downloaded jars / files among multiple executors running on
> the same host by using a lock file and a cache file for each file the
> executor needs to download. The problem is that these lock and cache files
> are never deleted.
>
> On Yarn, the app's dir is automatically deleted when the app ends, so no
> files are left behind. But on standalone, there's no such thing as "the app's
> dir"; files will end up in "/tmp" or in whatever place the user configures in
> "SPARK_LOCAL_DIRS", and will eventually start to fill that volume.
>
> We should add a way to clean up these files. It's not as simple as "hey, just
> call File.deleteOnExit()!" because we're talking about multiple processes
> accessing these files, so to maintain the efficiency gains of the original
> change, the files should only be deleted when the application is finished.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
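For context, the lock-and-cache pattern the description refers to can be sketched roughly as below. This is a hypothetical illustration, not Spark's actual code: the file names, method names, and layout are assumptions; the real implementation lives in Spark's `Utils.fetchFile`. The sketch shows why simple per-process cleanup does not work: the cache and lock files are shared across executor processes on the same host.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class CachedFetch {
    // Hypothetical sketch of the lock-and-cache download sharing described
    // above. The first executor on a host populates a shared cache file while
    // holding an OS-level file lock; later executors find the cache already
    // populated and skip the download.
    public static void fetchCached(Path source, Path cacheFile, Path lockFile,
                                   Path dest) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile.toFile(), "rw");
             FileLock lock = raf.getChannel().lock()) {
            if (!Files.exists(cacheFile)) {
                // Only the lock holder performs the (expensive) copy/download.
                Files.copy(source, cacheFile, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        // Every executor copies from the shared cache into its own work dir.
        Files.copy(cacheFile, dest, StandardCopyOption.REPLACE_EXISTING);
        // The bug: cacheFile and lockFile are never deleted. Because other
        // executor processes may still need them, File.deleteOnExit() in any
        // one process would be wrong; cleanup must wait until the whole
        // application finishes.
    }
}
```

On Yarn these leftover files are harmless because the app-specific directory is removed for us; on standalone they accumulate under SPARK_LOCAL_DIRS (or /tmp), which is exactly the leak this issue tracks.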