[ https://issues.apache.org/jira/browse/HIVE-9017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244944#comment-14244944 ]
Marcelo Vanzin commented on HIVE-9017:
--------------------------------------

These files are created by Spark when downloading resources for the app (e.g. application jars). In standalone mode, these files end up in /tmp (java.io.tmpdir) by default. The problem is that the app doesn't clean up these files; in fact, it can't, because they are meant to be shared when multiple executors run on the same host, so one executor cannot unilaterally decide to delete them. (That's not entirely true; it could, but that would force other executors to re-download the file when needed, adding overhead.) This is not a problem in YARN mode, since the temp dir is under a YARN-managed directory that is deleted when the app shuts down.

So, while I think of a clean way to fix this in Spark, the following can be done on the Hive side:
- create an app-specific temp directory before launching the Spark app
- set {{spark.local.dir}} to that location
- delete the directory when the client shuts down

> Clean up temp files of RSC [Spark Branch]
> -----------------------------------------
>
>                 Key: HIVE-9017
>                 URL: https://issues.apache.org/jira/browse/HIVE-9017
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Rui Li
>
> Currently RSC will leave a lot of temp files in {{/tmp}}, including {{*_lock}}, {{*_cache}}, {{spark-submit.*.properties}}, etc.
> We should clean up these files or it will exhaust disk space.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
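The three Hive-side steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the actual Hive patch: the `Map` stands in for Spark's `SparkConf` (in real code the value would be set on the launcher's configuration), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

public class RscTempDirSketch {

    public static Path createScratchDir(Map<String, String> sparkConf) throws IOException {
        // Step 1: create an app-specific temp directory before launching the Spark app.
        Path scratch = Files.createTempDirectory("hive-rsc-");
        // Step 2: point spark.local.dir at it so downloaded resources land there
        // instead of the shared /tmp. (Map is a stand-in for SparkConf.)
        sparkConf.put("spark.local.dir", scratch.toString());
        return scratch;
    }

    public static void deleteRecursively(Path dir) throws IOException {
        // Step 3: delete the directory when the client shuts down,
        // removing the deepest entries first so parent dirs are empty when deleted.
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> conf = new HashMap<>();
        Path scratch = createScratchDir(conf);
        System.out.println(Files.isDirectory(scratch));
        System.out.println(scratch.toString().equals(conf.get("spark.local.dir")));
        deleteRecursively(scratch);
        System.out.println(Files.exists(scratch));
    }
}
```

In a real client, `deleteRecursively` would be invoked from the client's shutdown path (or a JVM shutdown hook) so the scratch dir is removed even on abnormal exit.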