[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337128#comment-14337128 ]
Sean Owen commented on SPARK-5836:
----------------------------------

I'd like to take this up, since I've heard versions of this come up frequently lately. A first step is indeed improving documentation.

I want to confirm or deny things I only sort of know about how temp files are treated.

- Temp files/dirs created by executors may live as long as the executors, but should be deleted with the executors?
- Shuffle files, however, may live longer?
- Is {{spark.cleaner.ttl}} relevant to this or not?

If we believe that temp files die when they should (er, well, [~vanzin] is fixing a few things around temp dirs right now), then is the surprising thing here the lifetime of shuffle files? In which case maybe [~ilganeli] can cover this when writing up some basics about how the shuffle works?

But I want to figure out definitively what the right thing is to say about behavior right now, even if the behavior should or could be different in the future.

CC [~sandyr]

> Highlight in Spark documentation that by default Spark does not delete its temporary files
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5836
>                 URL: https://issues.apache.org/jira/browse/SPARK-5836
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Tomasz Dudziak
>
> We recently learnt the hard way (in a prod system) that Spark by default does
> not delete its temporary files until it is stopped. Within a relatively short
> time span of heavy Spark use the disk of our prod machine filled up
> completely because of multiple shuffle files written to it. We think there
> should be better documentation around the fact that after a job is finished
> it leaves a lot of rubbish behind so that this does not come as a surprise.
> Probably a good place to highlight that fact would be the documentation of
> {{spark.local.dir}} property, which controls where Spark temporary files are
> written.
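
Until the documentation question is settled, the disk-filling symptom in the issue description can at least be watched for. The following is a rough sketch, not anything Spark ships: the helper names {{dir_size_bytes}} and {{local_dir_report}} are made up for illustration, and it assumes the directories to check are passed in as a comma-separated string in the same form {{spark.local.dir}} accepts.

```python
import os


def dir_size_bytes(root):
    """Return the total size in bytes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so a linked file is not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files may disappear while a job is running; ignore them.
                pass
    return total


def local_dir_report(local_dirs):
    """Map each directory in a comma-separated list to its size in bytes.

    Hypothetical helper: local_dirs is assumed to hold the same value
    that was configured for spark.local.dir.
    """
    return {d: dir_size_bytes(d) for d in local_dirs.split(",") if d}
```

Calling {{local_dir_report("/tmp")}} on a schedule and alerting on growth would surface leftover shuffle files before the disk fills. It only reports sizes and does not delete anything, since which files are safe to remove is exactly the open question here.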