[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337128#comment-14337128 ]

Sean Owen commented on SPARK-5836:
----------------------------------

I'd like to take this up, since I've heard versions of this come up frequently 
lately. A first step is indeed improving documentation. I want to confirm or 
deny things I only sort of know about how temp files are treated.

- Temp files/dirs created by executors may live as long as the executors, but 
should be deleted when the executors exit?
- Shuffle files, however, may live longer?
- Is {{spark.cleaner.ttl}} relevant to this or not?

If we believe that temp files die when they should (er, well, [~vanzin] is 
fixing a few things around temp dirs right now), then is the surprising thing 
here the lifetime of shuffle files?

In which case maybe [~ilganeli] can cover this when writing up some basics 
about how the shuffle works?

But I want to figure out definitively what the right thing is to say about 
behavior right now, even if the behavior should or could be different in the 
future.

CC [~sandyr]

> Highlight in Spark documentation that by default Spark does not delete its temporary files
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5836
>                 URL: https://issues.apache.org/jira/browse/SPARK-5836
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Tomasz Dudziak
>
> We recently learnt the hard way (in a prod system) that Spark by default does 
> not delete its temporary files until it is stopped. Within a relatively short 
> time span of heavy Spark use, the disk of our prod machine filled up 
> completely because of the many shuffle files written to it. We think the 
> documentation should make clear that a finished job leaves a lot of rubbish 
> behind, so that this does not come as a surprise.
> Probably a good place to highlight this would be the documentation of the 
> {{spark.local.dir}} property, which controls where Spark temporary files are 
> written.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)