[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037593#comment-15037593 ]

Ewan Higgs commented on SPARK-5836:
-----------------------------------

[~tdas] 
{quote}
The only case there may be issues is when the external shuffle service is used.
{quote}

I see this problematic behaviour in ipython/pyspark notebooks. We can go through 
and unpersist and checkpoint the RDDs, but the shuffle files don't seem to go 
away. We see this even though we are not using the external shuffle service.
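
For reference, here is a minimal PySpark sketch of the kind of cleanup we attempt 
in the notebook. It is not the actual notebook code; the app name, paths and data 
are illustrative assumptions. Even after unpersisting and checkpointing, the 
shuffle files written under {{spark.local.dir}} stay on disk until the context is 
stopped.

{code:python}
# Minimal sketch (not the original notebook code): the cleanup steps we try.
# App name, checkpoint path, and data are illustrative assumptions.
from pyspark import SparkContext

sc = SparkContext("local[2]", "shuffle-cleanup-sketch")
sc.setCheckpointDir("/tmp/spark-checkpoints")  # illustrative path

rdd = sc.parallelize(range(1000000)).map(lambda x: (x % 100, x))
counts = rdd.reduceByKey(lambda a, b: a + b)   # this stage writes shuffle files
counts.cache()
counts.count()

# Attempted cleanup:
counts.checkpoint()
counts.count()      # materialise the checkpoint
counts.unpersist()
rdd.unpersist()

# The shuffle files written under spark.local.dir are still on disk here;
# they are only removed once sc.stop() shuts the context down.
{code}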

> Highlight in Spark documentation that by default Spark does not delete its 
> temporary files
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5836
>                 URL: https://issues.apache.org/jira/browse/SPARK-5836
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Tomasz Dudziak
>            Assignee: Ilya Ganelin
>            Priority: Minor
>             Fix For: 1.3.1, 1.4.0
>
>
> We recently learnt the hard way (in a prod system) that Spark by default does 
> not delete its temporary files until it is stopped. Within a relatively short 
> time span of heavy Spark use the disk of our prod machine filled up 
> completely because of multiple shuffle files written to it. We think there 
> should be better documentation around the fact that after a job is finished 
> it leaves a lot of rubbish behind so that this does not come as a surprise.
> Probably a good place to highlight that fact would be the documentation of 
> {{spark.local.dir}} property, which controls where Spark temporary files are 
> written. 
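
As a hedged illustration of the {{spark.local.dir}} suggestion in the description 
above: the sketch below points Spark's temporary and shuffle output at a dedicated 
scratch volume so it cannot fill the root disk. The path and app name are 
assumptions for the example, not from the report; on a cluster the setting may be 
overridden by the cluster manager (e.g. SPARK_LOCAL_DIRS or YARN's local dirs).

{code:python}
# Hedged example (not from the original report): reserving a dedicated
# scratch disk for Spark's temporary and shuffle files via spark.local.dir.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("local-dir-example")                 # illustrative name
        .set("spark.local.dir", "/mnt/spark-scratch"))   # assumed scratch volume

sc = SparkContext(conf=conf)
# ... run jobs; shuffle files accumulate under /mnt/spark-scratch ...
sc.stop()   # temporary files are only cleaned up when the context stops
{code}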


