> Also, it can be a problem when reusing the same SparkContext for many runs.

That is what happened to me. We use spark-jobserver with one SparkContext for
all jobs. SPARK_LOCAL_DIRS is never cleaned up and eats disk space quickly.
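
In the meantime I may have to write a cleanup job of my own. A minimal sketch
of the idea (the LocalDirCleanup object and the one-day threshold are my own
illustration; it assumes the scratch subdirectories follow Spark's usual
spark-* naming, and note that deleting a directory still in use by a live
context would break running jobs):

import java.io.File

// Hypothetical standalone cleanup for SPARK_LOCAL_DIRS; not part of Spark.
object LocalDirCleanup {
  def main(args: Array[String]): Unit = {
    val maxAgeMs = 24L * 60 * 60 * 1000  // illustrative threshold: one day
    val localDirs = sys.env.getOrElse("SPARK_LOCAL_DIRS", "/tmp")
      .split(",").map(_.trim).filter(_.nonEmpty)
    for {
      dir <- localDirs
      root = new File(dir)
      if root.isDirectory
      sub <- Option(root.listFiles).getOrElse(Array.empty[File])
      if sub.getName.startsWith("spark-")  // Spark scratch dirs: spark-<uuid>
      if System.currentTimeMillis - sub.lastModified > maxAgeMs
    } deleteRecursively(sub)  // unsafe if the dir belongs to a live context
  }

  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory)
      Option(f.listFiles).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }
}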

Ningjun


From: Marius Soutier [mailto:mps....@gmail.com]
Sent: Tuesday, April 14, 2015 12:27 PM
To: Guillaume Pitel
Cc: user@spark.apache.org
Subject: Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

That's true, spill dirs don't get cleaned up when something goes wrong. We
restart long-running jobs once in a while for cleanups and have
spark.cleaner.ttl set to a lower value than the default.
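
For reference, the TTL can also be set when the context is created; a quick
sketch (the app name and the one-hour value are mine and purely illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// spark.cleaner.ttl defaults to infinite; a finite value makes Spark's
// periodic cleaner drop metadata (and associated shuffle/RDD data) older
// than the TTL. It must be longer than your longest-running computation.
val conf = new SparkConf()
  .setAppName("ttl-example")
  .set("spark.cleaner.ttl", "3600")  // one hour, illustrative only
val sc = new SparkContext(conf)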

On 14.04.2015, at 17:57, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:

Right, I remember now: the only problematic case is when things go bad and
the cleaner is not executed.

Also, it can be a problem when reusing the same SparkContext for many runs.

Guillaume
It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned automatically. 
From the source code comments:

// SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the
// application finishes.


On 13.04.2015, at 11:26, Guillaume Pitel <guillaume.pi...@exensa.com> wrote:

Does it also clean up the Spark local dirs? I thought it only cleaned
$SPARK_HOME/work/

Guillaume
I have set SPARK_WORKER_OPTS in spark-env.sh for that. For example:

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.appDataTtl=<seconds>"
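
Note that this cleanup only applies to standalone-mode workers and, as far as
I know, only removes the directories of stopped applications; the sweep
frequency itself can be tuned with spark.worker.cleanup.interval (30 minutes
by default).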

On 11.04.2015, at 00:01, Wang, Ningjun (LNG-NPV) <ningjun.w...@lexisnexis.com> wrote:

Does anybody have an answer for this?

Thanks
Ningjun

From: Wang, Ningjun (LNG-NPV)
Sent: Thursday, April 02, 2015 12:14 PM
To: user@spark.apache.org
Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark
writes to this folder. I found that the disk space used by this folder keeps
increasing quickly, and at some point I will run out of disk space.

Does Spark clean up the disk space in this folder once the shuffle operation
is done? If not, I need to write a job to clean it up myself, but how do I
know which subfolders can be removed?

Ningjun


--

Guillaume PITEL, Président
+33(0)626 222 431

eXenSa S.A.S.<http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705




