Ah, yes, I missed that part.
It's `spark.local.dir`:
spark.local.dir (default: /tmp)
Directory to use for "scratch" space in Spark,
including map output files and RDDs that get stored on disk. This should be
on a fast, local disk in your system. It can also be a comma-separated list
of multiple directories.
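
In case it helps, here is a minimal PySpark sketch of pointing the scratch space somewhere other than /tmp. The helper names `local_dir_setting` and `build_session` and the /mnt1/spark-tmp and /mnt2/spark-tmp paths are made up for illustration; only the `spark.local.dir` key comes from the docs. On a cluster manager the value can be overridden by the environment (SPARK_LOCAL_DIRS on standalone, LOCAL_DIRS on YARN), so treat this as a starting point rather than a guaranteed fix.

```python
def local_dir_setting(dirs):
    """Build the (key, value) pair for spark.local.dir from a list of directories."""
    # The setting accepts a comma-separated list of directories.
    return ("spark.local.dir", ",".join(dirs))


def build_session(dirs):
    """Create a SparkSession whose scratch space points at `dirs` (hypothetical helper)."""
    # Imported lazily so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    key, value = local_dir_setting(dirs)
    # Must be set before the session (and its SparkContext) is first created;
    # a cluster manager may still override it through its own environment.
    return SparkSession.builder.config(key, value).getOrCreate()


# Example paths only; substitute fast local disks that exist on your nodes.
print(local_dir_setting(["/mnt1/spark-tmp", "/mnt2/spark-tmp"]))
```

Calling `build_session(["/mnt1/spark-tmp"])` at the top of the job, before any other session is created, should make new blockmgr directories appear under that path instead of /tmp.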
I do think that there is an option to set the temporary shuffle location to
a particular directory. While working with EMR I set it to /mnt1/. Let me
know in case you are not able to find it.
On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob wrote:
> This code generates files
> From: <vadim.seme...@datadoghq.com>
> To: Mihai Iacob <mia...@ca.ibm.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: /tmp fills up to 100GB when using a window function
> Date: Tue, Dec 19, 2017 9:46 AM
>
> Spark doesn't remove intermediate shuffle files if t
When does Spark remove them?
Regards,
Mihai Iacob
DSX Local - Security, IBM Analytics
-
Spark doesn't remove intermediate shuffle files if they're part of the same
job.
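
To see how much disk those leftover files actually take, a rough stdlib-only sketch like the one below may help. It assumes the directories are named with a `blockmgr-` prefix, as in the paths reported in this thread; the function names `blockmgr_usage` and `_tree_size` are made up here.

```python
import os


def blockmgr_usage(scratch_dir):
    """Map each blockmgr-* directory found under scratch_dir to its size in bytes."""
    usage = {}
    for root, dirnames, _files in os.walk(scratch_dir):
        for name in list(dirnames):
            if name.startswith("blockmgr-"):
                path = os.path.join(root, name)
                usage[path] = _tree_size(path)
                # Already measured; prune it so os.walk does not descend again.
                dirnames.remove(name)
    return usage


def _tree_size(path):
    """Sum the sizes of all files below path, skipping files that vanish."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Spark may delete a file while the tree is being walked.
                pass
    return total
```

Running `blockmgr_usage("/tmp")` while the job is in flight and again after it finishes gives a quick picture of which directories grow and whether they are ever cleaned up.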
On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob wrote:
> This code generates files under /tmp...blockmgr... which do not get
> cleaned up after the job finishes.
>
> Anything wrong with the code
This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes.
Is anything wrong with the code below, or are there any known issues with Spark not cleaning up /tmp files?
window = Window.\
partitionBy('***', 'date_str').\