Re: /tmp fills up to 100GB when using a window function

2017-12-20 Thread Vadim Semenov
Ah, yes, I missed that part. It's `spark.local.dir`. From the Spark configuration docs: spark.local.dir (default: /tmp) — Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories.
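For reference, `spark.local.dir` can be pointed at one or more fast local disks in `spark-defaults.conf`; the paths below are placeholders, not values from this thread. (Note that on YARN the NodeManager's local dirs, and the `SPARK_LOCAL_DIRS` environment variable where set, take precedence over this property.)

```properties
# spark-defaults.conf — hypothetical scratch locations
spark.local.dir /mnt/spark-scratch,/mnt2/spark-scratch
```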

Re: /tmp fills up to 100GB when using a window function

2017-12-20 Thread Gourav Sengupta
I do think that there is an option to set the temporary shuffle location to a particular directory. While working with EMR I set it to /mnt1/. Let me know in case you are not able to find it. On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob wrote: > This code generates files
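On EMR this kind of setting is typically supplied as a cluster configuration entry for the `spark-defaults` classification; a sketch, where the `/mnt1` mount point is taken from the post above and the subdirectory name is an assumption:

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.local.dir": "/mnt1/spark-scratch"
    }
  }
]
```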

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Vadim Semenov
(quoting the earlier reply) > Spark doesn't remove intermediate shuffle files if they're part of the same job.

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Mihai Iacob
When does Spark remove them? Regards, Mihai Iacob, DSX Local - Security, IBM Analytics

Re: /tmp fills up to 100GB when using a window function

2017-12-19 Thread Vadim Semenov
Spark doesn't remove intermediate shuffle files if they're part of the same job. On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob wrote: > This code generates files under /tmp...blockmgr... which do not get > cleaned up after the job finishes. > > Anything wrong with the code
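Since the shuffle files of a still-running job cannot be deleted safely, one stopgap is to sweep only the `blockmgr-*` scratch directories left behind by applications that have already exited. This is a hedged sketch of such a cleanup, not a Spark feature; the one-day age threshold is an arbitrary assumption:

```shell
# Sketch: remove blockmgr-* scratch directories under /tmp that have not
# been modified for a day. Only run this against directories left by
# applications that have already exited; deleting the scratch space of a
# running job will make it fail.
find /tmp -maxdepth 1 -type d -name 'blockmgr-*' -mmin +1440 -exec rm -rf {} +
```

A safer long-term fix is pointing `spark.local.dir` at a dedicated volume so scratch space cannot fill the root filesystem.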

/tmp fills up to 100GB when using a window function

2017-12-18 Thread Mihai Iacob
This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes. Anything wrong with the code below? Or are there any known issues with Spark not cleaning up /tmp files?

window = Window.\
    partitionBy('***', 'date_str').\
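For readers unfamiliar with why a window like the one above triggers a shuffle: `partitionBy` groups rows by key across the cluster before applying the window function. The per-partition logic can be sketched in plain Python; the field names and data here are hypothetical, since the original snippet is truncated:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; 'key' stands in for the redacted '***' column.
rows = [
    {"key": "a", "date_str": "2017-12-01", "value": 3},
    {"key": "a", "date_str": "2017-12-01", "value": 1},
    {"key": "b", "date_str": "2017-12-02", "value": 2},
]

# Emulate Window.partitionBy('key', 'date_str') with a row_number()
# ordered by 'value': group rows by the partition key, then enumerate
# each group. In Spark, the grouping step is the shuffle that writes
# the blockmgr-* files discussed in this thread.
part = itemgetter("key", "date_str")
ranked = []
for _, group in groupby(sorted(rows, key=part), key=part):
    for i, row in enumerate(sorted(group, key=itemgetter("value")), start=1):
        ranked.append({**row, "row_number": i})
```

In Spark itself the equivalent would be `F.row_number().over(window)`; the sketch only illustrates the partition-then-rank semantics.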