Re: /tmp fills up to 100GB when using a window function

Vadim Semenov Wed, 20 Dec 2017 12:04:30 -0800

Ah, yes, I missed that part.

The setting is `spark.local.dir`:

spark.local.dir (default: /tmp)
Directory to use for "scratch" space in Spark, including map output files
and RDDs that get stored on disk. This should be on a fast, local disk in
your system. It can also be a comma-separated list of multiple directories
on different disks. NOTE: In Spark 1.0 and later this will be overridden by
SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment
variables set by the cluster manager.
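
A rough sketch of how the setting could be passed on the command line, and
how to check whether one of the environment variables mentioned above would
override it. The helper names and the /mnt1, /mnt2 paths are made up for
illustration; only `spark.local.dir`, `SPARK_LOCAL_DIRS` and `LOCAL_DIRS`
come from the documentation quoted above.

```python
import os
from typing import List, Mapping

# Environment variables that, per the documentation quoted above, override
# spark.local.dir when set by the cluster manager.
OVERRIDING_ENV_VARS = ("SPARK_LOCAL_DIRS", "LOCAL_DIRS")


def local_dir_value(dirs: List[str]) -> str:
    """Join scratch directories into the comma-separated form the setting accepts."""
    return ",".join(dirs)


def spark_submit_conf(dirs: List[str]) -> List[str]:
    """Build the spark-submit arguments that set spark.local.dir."""
    return ["--conf", "spark.local.dir=" + local_dir_value(dirs)]


def overriding_vars(env: Mapping[str, str]) -> List[str]:
    """Return the override variables that are set (non-empty) in `env`."""
    return [name for name in OVERRIDING_ENV_VARS if env.get(name)]


if __name__ == "__main__":
    scratch = ["/mnt1/spark-tmp", "/mnt2/spark-tmp"]  # hypothetical paths
    print(" ".join(spark_submit_conf(scratch)))
    for name in overriding_vars(os.environ):
        print("note: " + name + " is set and will take precedence")
```

The same key should also work through
`SparkSession.builder.config("spark.local.dir", ...)`, as long as it is set
before the session is created. On YARN the cluster manager's LOCAL_DIRS
takes precedence, so the value set here would likely have no effect there.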

On Wed, Dec 20, 2017 at 2:58 PM, Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> I do think that there is an option to set the temporary shuffle location
> to a particular directory. While working with EMR I set it to /mnt1/. Let
> me know in case you are not able to find it.
>
> On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:
>
>> This code generates files under /tmp...blockmgr... which do not get
>> cleaned up after the job finishes.
>>
>> Is anything wrong with the code below, or are there any known issues with
>> Spark not cleaning up /tmp files?
>>
>>
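>> from pyspark.sql.functions import rank
>> from pyspark.sql.window import Window
>>
>> # Rank rows within each ('***', 'date_str') partition, ordered by '***'.
>> window = Window.\
>>               partitionBy('***', 'date_str').\
>>               orderBy(sqlDf['***'])
>>
>> # Keep only the top-ranked row(s) of each partition.
>> sqlDf = sqlDf.withColumn("***", rank().over(window))
>> df_w_least = sqlDf.filter("***=1")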
>>
>> Regards,
>>
>> *Mihai Iacob*
>> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics
>>
>
>
>
