Ah, yes, I missed that part. The setting is `spark.local.dir`. From the configuration docs:
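In case it helps, here is a minimal sketch of how I would pass the setting to spark-submit and check which directories end up taking effect. It only mirrors the precedence described in the docs; it is not Spark's own lookup code. The helper names `scratch_dirs` and `submit_args`, the example paths, and the order in which the two environment variables are checked are my assumptions (in practice only one of them applies, depending on the cluster manager).

```python
import os


def scratch_dirs(conf_value="/tmp", env=None):
    """Return the scratch directories Spark would be expected to use.

    Simplified illustration of the documented precedence: LOCAL_DIRS (YARN)
    or SPARK_LOCAL_DIRS (Standalone, Mesos), when set by the cluster manager,
    win over spark.local.dir. Checking LOCAL_DIRS first is an assumption.
    """
    env = os.environ if env is None else env
    for var in ("LOCAL_DIRS", "SPARK_LOCAL_DIRS"):
        if env.get(var):
            # Both variables hold a comma-separated list of directories.
            return [d.strip() for d in env[var].split(",") if d.strip()]
    # No override present: fall back to the spark.local.dir value.
    return [d.strip() for d in conf_value.split(",") if d.strip()]


def submit_args(dirs):
    """Build the --conf argument pair for spark-submit (hypothetical helper)."""
    return ["--conf", "spark.local.dir=" + ",".join(dirs)]
```

For example, `submit_args(scratch_dirs("/mnt1/tmp", env={}))` yields the arguments for `spark-submit --conf spark.local.dir=/mnt1/tmp`. If the cluster manager exports one of the environment variables, the configured value is ignored, which is worth checking before assuming the setting took effect.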
spark.local.dir (default: /tmp)
    Directory to use for "scratch" space in Spark, including map output
    files and RDDs that get stored on disk. This should be on a fast, local
    disk in your system. It can also be a comma-separated list of multiple
    directories on different disks. NOTE: In Spark 1.0 and later this will
    be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS
    (YARN) environment variables set by the cluster manager.

On Wed, Dec 20, 2017 at 2:58 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> I do think that there is an option to set the temporary shuffle location
> to a particular directory. While working with EMR I set it to /mnt1/. Let
> me know in case you are not able to find it.
>
> On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob <mia...@ca.ibm.com> wrote:
>
>> This code generates files under /tmp...blockmgr... which do not get
>> cleaned up after the job finishes.
>>
>> Anything wrong with the code below? or are there any known issues with
>> spark not cleaning up /tmp files?
>>
>>
>> window = Window.\
>>     partitionBy('***', 'date_str').\
>>     orderBy(sqlDf['***'])
>>
>> sqlDf = sqlDf.withColumn("***",rank().over(window))
>> df_w_least = sqlDf.filter("***=1")
>>
>>
>> Regards,
>>
>> *Mihai Iacob*
>> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics