Forgot to mention that I am using spark-submit to submit jobs; a verbose-mode printout with the SparkPi example looks like this. The .sparkStaging directory won't be deleted. My thought is that it should be treated as part of the staging area and cleaned up as well when the SparkContext gets terminated.
[test@ spark]$ SPARK_YARN_USER_ENV="spark.yarn.preserve.staging.files=false" SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.2.0.jar ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --driver-memory 512M --driver-library-path /opt/hadoop/share/hadoop/mapreduce/lib/hadoop-lzo.jar --executor-memory 512M --executor-cores 1 --queue research --num-executors 2 examples/target/spark-examples_2.10-1.0.0.jar

Using properties file: null
Using properties file: null
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          512M
  executorCores           1
  totalExecutorCores      null
  propertiesFile          null
  driverMemory            512M
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  /opt/hadoop/share/hadoop/mapreduce/lib/hadoop-lzo.jar
  driverExtraJavaOptions  null
  supervise               false
  queue                   research
  numExecutors            2
  files                   null
  pyFiles                 null
  archives                null
  mainClass               org.apache.spark.examples.SparkPi
  primaryResource         file:/opt/spark/examples/target/spark-examples_2.10-1.0.0.jar
  name                    org.apache.spark.examples.SparkPi
  childArgs               []
  jars                    null
  verbose                 true
Default properties from null:
Using properties file: null
Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--jar file:/opt/spark/examples/target/spark-examples_2.10-1.0.0.jar
--class org.apache.spark.examples.SparkPi
--name org.apache.spark.examples.SparkPi
--driver-memory 512M
--queue research
--num-executors 2
--executor-memory 512M
--executor-cores 1
System properties:
spark.driver.extraLibraryPath -> /opt/hadoop/share/hadoop/mapreduce/lib/hadoop-lzo.jar
SPARK_SUBMIT -> true
spark.app.name -> org.apache.spark.examples.SparkPi
Classpath elements:

From: alee...@hotmail.com
To: user@spark.apache.org
Subject: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode
Date: Wed, 18 Jun 2014 11:05:12 -0700

Hi All,

Has anyone run into the same problem?
Looking at the source code in the official release (rc11), this property is set to false by default; however, I'm seeing the .sparkStaging folder remain on HDFS, and it fills up the disk pretty fast, since SparkContext deploys the fat JAR file (~115 MB) for every job and it is never cleaned up.

yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala:

val preserveFiles = sparkConf.get("spark.yarn.preserve.staging.files", "false").toBoolean

[test@spark ~]$ hdfs dfs -ls .sparkStaging
Found 46 items
drwx------   - test users          0 2014-05-01 01:42 .sparkStaging/application_1398370455828_0050
drwx------   - test users          0 2014-05-01 02:03 .sparkStaging/application_1398370455828_0051
drwx------   - test users          0 2014-05-01 02:04 .sparkStaging/application_1398370455828_0052
drwx------   - test users          0 2014-05-01 05:44 .sparkStaging/application_1398370455828_0053
drwx------   - test users          0 2014-05-01 05:45 .sparkStaging/application_1398370455828_0055
drwx------   - test users          0 2014-05-01 05:46 .sparkStaging/application_1398370455828_0056
drwx------   - test users          0 2014-05-01 05:49 .sparkStaging/application_1398370455828_0057
drwx------   - test users          0 2014-05-01 05:52 .sparkStaging/application_1398370455828_0058
drwx------   - test users          0 2014-05-01 05:58 .sparkStaging/application_1398370455828_0059
drwx------   - test users          0 2014-05-01 07:38 .sparkStaging/application_1398370455828_0060
drwx------   - test users          0 2014-05-01 07:41 .sparkStaging/application_1398370455828_0061
…
drwx------   - test users          0 2014-06-16 14:45 .sparkStaging/application_1402001910637_0131
drwx------   - test users          0 2014-06-16 15:03 .sparkStaging/application_1402001910637_0135
drwx------   - test users          0 2014-06-16 15:16 .sparkStaging/application_1402001910637_0136
drwx------   - test users          0 2014-06-16 15:46 .sparkStaging/application_1402001910637_0138
drwx------   - test users          0 2014-06-16 23:57 .sparkStaging/application_1402001910637_0157
drwx------   - test users          0 2014-06-17 05:55 .sparkStaging/application_1402001910637_0161

Is this something that needs to be explicitly set, e.g.:

SPARK_YARN_USER_ENV="spark.yarn.preserve.staging.files=false"

From the docs (http://spark.apache.org/docs/latest/running-on-yarn.html):

spark.yarn.preserve.staging.files (default: false)
Set to true to preserve the staged files (Spark jar, app jar, distributed cache files) at the end of the job rather than delete them.

Or is this a bug where the default value is not honored and is overridden to true somewhere?

Thanks.
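One thing worth noting about the invocation above: SPARK_YARN_USER_ENV injects environment variables into the YARN containers, not Spark configuration, so a Spark property set through it is likely ignored. A sketch of two possible workarounds, hedged: the spark-defaults.conf path and the application-ID glob below are assumptions based on the layout and listing shown above, not a confirmed fix for the leak.

```shell
# Set the property as actual Spark configuration instead of an env var.
# spark-submit reads conf/spark-defaults.conf by default (Spark 1.0+):
echo "spark.yarn.preserve.staging.files false" >> conf/spark-defaults.conf

# Until the cleanup issue itself is resolved, stale staging dirs for
# applications that are no longer running can be removed manually.
# The glob pattern here matches the old application run from the listing:
hdfs dfs -rm -r -skipTrash ".sparkStaging/application_1398370455828_*"
```

This only works around the symptom; if the AM is honoring the false default, the directories should be removed automatically when each application finishes.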