Spark temp dir (spark.local.dir)
Hi, I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir). What's the difference? I have set -Dspark.local.dir on all my worker nodes, but I'm still seeing directories being created in /tmp while the job is running. I have also tried setting -Dspark.local.dir when I run the application. Thanks!
Re: Spark temp dir (spark.local.dir)
I'm not 100% sure, but I think it goes like this:

- spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcasts variables, the files will be stored in this directory).
- SPARK_WORKER_DIR is where the jars and the log output of the executors are placed (default $SPARK_HOME/work/); it should be cleaned regularly.
- $SPARK_HOME/logs holds the logs of the workers and the master.

Guillaume

--
Guillaume PITEL, Président
+33(0)6 25 48 86 80
eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
Re: Spark temp dir (spark.local.dir)
> spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcasts variables, the files will be stored in this directory)

Do you mean the worker nodes?

I don't think they come from the jetty connectors, and the directories are empty:

/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

I run the application like this, even with java.io.tmpdir set:

bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR spark://oct1:7077 10

On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel guillaume.pi...@exensa.com wrote:

> Also, I think the jetty connector will create a small file or directory in /tmp regardless of spark.local.dir. It's very small, about 10KB.
>
> Guillaume
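One way to see where directories are really being created on a node is to list the spark-* entries directly under both /tmp and the configured local dir. The sketch below is only an illustration: list_spark_dirs is a hypothetical helper name, not part of Spark, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): print the directories named
# spark-* that sit directly under the given parent directory.
list_spark_dirs() {
    parent="$1"
    # -mindepth 1 / -maxdepth 1: look only at direct children of $parent
    # -type d: directories only
    find "$parent" -mindepth 1 -maxdepth 1 -type d -name 'spark-*' | sort
}

# Example calls, using the paths mentioned in the thread:
#   list_spark_dirs /tmp
#   list_spark_dirs /mnt/storage1/lm
```

Running it on the driver machine and on each worker shows which process is still falling back to /tmp.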
Re: Spark temp dir (spark.local.dir)
> > spark.local.dir can and should be set both on the executors and on the driver (if the driver broadcasts variables, the files will be stored in this directory)
> Do you mean the worker nodes?

No, only the driver broadcasts, I think.

> Don't think they are jetty connectors and the directories are empty:
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

Indeed, I must have confused that with something else. The Spark local dir contains directories starting with spark-local-*, so I don't know what these files are.

> I run the application like this, even with the java.io.tmpdir:
> bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR spark://oct1:7077 10

How do you pass spark.local.dir to the workers? In SPARK_JAVA_OPTS during SparkContext creation? It should probably be set in spark-env.sh, because it can differ on each node.

Guillaume
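The per-node approach suggested here could look roughly like the fragment below. It is a sketch under assumptions, not a verified setup: it assumes a standalone cluster running a Spark release from the period of this thread, where JVM system properties are passed to the daemons through SPARK_JAVA_OPTS, and the /mnt/storage1/work path is hypothetical (only /mnt/storage1/lm appears in the thread).

```shell
# conf/spark-env.sh, sourced on each node, so values may differ per node.

# Scratch space (spark.local.dir), passed as a JVM system property.
# /mnt/storage1/lm is the path used in the thread's example command.
export SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/storage1/lm"

# Where the standalone worker keeps application jars and executor log output.
# Hypothetical path; the default is $SPARK_HOME/work when unset.
export SPARK_WORKER_DIR=/mnt/storage1/work
```

The workers need to be restarted after the file changes, since spark-env.sh is only read when the daemons start.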