Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
Hi,

I'm confused about -Dspark.local.dir and SPARK_WORKER_DIR (--work-dir).

What's the difference?

I have set -Dspark.local.dir for all my worker nodes but I'm still seeing 
directories being created in /tmp when the job is running.

I have also tried setting -Dspark.local.dir when I run the application.

Thanks!



Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Guillaume Pitel
I'm not 100% sure, but I think it goes like this:

spark.local.dir can and should be set both on the executors and on the
driver (if the driver broadcasts variables, the files will be stored in this
directory).

SPARK_WORKER_DIR is where the jars and the log output of the executors are
placed (default $SPARK_HOME/work/), and it should be cleaned regularly.

The logs of the workers and the master are found in $SPARK_HOME/logs.
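
As a rough sketch of the three locations described above, a small shell
helper can print where each kind of file is expected to end up. The function
name and the /opt/spark fallback are placeholders, not anything from Spark
itself; /tmp is the documented default for spark.local.dir.

```shell
# Print where Spark is expected to write each kind of file.
# Usage: spark_dirs [value of spark.local.dir]
spark_dirs() {
    # /opt/spark is a hypothetical fallback; set SPARK_HOME to the real install.
    home="${SPARK_HOME:-/opt/spark}"
    # spark.local.dir: scratch space (falls back to /tmp when not given).
    echo "scratch (spark.local.dir): ${1:-/tmp}"
    # SPARK_WORKER_DIR: executor jars and executor log output.
    echo "executors (SPARK_WORKER_DIR): ${SPARK_WORKER_DIR:-$home/work}"
    # Daemon logs of the workers and the master.
    echo "daemon logs: $home/logs"
}

spark_dirs /mnt/storage1/lm
```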
  
  Guillaume



-- 
Guillaume PITEL, Président
+33(0)6 25 48 86 80

eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05


Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Tsai Li Ming
> spark.local.dir can and should be set both on the executors and on the
> driver (if the driver broadcasts variables, the files will be stored in this
> directory)

Do you mean the worker nodes?

I don't think they come from the jetty connector, and the directories are
empty:
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
/tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

I run the application like this, even setting java.io.tmpdir:
bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm \
    -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR \
    spark://oct1:7077 10




On 13 Mar, 2014, at 5:33 pm, Guillaume Pitel guillaume.pi...@exensa.com wrote:

> Also, I think the jetty connector will create a small file or directory in
> /tmp regardless of the spark.local.dir
>
> It's very small, about 10KB
>
> Guillaume



Re: Spark temp dir (spark.local.dir)

2014-03-13 Thread Guillaume Pitel
>> spark.local.dir can and should be set both on the executors and on the
>> driver (if the driver broadcasts variables, the files will be stored in
>> this directory)
>
> Do you mean the worker nodes?

No, only the driver broadcasts, I think.



> I don't think they come from the jetty connector, and the directories are
> empty:
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/jars
> /tmp/spark-3e330cdc-7540-4313-9f32-9fa109935f17/files

Indeed, I must have confused that with something else. The Spark local dir
contains directories whose names start with spark-local-*, so I don't know
what these are.
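
One way to check whether the setting took effect is to look for
spark-local-* directories under the configured path while a job is running.
A minimal sketch; the function name is made up for illustration, and
/mnt/storage1/lm is the path used in this thread:

```shell
# List spark-local-* directories directly under the given directory.
# Prints nothing if there are none.
list_spark_local_dirs() {
    for d in "$1"/spark-local-*; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
        fi
    done
}

list_spark_local_dirs /mnt/storage1/lm   # the configured spark.local.dir
list_spark_local_dirs /tmp               # the default location
```

If the directories only show up under /tmp, the property most likely never
reached the JVM that created them.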


> I run the application like this, even setting java.io.tmpdir:
> bin/run-example -Dspark.executor.memory=14g -Dspark.local.dir=/mnt/storage1/lm \
>     -Djava.io.tmpdir=/mnt/storage1/lm org.apache.spark.examples.SparkLR \
>     spark://oct1:7077 10

How do you pass spark.local.dir to the workers? In SPARK_JAVA_OPTS during
SparkContext creation? It should probably be set in spark-env.sh, because it
can differ on each node.
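
A minimal sketch of what that could look like in conf/spark-env.sh on each
worker node, assuming the SPARK_JAVA_OPTS mechanism mentioned above. The
/mnt/storage1/work path is a made-up example; /mnt/storage1/lm is the path
from the command in this thread.

```shell
# conf/spark-env.sh (sourced by the Spark scripts on this node)

# Where executor jars and log output go (default: $SPARK_HOME/work).
# Hypothetical path; choose one per node.
export SPARK_WORKER_DIR=/mnt/storage1/work

# Scratch space for this node, appended to any JVM options already set.
export SPARK_JAVA_OPTS="${SPARK_JAVA_OPTS:-} -Dspark.local.dir=/mnt/storage1/lm"
```

The daemons read this file at startup, so the workers presumably need a
restart before the change applies.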

Guillaume