Hi,

Thanks! I found out that I wasn’t setting SPARK_JAVA_OPTS correctly.
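In case it helps someone else, this is roughly what I ended up adding to conf/spark-env.sh on each worker (the directory path below is just an example, not my actual layout):

    export SPARK_JAVA_OPTS="-Dspark.local.dir=/path/to/local/disk"

A quick way to check whether the executors actually picked it up is to grep the process table on a worker node, e.g.:

    ps aux | grep CoarseGrainedExecutorBackend | grep -o 'spark.local.dir=[^ ]*'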
I took a look at the process table and saw that the “org.apache.spark.executor.CoarseGrainedExecutorBackend” process didn’t have -Dspark.local.dir set.

On 28 Mar, 2014, at 1:05 pm, Matei Zaharia <matei.zaha...@gmail.com> wrote:

> I see, are you sure that was in spark-env.sh instead of spark-env.sh.template? You need to copy it to just a .sh file. Also make sure the file is executable.
>
> Try doing println(sc.getConf.toDebugString) in your driver program and seeing what properties it prints. As far as I can tell, spark.local.dir should *not* be set there, so workers should get it from their spark-env.sh. It’s true that if you set spark.local.dir in the driver, it would pass that on to the workers for that job.
>
> Matei
>
> On Mar 27, 2014, at 9:57 PM, Tsai Li Ming <mailingl...@ltsai.com> wrote:
>
>> Yes, I have tried that by adding it to the Worker. I can see the "app-20140328124540-000” directory in the local spark directory of the worker.
>>
>> But the “spark-local” directories are always written to /tmp, since the default spark.local.dir is taken from java.io.tmpdir?
>>
>> On 28 Mar, 2014, at 12:42 pm, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>>> Yes, the problem is that the driver program is overriding it. Have you set it manually in the driver? Or how did you try setting it in the workers? You should set it by adding
>>>
>>> export SPARK_JAVA_OPTS="-Dspark.local.dir=whatever"
>>>
>>> to conf/spark-env.sh on those workers.
>>>
>>> Matei
>>>
>>> On Mar 27, 2014, at 9:04 PM, Tsai Li Ming <mailingl...@ltsai.com> wrote:
>>>
>>>> Can anyone help?
>>>>
>>>> How can I configure a different spark.local.dir for each executor?
>>>>
>>>> On 23 Mar, 2014, at 12:11 am, Tsai Li Ming <mailingl...@ltsai.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Each of my worker nodes has its own unique spark.local.dir.
>>>>>
>>>>> However, when I run spark-shell, the shuffle writes are always written to /tmp despite spark.local.dir being set when the worker node is started.
>>>>>
>>>>> Specifying spark.local.dir for the driver program seems to override the executor’s setting. Is there a way to properly define it on the worker node?
>>>>>
>>>>> Thanks!