In yarn mode, spark.local.dir is overridden by yarn.nodemanager.local-dirs for shuffle data and block manager disk data. What do you mean by "But output files to upload to s3 still created in /tmp on slaves"? If that means your job's output, you should have control over where it is stored.
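For reference, spark.local.dir accepts a comma-separated list of directories, so a spark-defaults.conf sketch for the setup described below might look like this (the paths are the ones from the question; note that in yarn mode this setting is ignored on executors in favor of yarn.nodemanager.local-dirs):

```
# spark-defaults.conf
# Scratch space for shuffle spill, block manager files, etc.
# Comma-separated list; Spark spreads local files across all entries.
spark.local.dir  /data01/tmp,/data02/tmp

# On a box that has only one extra disk, a single entry is fine:
# spark.local.dir  /data01/tmp
```

In yarn mode only the driver-side scratch space would be affected by this setting, since the executors inherit the NodeManager's local dirs.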
On Tue, Mar 1, 2016 at 3:12 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> I have Spark on yarn
>
> I defined yarn.nodemanager.local-dirs to be /data01/yarn/nm,/data02/yarn/nm
>
> when I look at yarn executor container log I see that blockmanager files
> created in /data01/yarn/nm,/data02/yarn/nm
>
> But output files to upload to s3 still created in /tmp on slaves
>
> I do not want Spark write heavy files to /tmp because /tmp is only 5GB
>
> spark slaves have two big additional disks /disk01 and /disk02 attached
>
> Probably I can set spark.local.dir to be /data01/tmp,/data02/tmp
>
> But spark master also writes some files to spark.local.dir
> But my master box has only one additional disk /data01
>
> So, what should I use for spark.local.dir the
> spark.local.dir=/data01/tmp
> or
> spark.local.dir=/data01/tmp,/data02/tmp
>
> ?

--
Best Regards

Jeff Zhang