In yarn mode, spark.local.dir is overridden by yarn.nodemanager.local-dirs for shuffle data and block manager disk data. What do you mean by "But output files to upload to s3 still created in /tmp on slaves"? If that means your job's output, you should have control over where it is stored.
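For reference, spark.local.dir accepts a comma-separated list of directories, so a spark-defaults.conf sketch for the setup described below might look like this (the paths are the ones from the question; note that in yarn mode this setting is ignored on executors in favor of yarn.nodemanager.local-dirs):

```
# spark-defaults.conf
# Scratch space for shuffle spill, block manager files, etc.
# Comma-separated list; Spark spreads local files across all entries.
spark.local.dir  /data01/tmp,/data02/tmp

# On a box that has only one extra disk, a single entry is fine:
# spark.local.dir  /data01/tmp
```

In yarn mode only the driver-side scratch space would be affected by this setting, since the executors inherit the NodeManager's local dirs.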
On Tue, Mar 1, 2016 at 3:12 AM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> I have Spark on yarn
>
> I defined yarn.nodemanager.local-dirs to be /data01/yarn/nm,/data02/yarn/nm
>
> when I look at yarn executor container log I see that blockmanager files
> created in /data01/yarn/nm,/data02/yarn/nm
>
> But output files to upload to s3 still created in /tmp on slaves
>
> I do not want Spark write heavy files to /tmp because /tmp is only 5GB
>
> spark slaves have two big additional disks /disk01 and /disk02 attached
>
> Probably I can set spark.local.dir to be /data01/tmp,/data02/tmp
>
> But spark master also writes some files to spark.local.dir
> But my master box has only one additional disk /data01
>
> So, what should I use for spark.local.dir the
> spark.local.dir=/data01/tmp
> or
> spark.local.dir=/data01/tmp,/data02/tmp
>
> ?

--
Best Regards

Jeff Zhang