Re: Setting Spark TMP Directory in Cluster Mode

2015-09-30 Thread mufy
Any takers? :-)


---
Mufeed Usman
My LinkedIn <http://www.linkedin.com/pub/mufeed-usman/28/254/400> | My
Social Cause <http://www.vision2016.org.in/> | My Blogs : LiveJournal
<http://mufeed.livejournal.com>




On Mon, Sep 28, 2015 at 10:19 AM, mufy <mufeed.us...@gmail.com> wrote:

> Hello Akhil,
>
> I do not see how that would work for a YARN cluster mode execution
> since the local directories used by the Spark executors and the Spark
> driver are the local directories that are configured for YARN
> (yarn.nodemanager.local-dirs). If you specify a different path with
> SPARK_LOCAL_DIRS, that path will be ignored.
>
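For reference, the point above about YARN owning the scratch space can be sketched as follows. This is a hedged illustration, not this cluster's actual configuration; the paths are placeholders:

```xml
<!-- yarn-site.xml: in YARN mode, the executors' and the driver's
     local scratch directories come from this property, and
     SPARK_LOCAL_DIRS / spark.local.dir are ignored -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
```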


Setting Spark TMP Directory in Cluster Mode

2015-09-25 Thread mufy
I am faced with an issue where Spark temp files fill up
/opt/spark-1.2.1/tmp on the local filesystem of the worker nodes. Which
parameter/configuration sets that location?

The spark-env.sh file has the folders set as,

export SPARK_HOME=/opt/spark-1.2.1
export SPARK_WORKER_DIR=$SPARK_HOME/tmp

I could not find any SPARK_TMP_DIR parameter in the stock Spark
documentation, though the DataStax documentation does mention something
similar. Once I know where this can be set, I'm thinking of pointing that
location to an NFS mount so that more space is available, without the fear
of jobs failing due to space running out.
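For standalone mode, the scratch-space location is controlled by the documented spark.local.dir property (or equivalently the SPARK_LOCAL_DIRS environment variable), not SPARK_TMP_DIR. A sketch, assuming an NFS mount at the placeholder path /mnt/nfs/spark-tmp:

```shell
# spark-env.sh -- SPARK_LOCAL_DIRS sets the scratch directory used for
# shuffle data and RDDs spilled to disk (standalone/Mesos modes; under
# YARN it is ignored in favor of yarn.nodemanager.local-dirs)
export SPARK_LOCAL_DIRS=/mnt/nfs/spark-tmp

# equivalently, in conf/spark-defaults.conf:
#   spark.local.dir  /mnt/nfs/spark-tmp
```

Note that shuffle files are latency-sensitive, so pointing this at NFS trades job failures from full disks for slower shuffles.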


I could also see 'java.io.tmpdir' being set in spark-env.sh to the value
below.

-Djava.io.tmpdir=/opt/spark-1.2.1/tmp/spark-tmp

I tried setting it as,

export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir


I ran a grep again to see if it had been picked up. It should have shown
something like,

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/new/tmp/dir


in the
/opt/spark-1.2.1/logs/spark-hdplab-org.apache.spark.deploy.master.Master-1-node-01.out
log, but it is still seen pointing to /opt/spark-1.2.1/tmp only.

$ export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir

$ grep -iR "/opt/spark-1.2.1/tmp" /opt/spark-1.2.1/*
/opt/spark-1.2.1/logs/spark-hdplab-org.apache.spark.deploy.master.Master-1-node-01.out:Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/opt/spark-1.2.1/tmp/spark-tmp
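One likely explanation (an assumption on my part): _JAVA_OPTIONS is read by the JVM at startup, so the export only affects processes launched afterwards, and the already-running master keeps its old java.io.tmpdir. A sketch:

```shell
# _JAVA_OPTIONS only reaches JVMs started *after* the export, so the
# running master daemon still logs the old java.io.tmpdir; it would
# need a restart (e.g. sbin/stop-master.sh then sbin/start-master.sh)
# to pick up the new value
export _JAVA_OPTIONS=-Djava.io.tmpdir=/new/tmp/dir

# any process started from this shell after the export does see it:
sh -c 'echo "$_JAVA_OPTIONS"'
```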


Also, I was wondering if setting SPARK_DAEMON_JAVA_OPTS, which Spark uses
to pass JVM options to its daemons, would help here, even though
http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts
discusses it only in a Spark standalone context.
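A sketch of that approach, with the caveat that SPARK_DAEMON_JAVA_OPTS only applies to the standalone master and worker daemons themselves, not to executor JVMs or YARN containers (the path is a placeholder):

```shell
# spark-env.sh -- JVM options for the standalone master/worker daemons
# only; executors launched for jobs are not affected by this variable
export SPARK_DAEMON_JAVA_OPTS="-Djava.io.tmpdir=/new/tmp/dir"
```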
