[ https://issues.apache.org/jira/browse/SPARK-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5590.
------------------------------
    Resolution: Won't Fix

> Create a complete reference of configurable environment variables, config files and command-line parameters
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5590
>                 URL: https://issues.apache.org/jira/browse/SPARK-5590
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>         Environment: All
>            Reporter: Tobias Bertelsen
>
> This originated as [a question on stackoverflow|http://stackoverflow.com/q/28219279/].
> A complete reference of the different ways of configuring the Spark master and workers would be a great help – especially one covering the different names for the same parameter and the precedence among different ways of configuring the same thing.
> From the original stackoverflow question:
> h2. Known resources
> - [The standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] is the best I have found, but it does not clearly describe the relationships between different variables/parameters, nor which take precedence over others.
> - [The configuration documentation|http://spark.apache.org/docs/1.2.0/configuration.html] provides a good overview of application properties, but not of the master/slave launch-time parameters.
> h2. Example problem
> The [standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] says the following:
> {quote}
> the following configuration options can be passed to the master and worker ...
> `-d DIR, --work-dir DIR`   Directory to use for scratch space and job output logs (default: SPARK_HOME/work); only on worker
> {quote}
> and later
> {quote}
> `SPARK_LOCAL_DIRS`   Directory to use for "scratch" space in Spark
> `SPARK_WORKER_DIR`   Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
> {quote}
> As a Spark newbie I am a little confused by now.
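
For concreteness, the two environment variables quoted above are conventionally exported from `conf/spark-env.sh`, which the standalone launch scripts source before starting a daemon. A minimal sketch follows; the directory paths are hypothetical examples, not Spark defaults:

```shell
# conf/spark-env.sh -- sourced by the standalone launch scripts.
# The paths below are illustrative placeholders.

# Directory the worker runs applications in (logs plus scratch space);
# the standalone docs give SPARK_HOME/work as the default.
export SPARK_WORKER_DIR=/data/spark/work

# Comma-separated list of directories to use for "scratch" space
# (e.g. shuffle and spill files); multiple disks can be listed.
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark
```

The same work directory can instead be passed on the command line when launching the worker, via the `-d DIR` / `--work-dir DIR` flag quoted above.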
> - What is the relationship between `SPARK_LOCAL_DIRS`, `SPARK_WORKER_DIR`, and `-d`?
> - If I specify all of them with different values, which takes precedence?
> - Do variables written in `$SPARK_HOME/conf/spark-env.sh` take precedence over variables defined in the shell/script starting Spark?
> h2. Ideal Solution
> What I am looking for is essentially a single reference that
> 1. defines the precedence of the different ways of specifying variables for Spark, and
> 2. lists all variables/parameters.
> For example, something like this:
> || Variable          || Cmd-line  || Default           || Description                      ||
> | SPARK_MASTER_PORT  | -p --port  | 8080              | Port for master to listen on      |
> | SPARK_SLAVE_PORT   | -p --port  | random            | Port for slave to listen on       |
> | SPARK_WORKER_DIR   | -d --dir   | $SPARK_HOME/work  | Used as default for worker data   |
> | SPARK_LOCAL_DIRS   |            | $SPARK_WORKER_DIR | Scratch space for RDDs            |
> | ....               | ....       | ....              | ....                              |

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org