[ https://issues.apache.org/jira/browse/SPARK-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5590.
------------------------------
    Resolution: Won't Fix

> Create a complete reference of configurable environment variables, config files and command-line parameters
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5590
>                 URL: https://issues.apache.org/jira/browse/SPARK-5590
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>         Environment: All
>            Reporter: Tobias Bertelsen
>
> This originated as [a question on stackoverflow|http://stackoverflow.com/q/28219279/].
> A complete reference of the different ways of configuring the Spark master and workers would be a great help – especially one covering the different names for the same parameter and the precedence among different ways of configuring the same thing.
> From the original stackoverflow question:
> h2. Known resources
> - [The standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] is the best I have found, but it does not clearly describe the relationships between different variables/parameters, nor which take precedence over others.
> - [The configuration documentation|http://spark.apache.org/docs/1.2.0/configuration.html] provides a good overview of application properties, but not of the master/slave launch-time parameters.
> h2. Example problem
> The [standalone documentation|http://spark.apache.org/docs/1.2.0/spark-standalone.html] says the following:
> {quote}
> the following configuration options can be passed to the master and worker ...
> `-d DIR, --work-dir DIR`   Directory to use for scratch space and job output logs (default: SPARK_HOME/work); only on worker
> {quote}
> and later
> {quote}
> `SPARK_LOCAL_DIRS`   Directory to use for "scratch" space in Spark
> `SPARK_WORKER_DIR`   Directory to run applications in, which will include both logs and scratch space (default: SPARK_HOME/work).
> {quote}
> As a Spark newbie I am a little confused by now.
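
For concreteness, the two environment variables quoted above are conventionally exported from `conf/spark-env.sh`, which the standalone launch scripts source before starting a daemon. A minimal sketch follows; the directory paths are hypothetical examples, not Spark defaults:

```shell
# conf/spark-env.sh -- sourced by the standalone launch scripts.
# The paths below are illustrative placeholders.

# Directory the worker runs applications in (logs plus scratch space);
# the standalone docs give SPARK_HOME/work as the default.
export SPARK_WORKER_DIR=/data/spark/work

# Comma-separated list of directories to use for "scratch" space
# (e.g. shuffle and spill files); multiple disks can be listed.
export SPARK_LOCAL_DIRS=/mnt/disk1/spark,/mnt/disk2/spark
```

The same work directory can instead be passed on the command line when launching the worker, via the `-d DIR` / `--work-dir DIR` flag quoted above.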
> - What is the relationship between `SPARK_LOCAL_DIRS`, `SPARK_WORKER_DIR`, and `-d`?
> - If I specify all of them with different values, which takes precedence?
> - Do variables written in `$SPARK_HOME/conf/spark-env.sh` take precedence over variables defined in the shell/script starting Spark?
> h2. Ideal Solution
> What I am looking for is essentially a single reference that
> 1. defines the precedence of the different ways of specifying variables for Spark, and
> 2. lists all variables/parameters.
> For example, something like this:
> || Variable          || Cmd-line  || Default           || Description                      ||
> | SPARK_MASTER_PORT  | -p --port  | 8080              | Port for master to listen on      |
> | SPARK_SLAVE_PORT   | -p --port  | random            | Port for slave to listen on       |
> | SPARK_WORKER_DIR   | -d --dir   | $SPARK_HOME/work  | Used as default for worker data   |
> | SPARK_LOCAL_DIRS   |            | $SPARK_WORKER_DIR | Scratch space for RDDs            |
> | ....               | ....       | ....              | ....                              |

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org