Re: [jira] [Created] (MAHOUT-1762) Pick up $SPARK_HOME/conf/spark-defaults.conf on startup

Dmitriy Lyubimov Mon, 10 Aug 2015 09:56:53 -0700

This is very reasonable. Please feel free to open a PR for this Jira issue, and
put MAHOUT-1762 in the PR description. It is quite OK to submit a WIP PR.


On Mon, Aug 10, 2015 at 8:18 AM, Sergey Tryuber (JIRA) <[email protected]>
wrote:

> Sergey Tryuber created MAHOUT-1762:
> --------------------------------------
>
>              Summary: Pick up $SPARK_HOME/conf/spark-defaults.conf on
> startup
>                  Key: MAHOUT-1762
>                  URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>              Project: Mahout
>           Issue Type: Wish
>           Components: spark
>             Reporter: Sergey Tryuber
>
>
> [spark-defaults.conf|
> http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties]
> is intended to hold the global configuration for a Spark cluster. For example,
> in our HDP 2.2 environment it contains:
> {noformat}
> spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0-2041
> {noformat}
> and there are many other useful settings. When a user starts Spark Shell, it
> is expected to work without further setup. Unfortunately, this does not
> happen with Mahout Spark Shell, because it ignores the Spark configuration,
> and the user has to copy-paste lots of options into _MAHOUT_OPTS_.
>
> This happens because [org.apache.mahout.sparkbindings.shell.Main|
> https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala]
> is executed directly by the [initialization script|
> https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH"
> "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in the
> [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell]
> script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
> contains an additional initialization layer that loads the properties file
> (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
>
> So there are two possible solutions:
> * use proper Spark-like initialization logic
> * use a thin wrapper, as in H2O Sparkling Water ([sparkling-shell|
> https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
>
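
For illustration, the second option in the quoted issue (a thin script that
delegates start-up to spark-submit) could look roughly like the sketch below.
It is not the actual bin/mahout change. MAHOUT_SHELL_JAR and MAHOUT_DRY_RUN are
hypothetical variable names, and whether the Mahout shell's Main class runs
unmodified under spark-submit is an assumption that would need to be checked.

```shell
#!/bin/sh
# Sketch only: start the Mahout Spark shell through spark-submit, so that
# spark-submit's own handling of conf/spark-defaults.conf applies.
# MAHOUT_SHELL_JAR and MAHOUT_DRY_RUN are hypothetical, not part of bin/mahout.

mahout_spark_shell() {
  # Fail early with a readable message if a required variable is missing.
  : "${SPARK_HOME:?SPARK_HOME must point to a Spark installation}"
  : "${MAHOUT_SHELL_JAR:?MAHOUT_SHELL_JAR must point to a jar with the shell Main class}"

  # Build the command line. User arguments go before the jar so that
  # spark-submit interprets them (for example --master) as its own options.
  set -- "$SPARK_HOME/bin/spark-submit" \
         --class org.apache.mahout.sparkbindings.shell.Main \
         "$@" \
         "$MAHOUT_SHELL_JAR"

  if [ -n "${MAHOUT_DRY_RUN:-}" ]; then
    # Dry run: print the command, one argument per line, instead of running it.
    printf '%s\n' "$@"
  else
    exec "$@"
  fi
}
```

Calling the function with MAHOUT_DRY_RUN=1 set prints the spark-submit command
that would be executed, which makes it possible to inspect the wrapper without
a Spark installation.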
