[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199672#comment-15199672 ]
Pat Ferrel commented on MAHOUT-1762:
------------------------------------

I agree with the reasoning for this, but the drivers have a pass-through to Spark for arbitrary key=value pairs, and switching to spark-submit was voted down, so it was never done. If you are using Mahout as a lib you can set anything in the SparkConf that you want, so I am not sure what remains here beyond a more than reasonable complaint about how the launcher scripts are structured.

> Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
> -------------------------------------------------------
>
>                 Key: MAHOUT-1762
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>             Project: Mahout
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Sergey Tryuber
>            Assignee: Pat Ferrel
>             Fix For: 1.0.0
>
>
> [spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties] is meant to hold the global configuration for a Spark cluster. For example, in our HDP 2.2 environment it contains:
> {noformat}
> spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041
> {noformat}
> and many other useful settings. A user naturally expects that when they start a Spark shell, it will work out of the box. Unfortunately, this does not happen with the Mahout Spark shell, because it ignores the Spark configuration and the user has to copy-paste lots of options into _MAHOUT_OPTS_.
> This happens because [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala] is executed directly by the [initialization script|https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell] script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala] contains an additional initialization layer that loads the properties file (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
> So there are two possible solutions:
> * use proper Spark-like initialization logic
> * use a thin envelope, as H2O Sparkling Water does ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
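
As an aside on the "Mahout as a lib" route mentioned in the comment above: a minimal sketch, assuming the mahoutSparkContext helper exposed by the sparkbindings package object (signature as of the 0.10/0.11 line). The option values, master URL, and app name are placeholders, not anything this issue prescribes.

{code}
import org.apache.mahout.sparkbindings._
import org.apache.spark.SparkConf

object MahoutWithExplicitConf extends App {
  // Anything spark-defaults.conf would normally supply can be set here
  // explicitly (the -Dhdp.version value is the placeholder from the report).
  val conf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")
    .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")

  // mahoutSparkContext (sparkbindings package object) accepts a
  // caller-built SparkConf; master URL and app name are illustrative.
  implicit val sdc: SparkDistributedContext = mahoutSparkContext(
    masterUrl = "local[2]",
    appName = "mahout-with-defaults",
    sparkConf = conf)
}
{code}

This covers embedded use, but it does not help the shell/driver scripts, which is what the issue is about.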
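
For the first proposed solution, a rough sketch of what replicating SparkSubmitArguments#mergeDefaultSparkProperties inside the Mahout shell startup could look like. mergeSparkDefaults is a hypothetical helper, not an existing Mahout or Spark API; java.util.Properties is used because it accepts the whitespace-separated "key value" lines of spark-defaults.conf.

{code}
import java.io.{File, FileInputStream, InputStreamReader}
import java.nio.charset.StandardCharsets
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.spark.SparkConf

object DefaultsLoader {
  /** Hypothetical helper: merge $SPARK_HOME/conf/spark-defaults.conf into
    * a SparkConf, mirroring SparkSubmitArguments#mergeDefaultSparkProperties.
    * Values already set on the conf win; defaults only fill the gaps. */
  def mergeSparkDefaults(conf: SparkConf): SparkConf = {
    sys.env.get("SPARK_HOME")
      .map(home => new File(home, "conf/spark-defaults.conf"))
      .filter(_.isFile)
      .foreach { file =>
        val props = new Properties()
        val in = new InputStreamReader(
          new FileInputStream(file), StandardCharsets.UTF_8)
        // Properties.load handles the "key value" (whitespace-separated)
        // format that spark-defaults.conf uses.
        try props.load(in) finally in.close()
        for ((k, v) <- props.asScala if k.startsWith("spark."))
          conf.setIfMissing(k, v.trim)
      }
    conf
  }
}
{code}

Calling something like this before the shell builds its SparkConf would give the Mahout shell the same "defaults plus overrides" behavior users get from spark-submit.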