[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pat Ferrel resolved MAHOUT-1762.
--------------------------------
    Resolution: Won't Fix

We don't know of anything this blocks, and moving to spark-submit was voted 
down; that change would only apply to the Mahout CLI drivers anyway. All CLI 
drivers support passthrough of arbitrary key=value pairs, which go into the 
SparkConf, and when using Mahout as a library you can construct any SparkConf 
you like.

Will not fix unless someone can explain the need. 
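
For reference, a minimal sketch of the library route, assuming the 
mahoutSparkContext helper in org.apache.mahout.sparkbindings (parameter names 
as in the 0.10.x line; verify against your version):

{code}
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Build an arbitrary SparkConf up front; anything you would otherwise put in
// spark-defaults.conf can be set here explicitly.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")
  .set("spark.yarn.am.extraJavaOptions", "-Dhdp.version=2.2.0.0-2041")

// mahoutSparkContext accepts a caller-supplied SparkConf and returns the
// distributed context that the Mahout Spark DSL operates on.
implicit val sdc = mahoutSparkContext(
  masterUrl = "yarn-client",
  appName   = "my-mahout-app",
  sparkConf = conf)
{code}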

> Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
> -------------------------------------------------------
>
>                 Key: MAHOUT-1762
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>             Project: Mahout
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Sergey Tryuber
>            Assignee: Pat Ferrel
>             Fix For: 1.0.0
>
>
> [spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties]
>  is intended to hold the global configuration for a Spark cluster. For 
> example, in our HDP 2.2 environment it contains:
> {noformat}
> spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0-2041
> spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0-2041
> {noformat}
> and many other useful settings. A user naturally expects that when they 
> start the Spark shell, these settings will be picked up and everything will 
> just work. Unfortunately this does not happen with the Mahout Spark shell, 
> because it ignores the Spark configuration and the user has to copy-paste 
> lots of options into _MAHOUT_OPTS_.
> This happens because 
> [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala]
>  is executed directly by the [launcher 
> script|https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" 
> "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in 
> the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell] 
> script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
>  adds an extra initialization layer that loads the properties file (see the 
> SparkSubmitArguments#mergeDefaultSparkProperties method).
> So there are two possible solutions:
> * use proper Spark-like initialization logic (a rough sketch follows below)
> * use a thin wrapper script around spark-submit, as H2O Sparkling Water does 
> ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
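> A rough sketch of the first option, modeled on what 
> SparkSubmitArguments#mergeDefaultSparkProperties does (a hypothetical 
> helper, not existing Mahout code):
> {code}
> import java.io.{File, FileInputStream, InputStreamReader}
> import java.util.Properties
> import scala.collection.JavaConverters._
> import org.apache.spark.SparkConf
>
> // Hypothetical helper: load $SPARK_HOME/conf/spark-defaults.conf and fold
> // its spark.* entries into the SparkConf before the shell creates its
> // SparkContext. setIfMissing leaves explicitly set values untouched, so
> // user-supplied options keep precedence over the defaults file.
> def mergeSparkDefaults(conf: SparkConf): SparkConf = {
>   sys.env.get("SPARK_HOME")
>     .map(home => new File(home, "conf/spark-defaults.conf"))
>     .filter(_.isFile)
>     .foreach { file =>
>       val props = new Properties()
>       val in = new InputStreamReader(new FileInputStream(file), "UTF-8")
>       try props.load(in) finally in.close()
>       for ((k, v) <- props.asScala if k.startsWith("spark."))
>         conf.setIfMissing(k, v.trim)
>     }
>   conf
> }
> {code}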



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
