[ https://issues.apache.org/jira/browse/MAHOUT-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896764#comment-15896764 ]

ASF GitHub Bot commented on MAHOUT-1762:
----------------------------------------

GitHub user rawkintrevo opened a pull request:

    https://github.com/apache/mahout/pull/292

    MAHOUT-1762 Utilize spark-submit in bin/mahout script

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rawkintrevo/mahout mahout-1762

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mahout/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #292
    
----
commit c5451287b54d55ce586ecfbb340c9bb023385765
Author: rawkintrevo <trevor.d.gr...@gmail.com>
Date:   2017-03-06T05:31:42Z

    MAHOUT-1762 Utilize spark-submit in bin/mahout script

----


> Pick up $SPARK_HOME/conf/spark-defaults.conf on startup
> -------------------------------------------------------
>
>                 Key: MAHOUT-1762
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1762
>             Project: Mahout
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Sergey Tryuber
>            Assignee: Trevor Grant
>             Fix For: 0.13.0
>
>
> [spark-defaults.conf|http://spark.apache.org/docs/latest/configuration.html#dynamically-loading-spark-properties]
>  is intended to hold the global configuration for a Spark cluster. For 
> example, in our HDP2.2 environment it contains:
> {noformat}
> spark.driver.extraJavaOptions      -Dhdp.version=2.2.0.0–2041
> spark.yarn.am.extraJavaOptions     -Dhdp.version=2.2.0.0–2041
> {noformat}
> and many other useful settings. It is expected that when a user starts the 
> Spark shell, it will work out of the box. Unfortunately, this does not 
> happen with the Mahout Spark shell, because it ignores the Spark 
> configuration and the user has to copy-paste lots of options into _MAHOUT_OPTS_.
> This happens because 
> [org.apache.mahout.sparkbindings.shell.Main|https://github.com/apache/mahout/blob/master/spark-shell/src/main/scala/org/apache/mahout/sparkbindings/shell/Main.scala]
>  is executed directly by the [initialization 
> script|https://github.com/apache/mahout/blob/master/bin/mahout]:
> {code}
> "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" 
> "org.apache.mahout.sparkbindings.shell.Main" $@
> {code}
> In contrast, the Spark shell is invoked indirectly through spark-submit in 
> the [spark-shell|https://github.com/apache/spark/blob/master/bin/spark-shell] 
> script:
> {code}
> "$FWDIR"/bin/spark-submit --class org.apache.spark.repl.Main "$@"
> {code}
> [SparkSubmit|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala]
>  contains an additional initialization layer that loads the properties file 
> (see the SparkSubmitArguments#mergeDefaultSparkProperties method).
> So there are two possible solutions:
> * use proper Spark-like initialization logic
> * use thin envelope like it is in H2O Sparkling Water 
> ([sparkling-shell|https://github.com/h2oai/sparkling-water/blob/master/bin/sparkling-shell])
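
For the thin-envelope option, a minimal wrapper in the style of sparkling-shell would simply delegate to spark-submit, which then merges $SPARK_HOME/conf/spark-defaults.conf on its own. The sketch below is illustrative only: the Main class name comes from the issue, while the function name and variables are assumptions:

```shell
#!/usr/bin/env bash
# Sketch of a thin spark-submit envelope for the Mahout shell,
# modeled loosely on H2O's sparkling-shell (illustrative only).

# build_shell_cmd (hypothetical helper) prints the spark-submit
# invocation, one token per line, instead of executing it, so the
# wrapper logic can be inspected in isolation.
build_shell_cmd() {
  printf '%s\n' \
    "$SPARK_HOME/bin/spark-submit" \
    --class org.apache.mahout.sparkbindings.shell.Main \
    "$@"
}

# A real wrapper would exec the command, letting spark-submit pick up
# spark-defaults.conf via SparkSubmitArguments#mergeDefaultSparkProperties:
#   exec "$SPARK_HOME/bin/spark-submit" \
#     --class org.apache.mahout.sparkbindings.shell.Main "$@"
```

Because all configuration handling is deferred to spark-submit, settings such as spark.driver.extraJavaOptions no longer need to be duplicated in MAHOUT_OPTS.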



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
