I've run into this problem when starting $ mahout shell-script, i.e. needing to set
the spark.kryoserializer.buffer.mb and spark.akka.frameSize properties. I've been
temporarily hard-coding them while developing.
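Instead of hard-coding the two values, they could be kept as development defaults that `-D` system properties override. `new SparkConf()` already loads any `spark.*` Java system properties by default, so the sketch below only mimics that layering with a plain `Map` to stay self-contained. The `DevSparkDefaults` name and the default values are illustrative assumptions, not Mahout settings.

```scala
// Hypothetical helper: development defaults that any spark.* JVM system
// property (e.g. -Dspark.akka.frameSize=100) can override.
object DevSparkDefaults {
  // Illustrative values only; tune them for your own jobs.
  val defaults: Map[String, String] = Map(
    "spark.kryoserializer.buffer.mb" -> "200",
    "spark.akka.frameSize" -> "100"
  )

  // Keep only spark.* entries from the supplied properties.
  def sparkProps(props: Map[String, String]): Map[String, String] =
    props.filter { case (k, _) => k.startsWith("spark.") }

  // In `a ++ b` entries from b win, so explicit properties override defaults.
  def resolve(props: Map[String, String]): Map[String, String] =
    defaults ++ sparkProps(props)

  // Convenience: resolve against the running JVM's system properties.
  def fromSystem(): Map[String, String] =
    resolve(sys.props.toMap)
}
```

The resolved pairs could then be copied into the SparkConf handed to mahoutSparkContext, for example with `SparkConf.setAll`.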

I'm just getting familiar with what you've done with the CLI drivers. For #2,
could we borrow option-parsing code/methods from Spark [1] [2] at each (Spark)
release and somehow add them to MahoutOptionParser.parseSparkOptions?
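One small thing that could be borrowed from spark-submit is its generic `--conf PROP=VALUE` flag, which passes any property through without Mahout having to track each one by name. The sketch below is a hypothetical pre-pass: the `ConfArgs` name is made up, and it is not the actual MahoutOptionParser API. It strips those pairs out before the remaining arguments reach the driver's own parser.

```scala
// Hypothetical pre-pass that pulls spark-submit style "--conf key=value"
// pairs out of the argument list; everything else is left for the driver's
// own option parser, in its original order.
object ConfArgs {
  /** Returns (spark properties, remaining args). Throws on a malformed pair. */
  def extract(args: List[String]): (Map[String, String], List[String]) = {
    @annotation.tailrec
    def loop(rest: List[String],
             conf: Map[String, String],
             others: List[String]): (Map[String, String], List[String]) =
      rest match {
        case "--conf" :: kv :: tail =>
          // Split on the first '=' only, so values may themselves contain '='.
          kv.split("=", 2) match {
            case Array(k, v) if k.nonEmpty => loop(tail, conf + (k -> v), others)
            case _ =>
              throw new IllegalArgumentException(s"Expected key=value after --conf, got: $kv")
          }
        case "--conf" :: Nil =>
          throw new IllegalArgumentException("--conf requires a key=value argument")
        case arg :: tail => loop(tail, conf, arg :: others)
        case Nil => (conf, others.reverse)
      }
    loop(args, Map.empty, Nil)
  }
}
```

The extracted pairs would go into the SparkConf and the remaining arguments to the existing parser, so a newly added Spark property would need no Mahout code change.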

I'll hopefully be doing some CLI work soon and have a better understanding.

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
[2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

> From: [email protected]
> Subject: Spark options
> Date: Wed, 5 Nov 2014 09:48:59 -0800
> To: [email protected]
> 
> Spark has a launch script, as Hadoop does. We use the Hadoop launcher script
> but not the Spark one. When starting up your Spark cluster, there is a
> spark-env.sh script that can set a bunch of environment variables. In our own
> mahoutSparkContext function, which takes the place of the Spark submit script
> and launcher, we don’t account for most of the environment variables.
> 
> Unless I missed something, this means most of the documented options will be
> ignored unless a user of Mahout parses and sets them in their own SparkConf.
> The Mahout CLI drivers don’t do this for all possible options; they only
> support a few, like job name and spark.executor.memory.
> 
> The question is how best to handle these Spark options. There seem to be two
> choices:
> 1) use Spark's launch mechanism for drivers, but allow some options to be
> overridden in the CLI
> 2) parse the environment for options and set the SparkConf defaults in
> mahoutSparkContext from those variables.
> 
> The downside of #2 is that as the variables change, we’ll have to reflect those
> changes in our code. I forget why #1 is not an option, but Dmitriy has been
> consistently against it; in any case, it would mean a fair bit of refactoring, I believe.
> 
> Any opinions or corrections?
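Option #2 from the message above could look roughly like the sketch below: translate spark-env.sh style environment variables into SparkConf keys, then layer them between built-in defaults and explicitly supplied settings. The `EnvSparkConf` name, the precedence order, and the three-entry mapping are assumptions for illustration; the mapping should be checked against the spark-env.sh template of the Spark release in use.

```scala
// Hypothetical sketch of option #2: map environment variables to SparkConf
// keys and merge them with precedence defaults < environment < explicit.
object EnvSparkConf {
  // Illustrative and deliberately incomplete: this table is exactly what
  // would have to be kept in sync with each Spark release.
  val envToConf: Map[String, String] = Map(
    "SPARK_EXECUTOR_MEMORY" -> "spark.executor.memory",
    "SPARK_DRIVER_MEMORY" -> "spark.driver.memory",
    "SPARK_EXECUTOR_CORES" -> "spark.executor.cores"
  )

  // Keep only mapped variables that are set to a non-blank value (trimmed).
  def fromEnv(env: Map[String, String]): Map[String, String] =
    envToConf.toSeq.flatMap { case (envVar, confKey) =>
      env.get(envVar).map(_.trim).filter(_.nonEmpty).map(v => confKey -> v)
    }.toMap

  // Later maps win in ++, so explicit settings override the environment,
  // which in turn overrides the defaults.
  def resolve(defaults: Map[String, String],
              env: Map[String, String],
              explicit: Map[String, String]): Map[String, String] =
    defaults ++ fromEnv(env) ++ explicit
}
```

In a driver, `env` would be `sys.env`. The hand-maintained `envToConf` table is the maintenance cost of #2 that the message points out.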
