IMO you just need to modify `mahout spark-shell` to propagate -Dx=y
parameters to the java startup call and all should be fine.
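
For context, a minimal sketch of why -D propagation suffices, assuming the
flags actually reach the shell's JVM: a default-constructed SparkConf copies
every system property whose name starts with "spark.".

    import org.apache.spark.SparkConf

    // Launched e.g. via: java -Dspark.akka.frameSize=128 ... <main class>
    val conf = new SparkConf()  // loadDefaults = true, reads spark.* sys props
    println(conf.getOption("spark.akka.frameSize"))  // Some(128) if -D was passed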

On Tue, Nov 11, 2014 at 12:23 PM, Andrew Palumbo <[email protected]> wrote:

>
> I've run into this problem starting the $ mahout spark-shell script, i.e.
> needing to set spark.kryoserializer.buffer.mb and spark.akka.frameSize. I've
> been temporarily hard-coding them while developing.
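>
> Roughly, what I've been hard-coding looks something like this (example
> values only; both property names are from Spark 1.x):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .set("spark.kryoserializer.buffer.mb", "64")  // Kryo buffer, in MB
>       .set("spark.akka.frameSize", "128")           // max Akka message size, in MB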
>
> I'm just getting familiar with what you've done with the CLI drivers. For
> #2, could we borrow option-parsing code/methods from Spark [1] [2] at each
> Spark release and somehow add this to
> MahoutOptionParser.parseSparkOptions?
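>
> As a hypothetical sketch (the real parseSparkOptions signature may well
> differ), a generic pass-through could forward arbitrary Spark settings
> without us enumerating them at each release:
>
>     // Collect hypothetical "-D:spark.key=value" args verbatim into a Map,
>     // then hand them to SparkConf.setAll rather than naming each option.
>     def collectSparkOpts(args: Array[String]): Map[String, String] =
>       args.collect {
>         case a if a.startsWith("-D:spark.") =>
>           val Array(k, v) = a.stripPrefix("-D:").split("=", 2)
>           k -> v
>       }.toMap
>
>     // e.g. later: sparkConf.setAll(collectSparkOpts(args))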
>
> I'll hopefully be doing some CLI work soon and will have a better
> understanding.
>
> [1]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
> [2]
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
>
> > From: [email protected]
> > Subject: Spark options
> > Date: Wed, 5 Nov 2014 09:48:59 -0800
> > To: [email protected]
> >
> > Spark has a launch script, as Hadoop does. We use the Hadoop launcher
> > script but not the Spark one. When starting up your Spark cluster there is
> > a spark-env.sh script that can set a bunch of environment variables. In our
> > own mahoutSparkContext function, which takes the place of the Spark submit
> > script and launcher, we don't account for most of those environment
> > variables.
> >
> > Unless I missed something, this means most of the documented options will
> > be ignored unless a user of Mahout parses and sets them in their own
> > SparkConf. The Mahout CLI drivers don't do this for all possible options,
> > only supporting a few like the job name and spark.executor.memory.
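> >
> > (That is, today the drivers only build something on the order of the
> > following, with everything else left at Spark's defaults; the variable
> > names here are illustrative:)
> >
> >     val conf = new SparkConf()
> >       .setAppName(jobName)                    // the "job name"
> >       .set("spark.executor.memory", execMem)  // e.g. "4g"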
> >
> > The question is how best to handle these Spark options. There seem to be
> > two choices:
> > 1) use Spark's launch mechanism for drivers but allow some options to be
> > overridden in the CLI
> > 2) parse the env for options and set up the SparkConf defaults in
> > mahoutSparkContext with those variables (sketched below).
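> >
> > A rough sketch of #2, assuming an illustrative env-to-key mapping; the
> > order matters so that explicit CLI settings still win:
> >
> >     val conf = new SparkConf()
> >     // Seed defaults from documented spark-env.sh variables (mapping assumed)...
> >     sys.env.get("SPARK_EXECUTOR_MEMORY").foreach(conf.set("spark.executor.memory", _))
> >     // ...then apply whatever the user passed on the Mahout CLI last.
> >     conf.setAll(cliOverrides)  // cliOverrides: Map[String, String], hypothetical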
> >
> > The downside of #2 is that as the variables change we'll have to reflect
> > those changes in our code. I forget why #1 is not an option, but Dmitriy
> > has been consistently against it; in any case it would mean a fair bit of
> > refactoring, I believe.
> >
> > Any opinions or corrections?
