These files, if I read them correctly, are for spawning yet another process. I don't see how that would work for the shell.
I am also not convinced that spark-env is important for the client.

On Tue, Nov 11, 2014 at 2:09 PM, Pat Ferrel <[email protected]> wrote:

> I was thinking -Dx=y too, seems like a good idea.
>
> But we should also support setting them the way Spark documents in
> spark-env.sh, and the two links Andrew found may solve that in a
> maintainable way. Maybe we get the SparkConf from a new mahoutSparkConf
> function, which handles all env-supplied setup. For the drivers it can be
> done in the base class, allowing CLI overrides later. The SparkConf is then
> finally passed in to mahoutSparkContext, where as little as possible is
> changed in the conf.
>
> I’ll look at this for the drivers. It should be easy to add to the shell.
>
> On Nov 11, 2014, at 12:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> IMO you just need to modify `mahout spark-shell` to propagate -Dx=y
> parameters to the java startup call and all should be fine.
>
> On Tue, Nov 11, 2014 at 12:23 PM, Andrew Palumbo <[email protected]> wrote:
>
> > I've run into this problem starting $ mahout shell-script, i.e. needing
> > to set spark.kryoserializer.buffer.mb and spark.akka.frameSize. I've
> > been temporarily hard-coding them for now while developing.
> >
> > I'm just getting familiar with what you've done with the CLI drivers. For
> > #2, could we borrow option-parsing code/methods from Spark [1] [2] at each
> > (Spark) release and somehow add this to
> > MahoutOptionParser.parseSparkOptions?
> >
> > I'll hopefully be doing some CLI work soon and have a better understanding.
> >
> > [1]
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
> > [2]
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
> >
> >> From: [email protected]
> >> Subject: Spark options
> >> Date: Wed, 5 Nov 2014 09:48:59 -0800
> >> To: [email protected]
> >>
> >> Spark has a launch script, as Hadoop does. We use the Hadoop launcher
> >> script but not the Spark one. When starting up your Spark cluster there is
> >> a spark-env.sh script that can set a bunch of environment variables. In our
> >> own mahoutSparkContext function, which takes the place of the Spark submit
> >> script and launcher, we don’t account for most of those environment variables.
> >>
> >> Unless I missed something, this means most of the documented options will
> >> be ignored unless a user of Mahout parses and sets them in their own
> >> SparkConf. The Mahout CLI drivers don’t do this for all possible options,
> >> only supporting a few like job name and spark.executor.memory.
> >>
> >> The question is how best to handle these Spark options. There seem to be
> >> two choices:
> >> 1) use Spark's launch mechanism for the drivers but allow some settings to
> >> be overridden in the CLI
> >> 2) add parsing of the env for options and set up the SparkConf defaults in
> >> mahoutSparkContext with those variables.
> >>
> >> The downside of #2 is that as variables change we’ll have to reflect
> >> those changes in our code. I forget why #1 is not an option, but Dmitriy
> >> has been consistently against it; in any case it would mean a fair bit of
> >> refactoring, I believe.
> >>
> >> Any opinions or corrections?
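For a concrete picture of the -Dx=y route discussed above, here is a minimal Scala sketch of how spark.* properties passed on the java startup call (e.g. -Dspark.kryoserializer.buffer.mb=200 -Dspark.akka.frameSize=30) could end up in the SparkConf handed to mahoutSparkContext. The mahoutSparkConf name and the explicit copy loop are assumptions for illustration, not the actual Mahout code.

  import org.apache.spark.SparkConf

  // Sketch only: fold any spark.* JVM system property (set with -Dx=y on the
  // java startup call) into a SparkConf before it reaches mahoutSparkContext.
  object SparkOptionDefaults {
    def mahoutSparkConf(): SparkConf = {
      // new SparkConf() (loadDefaults = true) already reads spark.* system
      // properties; the explicit loop just makes that visible and keeps
      // working if loadDefaults were ever turned off upstream.
      val conf = new SparkConf()
      for ((k, v) <- sys.props if k.startsWith("spark.")) conf.set(k, v)
      conf
    }
  }

Driver CLI overrides could then be applied with conf.set(...) after this call, so command-line flags still take precedence over -D and spark-env.sh values.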
